in reply to Re^4: Constructing a hash with filepath,filename and filerev
in thread Constructing a hash with filepath,filename and filerev
To some extent (that is, for some portion of what the script has to do), 40,000 lines of list data ought to take about 400 times longer than 100 lines, but the "real-time" end result would depend on things like loop structures, memory consumption, etc. (e.g., other things being equal, a process might run faster if it consumes less memory or generates less output).
Relative to this latest code you posted, I think the only available improvements involve minor tweaks that probably won't make a big difference in timing (but might make the output more useful, and that is more important). If you're driven to compare minor speed differences, get acquainted with the Benchmark module.
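As a minimal sketch of how Benchmark can be used for that kind of comparison (the sample line and the two parsing alternatives below are made up for illustration, not taken from your post):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Benchmark qw( cmpthese );

    # A made-up input line in the same "path/name#version" shape:
    my $line = '/some/dir/foo.c#3';

    # Run each sub for about 2 CPU seconds and print a comparison table.
    cmpthese( -2, {
        regex => sub {
            # single regex capture, as in the script below
            my ( $p, $n, $v ) = $line =~ m:^(.+/)(.+?)#(\d+)$:;
        },
        split => sub {
            # split on '#' first, then separate path from name
            my ( $rest, $v ) = split /#/, $line;
            my ( $p, $n ) = $rest =~ m:^(.+/)(.+)$:;
        },
    });

Either way, differences at this scale are usually dwarfed by I/O, which is why I wouldn't expect dramatic wins from micro-tuning.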
Here's how I would code the basic process, but I don't know if this version would be measurably faster:
Unlike your code, this version doesn't include the file name in the "path" value (and the nested hash keys are smaller), so it takes a bit less memory -- but this probably makes no difference on 40 K elements.

    #!/usr/bin/perl
    use strict;
    use warnings;

    @ARGV == 1 and -f $ARGV[0] or die "Usage: $0 list-file-name\n";

    my %files;
    while (<>) {
        chomp;
        next unless ( m:^(.+/)(.+?)#(\d+)$: );
        my ( $path, $name, $version ) = ( $1, $2, $3 );
        push @{ $files{ $name }}, { p => $path, v => $version };
    }

    open( my $glf, ">", "hash_glf.txt" ) or die "hash_glf.txt: $!\n";
    for my $f ( keys %files ) {
        print $glf join( "\n ", $f, map { "$$_{p}\t$$_{v}" } @{$files{$f}} ), "\n\n";
    }
Also, while the output format used by Data::Dumper is very reasonable and readable, I think it's just as well to go with a more compact format in this case: one indented "path<tab>version" line for each element of the input list, organized into "paragraphs" by file name. (And not using Data::Dumper for output might save some time.)
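For example, given made-up input lines like "/some/dir/foo.c#3" and "/other/dir/foo.c#7" (hypothetical paths, just to illustrate), the corresponding paragraph in hash_glf.txt would look like:

    foo.c
     /some/dir/	3
     /other/dir/	7

That is, the file name on its own line, followed by one indented path/version pair per occurrence, with a blank line between paragraphs.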