
How long is "a lot of time"? How short is "immediately"? If you pass your input list file name as a command-line arg (i.e. via @ARGV), rather than prompting for it via keyboard input, you'll be able to get a better measure of run time (e.g. using the "time" utility, which comes standard on unix/linux/macosx).

To some extent (that is, for some portion of what the script has to do), 40,000 lines of list data ought to take about 400 times longer than 100 lines, but the "real-time" end result would depend on things like loop structures, memory consumption, etc. (e.g., other things being equal, a process might run faster if it consumes less memory or generates less output).

Relative to this latest code you posted, I think the only available improvements involve minor tweaks that probably won't make a big difference in timing (but might make the output more useful, and that is more important). If you're driven to compare minor speed differences, get acquainted with the Benchmark module.
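As a rough illustration of what Benchmark can do, here is a minimal sketch that compares two ways of splitting one list line into path, name and revision. The sample line and both sub names (parse_regex, parse_index) are made up for this example, and the rindex/substr version does not check that the revision is numeric, so the two only agree on well-formed lines. Treat the numbers it prints as machine-specific.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample line in the "path/name#rev" shape (invented for this sketch).
my $line = "//depot/project/src/main.c#12";

# Candidate 1: a single regex with three captures.
sub parse_regex {
    my ($s) = @_;
    return unless $s =~ m:^(.+/)(.+?)#(\d+)$:;
    return ( $1, $2, $3 );
}

# Candidate 2: rindex/substr, splitting at the last "/" and the last "#".
# Unlike the regex, this does not verify that the revision is all digits.
sub parse_index {
    my ($s) = @_;
    my $slash = rindex( $s, "/" );
    my $hash  = rindex( $s, "#" );
    return if $slash < 0 or $hash < $slash;
    return (
        substr( $s, 0, $slash + 1 ),
        substr( $s, $slash + 1, $hash - $slash - 1 ),
        substr( $s, $hash + 1 ),
    );
}

# A negative count asks Benchmark to run each candidate for at least
# that many CPU seconds, then print a comparison table.
cmpthese( -1, {
    regex => sub { my @f = parse_regex($line) },
    index => sub { my @f = parse_index($line) },
} );
```

If the two candidates come out within a few percent of each other, that is a good sign the difference is not worth chasing.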

Here's how I would code the basic process, but I don't know if this version would be measurably faster:

    #!/usr/bin/perl
    use strict;
    use warnings;

    @ARGV == 1 and -f $ARGV[0] or die "Usage: $0 list-file-name\n";

    my %files;
    while (<>) {
        chomp;
        next unless ( m:^(.+/)(.+?)#(\d+)$: );
        my ( $path, $name, $version ) = ( $1, $2, $3 );
        push @{ $files{ $name }}, { p => $path, v => $version };
    }

    open( my $glf, ">", "hash_glf.txt" ) or die "hash_glf.txt: $!\n";
    for my $f ( keys %files ) {
        print $glf join( "\n ", $f, map { "$$_{p}\t$$_{v}" } @{$files{$f}} ), "\n\n";
    }

Unlike your code, this version doesn't include the file name in the "path" value (and the nested hash keys are smaller) so it takes a bit less memory -- but this probably makes no difference on 40 K elements.

Also, while the output format used by Data::Dumper is very reasonable and readable, I think it's just as well to go with a more compact format in this case, with one indented line of "path/", a tab, and the version for each element of the input list, organized into "paragraphs" by file name. (And not using Data::Dumper for output might save some time.)
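
To make the compact layout concrete, here is a small self-contained sketch that builds the same kind of "paragraphs" from three in-memory lines instead of a list file. The three input lines are invented for illustration, and the keys are sorted only so the result is the same on every run.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input lines, invented for illustration only.
my @lines = (
    "//depot/a/foo.c#3",
    "//depot/b/foo.c#7",
    "//depot/a/bar.h#1",
);

# Group path/version pairs under each file name.
my %files;
for (@lines) {
    next unless m:^(.+/)(.+?)#(\d+)$:;
    push @{ $files{$2} }, { p => $1, v => $3 };
}

# One "paragraph" per file name: the name on its own line, then one
# indented "path<TAB>version" line per occurrence, then a blank line.
my $report = "";
for my $f ( sort keys %files ) {
    $report .= join( "\n ", $f, map { "$$_{p}\t$$_{v}" } @{ $files{$f} } ) . "\n\n";
}
print $report;
```

With these inputs, bar.h gets a paragraph with one indented line and foo.c gets a paragraph with two, one for each directory it was found in.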