Here's suhailck's basic example using a hash of arrays of hashes, so that multiple versions of a file with the same name can be stored under one key.
#!perl
use strict;
use warnings;

my %files_by;
while (<DATA>) {
    # Capture the full path, the bare file name, and the revision number.
    my ($file_path, $file_name, $version) = m{^(.*/(.*))#(\d+)$}
        or next;
    push @{ $files_by{$file_name} }, {
        file_path => $file_path,
        version   => $version,
    };
}

for my $file_name (sort keys %files_by) {
    for my $href (@{ $files_by{$file_name} }) {
        print "File name: $file_name\n";
        print "File path: $href->{file_path}\n";
        print "File version: $href->{version}\n";
    }
}

# This prints 6
print $files_by{'modem.c'}[2]{version}, "\n";

# This prints '//depot/asic/tools/perl/proc/examples/apps.c'
print $files_by{'apps.c'}[0]{file_path}, "\n";

__DATA__
//depot/asic/tools/perl/scripts/examples/modem.c#4
//depot/asic/tools/perl/scripts/examples/modem.c#5
//depot/asic/tools/perl/scripts/examples/modem.c#6
//depot/asic/tools/perl/scripts/examples/modem.c#7
//depot/asic/tools/perl/files/examples/file.txt#2
//depot/asic/tools/perl/proc/examples/apps.c#14
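As an aside, pulling the most recent revision of a given name back out of this structure is just a numeric sort over its array. A small sketch, assuming %files_by as built above:

# Highest-numbered revision of modem.c; with the data above this
# prints '//depot/asic/tools/perl/scripts/examples/modem.c#7'.
my ($latest) = sort { $b->{version} <=> $a->{version} }
               @{ $files_by{'modem.c'} };
print "$latest->{file_path}#$latest->{version}\n";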
I am trying to output the hash to a file, "hash_glf.txt", using the following code. The input file is 40,000 lines, and it takes a long time to write the hash structure to "hash_glf.txt"; if I change the input to a file of around 100 lines, the structure appears in hash_glf.txt immediately. Is there a way to make the script write the output faster?
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

my %files_by;
print "Enter File1 ";
my $file1_name = <STDIN>;
chomp($file1_name);

open my $DATA, '<', $file1_name or die "Cannot open $file1_name: $!\n";
while (<$DATA>) {
    my ($file_path, $file_name, $version) = m{^(.*/(.*))#(\d+)$}
        or next;    # skip lines that don't look like path#version
    push @{ $files_by{$file_name} }, {
        file_path => $file_path,
        version   => $version,
    };
}
close $DATA;

open my $hash_glf, '>', 'hash_glf.txt' or die "Cannot open hash_glf.txt: $!\n";
print $hash_glf Dumper( \%files_by );
close $hash_glf;
How long is "a lot of time"? How short is "immediately"? If you pass your input list file name as a command-line argument (i.e. into @ARGV), rather than prompting for it via keyboard input, you'll be able to get a better measure of run time (e.g. using the "time" utility, which comes standard on unix/linux/macosx).
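For example, with the script saved as (say) files_by.pl and the 40,000-line list in filelist.txt (both names invented here), one shell command reports wall-clock, user, and system times:

time perl files_by.pl filelist.txt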
To some extent (that is, for some portion of what the script has to do), 40,000 lines of list data ought to take about 400 times longer than 100 lines, but the "real-time" end result would depend on things like loop structures, memory consumption, etc. (e.g., other things being equal, a process might run faster if it consumes less memory or generates less output).
Relative to this latest code you posted, I think the only available improvements involve minor tweaks that probably won't make a big difference in timing (but might make the output more useful, and that is more important). If you're driven to compare minor speed differences, get acquainted with the Benchmark module.
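For instance, here's a minimal sketch using Benchmark's cmpthese to compare Data::Dumper against a hand-rolled formatter; the test structure and paths below are made up, just shaped roughly like your %files_by:

#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use Data::Dumper;

# Throwaway data of roughly the right shape (paths are invented).
my %files;
for my $n ( 1 .. 10_000 ) {
    push @{ $files{"file$n.c"} }, { p => "//depot/dir$n/", v => 1 };
}

# A negative count tells cmpthese to run each sub for at least 3 CPU-seconds.
cmpthese( -3, {
    dumper => sub { my $out = Dumper( \%files ) },
    plain  => sub {
        my $out = '';
        for my $f ( keys %files ) {
            $out .= join( "\n ", $f,
                map { "$$_{p}\t$$_{v}" } @{ $files{$f} } ) . "\n\n";
        }
    },
});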
Here's how I would code the basic process, but I don't know if this version would be measurably faster:
#!/usr/bin/perl
use strict;
use warnings;

@ARGV == 1 and -f $ARGV[0] or die "Usage: $0 list-file-name\n";

my %files;
while (<>) {
    chomp;
    next unless ( m:^(.+/)(.+?)#(\d+)$: );
    my ( $path, $name, $version ) = ( $1, $2, $3 );
    push @{ $files{ $name }}, { p => $path, v => $version };
}

open( my $glf, ">", "hash_glf.txt" ) or die "hash_glf.txt: $!\n";
for my $f ( keys %files ) {
    print $glf join( "\n ", $f, map { "$$_{p}\t$$_{v}" } @{ $files{$f} } ), "\n\n";
}
Unlike your code, this version doesn't include the file name in the "path" value (and the nested hash keys are smaller) so it takes a bit less memory -- but this probably makes no difference on 40 K elements.
Also, while the output format used by Data::Dumper is very reasonable and readable, I think it's just as well to go with a more compact format in this case, with one indented "path <tab> version" line for each element of the input list, organized into "paragraphs" by file name. (And not using Data::Dumper for output might save some time.)
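For the six sample paths in the first reply above, that compact format would come out roughly like this (paragraph order is arbitrary, since the hash keys aren't sorted; spaces stand in for the tab between path and version):

apps.c
 //depot/asic/tools/perl/proc/examples/    14

file.txt
 //depot/asic/tools/perl/files/examples/    2

modem.c
 //depot/asic/tools/perl/scripts/examples/    4
 //depot/asic/tools/perl/scripts/examples/    5
 //depot/asic/tools/perl/scripts/examples/    6
 //depot/asic/tools/perl/scripts/examples/    7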