perl_mystery has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to construct a hash from an input file that contains the following lines. The hash should contain the filename as key and the filepath and the filerev as values. I wrote the sample code below, but I am not sure if it works. Can someone look at the code?

INPUT: a file containing the following lines

    //depot/asic/tools/perl/scripts/examples/modem.c#7
    //depot/asic/tools/perl/files/examples/file.txt#2
    //depot/asic/tools/perl/proc/examples/apps.c#14
    ......

OUTPUT: the hash should be constructed as follows

    hash{file_path} = //depot/asic/tools/perl/scripts/examples/modem.c
    hash{file_name} = modem.c
    hash{file_rev}  = 7

    my %hash = ();
    print "Enter File ";
    my $file_name = <>;
    chomp($file_name);
    open my $DATA, '<', $file_name or die "Cannot open file 2\n";
    while (my $line = <$DATA>){
        hash{file_path} = $line =~ /\/\/.*[#]/xms;      #match "//" and then match everything until pound(#)
        hash{file_name) = $line =~ /[\/]+?(\w+)\#/xms;  #Get the filename between last frontslash "/" and pound(#)
        hash{file_rev) = $line =~ /[#](\d+)/xms;        #Match the number after pound (#) sign
    }

Replies are listed 'Best First'.
Re: Constructing a hash with filepath,filename and filerev
by suhailck (Friar) on Dec 11, 2010 at 05:51 UTC
    You can code it the following way, provided the file name is unique. Otherwise, use an array of hashes.
    Read perldsc for information about Perl data structures.

        use strict;
        use warnings;
        use Data::Dumper;

        my %hash;
        while (<DATA>) {
            my ($file_path, $file_name, $ver) = m{^(.*/(.*))\#(\d+)$};
            %{ $hash{$file_name} } = (file_path => $file_path, version => $ver);
        }
        print Dumper(\%hash);

        __DATA__
        //depot/asic/tools/perl/scripts/examples/modem.c#7
        //depot/asic/tools/perl/files/examples/file.txt#2
        //depot/asic/tools/perl/proc/examples/apps.c#14

      The filename is not unique. Just curious: what is the problem with the above code if the filename is not unique, and how would using an array of hashes resolve the problem?

        Here's suhailck's basic example, reworked to use a hash of arrays of hashes so that it allows for multiple versions of a file with the same name.

            #!perl
            use strict;
            use warnings;

            my %files_by;

            while (<DATA>) {
                my ($file_path, $file_name, $version) = m{^(.*/(.*))#(\d+)$};
                push @{ $files_by{$file_name} }, {
                    file_path => $file_path,
                    version   => $version,
                };
            }

            for my $file_name (sort keys %files_by) {
                for my $href (@{ $files_by{$file_name} }) {
                    print "File name: $file_name\n";
                    print "File path: $href->{file_path}\n";
                    print "File version: $href->{version}\n";
                }
            }

            # This prints 6
            print $files_by{'modem.c'}[2]{version}, "\n";

            # This prints '//depot/asic/tools/perl/proc/examples/apps.c'
            print $files_by{'apps.c'}[0]{file_path}, "\n";

            __DATA__
            //depot/asic/tools/perl/scripts/examples/modem.c#4
            //depot/asic/tools/perl/scripts/examples/modem.c#5
            //depot/asic/tools/perl/scripts/examples/modem.c#6
            //depot/asic/tools/perl/scripts/examples/modem.c#7
            //depot/asic/tools/perl/files/examples/file.txt#2
            //depot/asic/tools/perl/proc/examples/apps.c#14
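
        To see why the plain hash loses data when a file name repeats, here is a minimal side-by-side sketch. It assumes the same path#rev line format as the thread; the names %flat and %nested are made up for the illustration and do not appear in the posts above.

```perl
use strict;
use warnings;

# Sample lines taken from the thread; two revisions share the name modem.c.
my @lines = (
    '//depot/asic/tools/perl/scripts/examples/modem.c#6',
    '//depot/asic/tools/perl/scripts/examples/modem.c#7',
);

# %flat and %nested are hypothetical names used only in this sketch.
my (%flat, %nested);
for my $line (@lines) {
    my ($file_path, $file_name, $version) = $line =~ m{^(.*/(.*))#(\d+)$}
        or next;    # skip lines that do not look like path#rev

    # Plain hash keyed by name: each assignment replaces the previous entry.
    $flat{$file_name} = { file_path => $file_path, version => $version };

    # Hash of arrays of hashes: each revision is appended instead.
    push @{ $nested{$file_name} }, { file_path => $file_path, version => $version };
}

print "flat keeps revision: $flat{'modem.c'}{version}\n";
print "nested keeps revisions: ",
    join(', ', map { $_->{version} } @{ $nested{'modem.c'} }), "\n";
```

        With these two lines, the flat hash ends up holding only revision 7, while the nested structure holds both 6 and 7.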
Re: Constructing a hash with filepath,filename and filerev
by Anonymous Monk on Dec 11, 2010 at 05:48 UTC
    You're not sure if your code works? Have you tried running it?
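
    Running it would show that it does not compile: hash{file_path} lacks the $ sigil, and two of the subscripts close with ) instead of }. Beyond that, a match assigned to a scalar yields only a true/false value, not the captured text. What follows is only a sketch of one possible repair, assuming every line has the form path#rev; the sub name parse_depot_line is hypothetical and not from the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: split one "path#rev" line into the three fields
# the question asks for. Returns an empty list if the line does not match.
sub parse_depot_line {
    my ($line) = @_;

    # The parentheses around the left-hand side force list context,
    # so the match returns its captures instead of a true/false value.
    my ($path, $name, $rev) = $line =~ m{^(//.*/([^/#]+))#(\d+)\s*$}
        or return;

    return (file_path => $path, file_name => $name, file_rev => $rev);
}

my %hash = parse_depot_line('//depot/asic/tools/perl/scripts/examples/modem.c#7');
print "$_ = $hash{$_}\n" for sort keys %hash;
```

    Note that a single %hash with these three fixed keys can only describe one line at a time, which is why the other replies key the data by file name or path.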
Re: Constructing a hash with filepath,filename and filerev
by Marshall (Canon) on Dec 12, 2010 at 11:02 UTC
    Looks like you have a pretty good naming convention. One simplification would be to just use a split instead of a regex on the path name. Another simplification is to use just one single-dimensional hash.

    Below, for each line, I extract the version number with a simple split. The rest becomes the path name used as the hash key. The version numbers are stored in an anonymous array attached to each key. I don't need a HoH because I can get the name of the file easily with the basename($file) function.

    For output, I sorted the keys of %versions by the filename. A file name that appears in more than one directory will show up on more than one line of the output. It should be straightforward to detect this ($current_filename eq $last_filename) and then decide what you want to do about that situation within the loop.

    I am also puzzled about the speed comment. A 40K line file is not considered "big" nowadays. The Perl sort is very good and keeps getting better. I would guess that the sort will take less than a second. So at most this is just going to take 1-2 seconds. If there is some super duper speed requirement such that say even a couple of seconds is not sufficient, then it would help if you explained that requirement in more detail. And, yes there are all kinds of ways to make this code somewhat faster, but I doubt that you will need to. Test with your data set and report back.


        #!/usr/bin/perl -w
        use strict;
        use File::Basename;
        use Data::Dumper;

        my %versions;

        while (<DATA>)
        {
            chomp();
            next if /^\s*$/;    #skip blank lines
            my ($path, $version) = split('#', $_);
            push (@{$versions{$path}}, $version);
        }

        foreach my $path (sort {
                              my $afile = basename($a);   #sort by file name
                              my $bfile = basename($b);
                              $afile cmp $bfile
                          } keys %versions )
        {
            my $file = basename($path);
            # print rather than printf: the string is data, not a format
            print "$file \tversions: @{$versions{$path}}\t$path\n";
        }

        =prints
        apps.c    versions: 14 18    //depot/asic/tools/perl/proc/examples/apps.c
        file.txt  versions: 2        //depot/asic/tools/perl/files/examples/file.txt
        modem.c   versions: 4 5 6 7  //depot/asic/tools/perl/scripts/examples/modem.c
        modem.c   versions: 6        //depot/asic/tools/perl/proc/examples/modem.c
        =cut

        __DATA__
        //depot/asic/tools/perl/scripts/examples/modem.c#4
        //depot/asic/tools/perl/scripts/examples/modem.c#5
        //depot/asic/tools/perl/scripts/examples/modem.c#6
        //depot/asic/tools/perl/scripts/examples/modem.c#7
        //depot/asic/tools/perl/files/examples/file.txt#2
        //depot/asic/tools/perl/proc/examples/apps.c#14
        //depot/asic/tools/perl/proc/examples/apps.c#18
        //depot/asic/tools/perl/proc/examples/modem.c#6
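
    The duplicate-name check mentioned earlier ($current_filename eq $last_filename) could look roughly like this. It is a sketch under assumptions: the %versions contents are copied by hand from part of the sample data, @duplicates is a made-up name, and a secondary sort on the full path is added only to make the order deterministic.

```perl
use strict;
use warnings;
use File::Basename;

# Same shape as %versions in the reply: full path => list of revisions.
my %versions = (
    '//depot/asic/tools/perl/scripts/examples/modem.c' => [4, 5, 6, 7],
    '//depot/asic/tools/perl/proc/examples/modem.c'    => [6],
    '//depot/asic/tools/perl/files/examples/file.txt'  => [2],
);

my @duplicates;            # hypothetical: file names seen in more than one directory
my $last_filename = '';

# Sort by file name, then by full path so equal names come out in a fixed order.
for my $path (sort { basename($a) cmp basename($b) or $a cmp $b } keys %versions) {
    my $current_filename = basename($path);
    if ($current_filename eq $last_filename) {
        # Record each repeated name once, even if it occurs three or more times.
        push @duplicates, $current_filename
            unless @duplicates && $duplicates[-1] eq $current_filename;
    }
    $last_filename = $current_filename;
}

print "Duplicate file names: @duplicates\n";
```

    Because the keys are visited in file-name order, entries with the same name are always adjacent, so comparing against the previous name is enough.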