Looks like you have a pretty good naming convention. One simplification would be to use a split instead of a regex on the path name. Another simplification is to use a single one-dimensional hash.

Below, for each line, I extract the version number with a simple split. The rest of the line becomes the path name used as the hash key. The version numbers are stored in an anonymous array attached to each key. I don't need a HoH because I can get the name of the file easily with the basename($file) function.

For output, I sorted the keys of %versions by the filename. A file that appears in more than one directory gets one output line per directory, so its name shows up on consecutive lines. It should be straightforward to detect that ($current_filename eq $last_filename) and then decide what you want to do about that situation within the loop.
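A minimal sketch of that duplicate check, assuming the same %versions layout (path => array of versions). The helper name duplicate_filenames is my own invention for illustration; it only reports the repeated names and leaves the decision about what to do with them to you.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename;

# Hypothetical helper: given a reference to %versions (path => [versions]),
# return each file name that occurs under more than one directory.
sub duplicate_filenames {
    my ($versions) = @_;
    my @dups;
    my $last_filename;
    for my $current_filename (sort map { basename($_) } keys %$versions) {
        if (defined $last_filename && $current_filename eq $last_filename) {
            # Report each repeated name once, even if it occurs 3+ times.
            push @dups, $current_filename
                unless @dups && $dups[-1] eq $current_filename;
        }
        $last_filename = $current_filename;
    }
    return @dups;
}

my %versions = (
    '//depot/asic/tools/perl/scripts/examples/modem.c' => [4, 5, 6, 7],
    '//depot/asic/tools/perl/proc/examples/modem.c'    => [6],
    '//depot/asic/tools/perl/proc/examples/apps.c'     => [14, 18],
);

print "duplicate: $_\n" for duplicate_filenames(\%versions);
# prints: duplicate: modem.c
```

Because the names are sorted first, equal names are always adjacent, so comparing against the previous name is enough.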

I am also puzzled about the speed comment. A 40K line file is not considered "big" nowadays. The Perl sort is very good and keeps getting better. I would guess that the sort will take less than a second, so at most this is going to take 1-2 seconds. If there is some super duper speed requirement such that even a couple of seconds is too slow, then it would help if you explained that requirement in more detail. And yes, there are all kinds of ways to make this code somewhat faster, but I doubt that you will need to. Test with your data set and report back.
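If you want a number before trying your real data, here is a rough timing sketch. Everything in it is an assumption for illustration: the 40,000 synthetic lines, the made-up path pattern, and the build_versions helper name. It uses Time::HiRes (a core module) to time the hash build plus the sort by basename; the figure will of course depend on your machine.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename;
use Time::HiRes qw(time);

# Hypothetical helper: build path => [versions] from "path#version" lines.
sub build_versions {
    my (@lines) = @_;
    my %versions;
    for my $line (@lines) {
        my ($path, $version) = split /#/, $line;
        push @{ $versions{$path} }, $version;
    }
    return \%versions;
}

# Synthetic stand-in for a 40K line input file (made-up paths).
my @lines = map {
    sprintf '//depot/dir%d/file%d.c#%d', $_ % 100, $_ % 5000, $_
} 1 .. 40_000;

my $start    = time;
my $versions = build_versions(@lines);
my @sorted   = sort { basename($a) cmp basename($b) } keys %$versions;
my $elapsed  = time - $start;

printf "%d lines, %d paths, %.3f seconds\n",
    scalar @lines, scalar @sorted, $elapsed;
```

Swap the synthetic @lines for the lines of your actual file to get a measurement that means something for your case.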

#!/usr/bin/perl -w
use strict;
use File::Basename;
use Data::Dumper;

my %versions;

while (<DATA>)
{
    chomp();
    next if /^\s*$/;    #skip blank lines
    my ($path, $version) = split('#', $_);    #path before '#', version after
    push (@{$versions{$path}}, $version);
}

foreach my $path (sort {
                         my $afile = basename($a);    #sort by file name
                         my $bfile = basename($b);
                         $afile cmp $bfile
                       } keys %versions )
{
    my $file = basename($path);
    #print rather than printf: a '%' in a path would be taken as a format
    print "$file \tversions: @{$versions{$path}}\t$path\n";
}

=prints

apps.c  versions: 14 18 //depot/asic/tools/perl/proc/examples/apps.c
file.txt        versions: 2     //depot/asic/tools/perl/files/examples/file.txt
modem.c         versions: 4 5 6 7       //depot/asic/tools/perl/scripts/examples/modem.c
modem.c         versions: 6     //depot/asic/tools/perl/proc/examples/modem.c

=cut

__DATA__
//depot/asic/tools/perl/scripts/examples/modem.c#4
//depot/asic/tools/perl/scripts/examples/modem.c#5
//depot/asic/tools/perl/scripts/examples/modem.c#6
//depot/asic/tools/perl/scripts/examples/modem.c#7
//depot/asic/tools/perl/files/examples/file.txt#2
//depot/asic/tools/perl/proc/examples/apps.c#14
//depot/asic/tools/perl/proc/examples/apps.c#18
//depot/asic/tools/perl/proc/examples/modem.c#6

In reply to Re: Constructing a hash with filepath,filename and filerev by Marshall
in thread Constructing a hash with filepath,filename and filerev by perl_mystery
