in reply to Re^2: Sorting an array of hashes and filenames
in thread Sorting an array of hashes and filenames

Being as you're fairly new here, I'm not going to rag on you too much, but you need to understand something -- this site welcomes questions about Perl, but it's important to be able to explain

In your case, we have a fragment of Perl that doesn't compile. That isn't helpful to us, so our answers may not be very useful to you. A complete, runnable example that shows what isn't working is far more useful to us.

Anyway, here's a self-contained script that works and (I think) does what you want it to:

#!/usr/bin/perl -w

use strict;

my @data = (
    "3343df3ffdkj34j3k34j3k testfile1",
    "389k34d46hj3k493843kjj testfile2",
    "lj3l4o342u423see3u43u4 testfile3",

    # Copied the first line to show that duplicates work.
    "3343df3ffdkj34j3k34j3k testfile4",
);

{
    my %result;

    # Loop through the array of lines.
    foreach my $line (@data) {

        # Split the array element into hash value and filename.
        my ( $hashValue, $filename ) = split( /\s/, $line );

        # Store the filename, indexed by hash value, into an array. This
        # allows us to store multiple files with the same hash value.
        push( @{ $result{$hashValue} }, $filename );
    }

    # Dump out the result hash, sorting by the hash values.
    foreach my $key ( sort keys %result ) {

        # Dump out the array of filenames indexed by this hash value. We
        # could have sorted this list too if we wanted.
        foreach my $filename ( @{ $result{$key} } ) {
            print "$key -> $filename\n";
        }
    }
}

When this is run, it produces

tab@foobar:~dev$ perl -w 736240.pl
3343df3ffdkj34j3k34j3k -> testfile1
3343df3ffdkj34j3k34j3k -> testfile4
389k34d46hj3k493843kjj -> testfile2
lj3l4o342u423see3u43u4 -> testfile3

I'm not sure that's what you were looking for, but it's my best guess at what you want.
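
The comment in the script notes that the filenames under each hash value could be sorted as well. A minimal sketch of that variant follows, assuming each input line is "digest, whitespace, filename". The names group_by_digest and report_lines are made up for this illustration and are not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Group "digest filename" lines by digest. Returns a hash reference
# mapping each digest to an array reference of filenames.
# (Hypothetical helper name, chosen for this sketch.)
sub group_by_digest {
    my @lines = @_;
    my %files_for;
    foreach my $line (@lines) {

        # Split on the first run of whitespace only, so a filename
        # that contains spaces stays in one piece.
        my ( $digest, $filename ) = split( /\s+/, $line, 2 );
        push( @{ $files_for{$digest} }, $filename );
    }
    return \%files_for;
}

# Build the report lines, sorted by digest and then by filename.
# (Hypothetical helper name, chosen for this sketch.)
sub report_lines {
    my ($files_for) = @_;
    my @report;
    foreach my $digest ( sort keys %$files_for ) {
        foreach my $filename ( sort @{ $files_for->{$digest} } ) {
            push( @report, "$digest -> $filename" );
        }
    }
    return @report;
}

# Sample data deliberately listed out of order.
my @data = (
    "3343df3ffdkj34j3k34j3k testfile4",
    "389k34d46hj3k493843kjj testfile2",
    "3343df3ffdkj34j3k34j3k testfile1",
);
print "$_\n" for report_lines( group_by_digest(@data) );
```

With the sample data above, testfile1 is printed before testfile4 even though it appears later in the input, because the inner list is now sorted too.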

Now, please note the copious (and probably unnecessary) comments and the logical variable names in my script. Your original code had an array called 'hashes'. That's only confusing initially, but it's not a great name; I would have used 'digests' or something like that. But the biggest problem with your code is that there were no comments.

Once more, with feeling:

Alex / talexb / Toronto

"Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds

Re^4: Sorting an array of hashes and filenames
by learningperl01 (Beadle) on Jan 15, 2009 at 16:32 UTC
    Thanks for the tip and code. Here is the code that I have that is still not sorting the data correctly. I would like to sort based on the first value as shown under expected results.
    #!/usr/bin/perl
    use strict;
    use File::Find;
    use File::stat;
    use Digest::MD5;

    find(\&search, "/home/test/");

    sub search {
        my %result;
        # Make sure we only look at files and end in src extn.
        if ( -f && /\.src$/i ) {
            foreach my $files ( $File::Find::name ) {
                open(FH, $files) or die "Can't open '$files': $!";
                binmode(FH);
                # Get/Create MD5 for the file
                my $hashValue = Digest::MD5->new->addfile(*FH)->hexdigest;
                # Store the filename, indexed by hash value, into an array. This
                # allows us to store multiple files with the same hash value.
                push( @{ $result{$hashValue} }, $files );
            }
            # Dump out the result hash, sorting by the hash values.
            foreach my $key ( sort keys %result ) {
                # Dump out the array of filenames indexed by this hash value. We
                # could have sorted this list too if we wanted.
                foreach my $filename ( @{ $result{$key} } ) {
                    print "$key -> $filename\n";
                }
            }
        }
    }

    ------------------------------------------------------
    _____Current results______
    b78c8fafb9a5d4df6b36dcd35c56f6aa -> /home/test/file1.src
    f49f62c0d885ea4687d3149ea95c98fd -> /home/test/test2.src
    6ebd09e0c2ff94316410b1444fbbb37a -> /home/test/file4.src
    b1a087078b62487c1d4c02f4c943af09 -> /home/test/file10.src

    ______Expected results________
    6ebd09e0c2ff94316410b1444fbbb37a -> /home/test/file4.src
    b1a087078b62487c1d4c02f4c943af09 -> /home/test/file10.src
    b78c8fafb9a5d4df6b36dcd35c56f6aa -> /home/test/file1.src
    f49f62c0d885ea4687d3149ea95c98fd -> /home/test/test2.src

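      I think you may have misunderstood how File::Find works -- when you call the find function, it's going to call your (mis-named) search function once for each file that it finds. Because %result is declared inside search, it starts out empty on every call, so each call prints a single entry and the output comes out in the order the files were found rather than in sorted order.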
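
      To see the once-per-file behaviour in isolation, here is a small sketch that compares a hash declared inside the callback with one declared outside it. It assumes nothing about /home/test/: it builds a throwaway directory with File::Temp, and the file names and variable names are invented for the demonstration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Create a throwaway directory holding three empty files
# (hypothetical names, used only for this demonstration).
my $dir = tempdir( CLEANUP => 1 );
foreach my $name (qw(a.src b.src c.src)) {
    open( my $fh, '>', "$dir/$name" ) or die "Can't create '$dir/$name': $!";
    close($fh);
}

my %seen_outside;    # declared once, shared by every call to wanted()
my @sizes_inside;    # records how big the inner hash was on each call

sub wanted {
    return unless -f $_;    # find() also calls us for directories
    my %seen_inside;        # a brand-new, empty hash on every call
    $seen_inside{$_}  = 1;
    $seen_outside{$_} = 1;
    push( @sizes_inside, scalar( keys %seen_inside ) );
}

find( \&wanted, $dir );

print "calls for plain files: ", scalar(@sizes_inside), "\n";
print "inner hash sizes:      @sizes_inside\n";
print "outer hash size:       ", scalar( keys %seen_outside ), "\n";
```

      The inner hash never holds more than one key, because it is recreated on each call, while the outer hash accumulates an entry for every file. Sorting the inner hash therefore sorts a single element each time.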

      I would rename that function to storeHashvalue to better reflect what it's actually doing. Then I'd move the declaration of %result to file scope, and dump out the results of your file scanning after the call to find returns.

      So your code would look something like this:

      #!/usr/bin/perl
      use strict;
      use File::Find;
      use File::stat;
      use Digest::MD5;

      my %result;

      find(\&storeHashvalue, "/home/test/");

      # Dump out the result hash, sorting by the hash values.
      foreach my $key ( sort keys %result ) {
          # Dump out the array of filenames indexed by this hash value. We
          # could have sorted this list too if we wanted.
          foreach my $filename ( @{ $result{$key} } ) {
              print "$key -> $filename\n";
          }
      }

      sub storeHashvalue {
          # Make sure we only look at files and end in src extn.
          if ( -f && /\.src$/i ) {
              foreach my $files ( $File::Find::name ) {
                  open(FH, $files) or die "Can't open '$files': $!";
                  binmode(FH);
                  # Get/Create MD5 for the file
                  my $hashValue = Digest::MD5->new->addfile(*FH)->hexdigest;
                  # Store the filename, indexed by hash value, into an array. This
                  # allows us to store multiple files with the same hash value.
                  push( @{ $result{$hashValue} }, $files );
              }
          }
      }

      I haven't tried this. Last night I was at home. Right now I'm on the job.
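
      For anyone who wants to try the same idea without a /home/test/ directory, here is a self-contained sketch. It is a variation on the code above, not a copy of it: it assumes a temporary directory created with File::Temp, uses lexical filehandles, and collects results through a closure instead of a file-scoped hash. The sub name digests_under and the test file names are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);
use Digest::MD5;

# Walk $dir and return a hash reference mapping each MD5 digest to an
# array reference of the .src files that have that digest.
# (Hypothetical helper name, chosen for this sketch.)
sub digests_under {
    my ($dir) = @_;
    my %files_for;    # lives for the whole walk, not for one callback call
    find(
        sub {
            # Only plain files whose name ends in .src.
            return unless -f $_ && /\.src$/i;
            my $path = $File::Find::name;

            # find() has changed into the file's directory, so $_ opens it.
            open( my $fh, '<', $_ ) or die "Can't open '$path': $!";
            binmode($fh);
            my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
            close($fh);
            push( @{ $files_for{$digest} }, $path );
        },
        $dir
    );
    return \%files_for;
}

# Made-up test data: two files with identical content, one that
# differs, and one with the wrong extension that should be skipped.
my $dir     = tempdir( CLEANUP => 1 );
my %content = (
    'one.src'   => "hello\n",
    'two.src'   => "hello\n",
    'three.src' => "bye\n",
    'skip.txt'  => "hello\n",
);
foreach my $name ( sort keys %content ) {
    open( my $fh, '>', "$dir/$name" ) or die "Can't create '$dir/$name': $!";
    print {$fh} $content{$name};
    close($fh) or die "Can't close '$dir/$name': $!";
}

# Dump the results only after the whole walk has finished.
my $files_for = digests_under($dir);
foreach my $digest ( sort keys %$files_for ) {
    foreach my $path ( sort @{ $files_for->{$digest} } ) {
        print "$digest -> $path\n";
    }
}
```

      The two files with identical content end up under the same digest, and the report is printed in digest order because the sort happens once, after find has visited every file.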

      Alex / talexb / Toronto

      "Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds

        Thank you so much. I don't think I would ever have figured this one out. I knew it had something to do with File::Find but didn't realize what it was. Your code worked as expected. Thanks again for all the help; I really appreciate it.