learningperl01 has asked for the wisdom of the Perl Monks concerning the following question:

Hello all,

Hoping someone can point me in the right direction. I am kind of new to sorting. Here is what I am trying to do, with no luck so far.

I have hashes (MD5 digests) and filenames, which are both stored in an array (@hashes). What I am trying to do is sort by just the hashes, then print both the hash and the filename in sorted order. I don't really care whether it's ascending or descending.

I have also tried sorting with the commented-out sort; that did not work either.

This is what I have, but it's not sorting:

open(FILE, $_) or die "Can't open '$_': $!";
binmode(FILE);
my @hashes=(Digest::MD5->new->addfile(*FILE)->hexdigest,$filename);
my @sorted = sort {@{$a}[0] cmp @{$b}[0]} @hashes;
#my @sorted = sort { lc($a) cmp lc($b) } @hashes;
print "$sorted[0] $sorted[1]\n";
}
close(FILE);
}
}

Example @array contents:
3343df3ffdkj34j3k34j3k testfile1
389k34d46hj3k493843kjj testfile2
lj3l4o342u423see3u43u4 testfile3

Re: Sorting an array of hashes and filenames
by DStaal (Chaplain) on Jan 14, 2009 at 15:15 UTC

    The general reason you are having problems is that you are trying to sort both the hashes and the filenames, as if they were equivalent items. They aren't: They are related.

    This calls for a Perl hash, which keeps related things together.

    I've been trying to adapt one into your code for five minutes or so, and have come to the conclusion that either I don't understand line 3, or there is something completely screwy with your code snippet.
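    To see concretely what goes wrong in the question's code, here is a minimal sketch of a single loop iteration, using the first sample pair from the question. Each pass builds a fresh two-element @hashes (one digest, one filename), so sort only ever compares a digest with its own filename, never with another file's digest. On top of that, @{$a}[0] treats each plain string as an array reference: under use strict that is a fatal error, which the sketch traps with eval. Without strict, Perl in practice looks up a package array named after each string, finds it empty, and every comparison sees undefined values, so nothing useful gets sorted.

```perl
use strict;
use warnings;

# One pass through the question's loop: a fresh two-element list holding a
# digest and its filename (first sample pair from the question).
my @hashes = ('3343df3ffdkj34j3k34j3k', 'testfile1');

# The question's comparator dereferences each plain string as an array ref.
# Under "use strict" that dies, so the sort is wrapped in eval to catch it.
my @sorted = eval { sort { @{$a}[0] cmp @{$b}[0] } @hashes };
my $error  = $@;

print $error ? "sort failed: $error" : "sorted: @sorted\n";
```

    The fix is to collect all digest/filename pairs first and sort once, after the loop.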

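    Therefore, here is roughly what you want (the file list is a placeholder; fill in your own):

    use strict;
    use warnings;
    use Digest::MD5;

    my @list_of_files = @ARGV;    # your list of files (here: command-line arguments)
    my %file_hashes;
    foreach my $file ( @list_of_files ) {
        open my $fh, '<', $file or die "Can't open '$file': $!";
        binmode $fh;
        my $hash = Digest::MD5->new->addfile($fh)->hexdigest;
        close $fh;
        $file_hashes{$hash} = $file;    # digest => filename
    }
    foreach my $hash ( sort keys %file_hashes ) {
        print "$hash $file_hashes{$hash}\n";
    }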

    You'll probably want to read up on hashes in Perl. They are useful.
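    One caveat with keying the hash by digest: files with identical contents have identical digests, so a later file silently overwrites an earlier one in %file_hashes. If duplicates matter (and finding duplicate files is a common reason to compare MD5s), one option is a hash of arrays. The sketch below uses hypothetical in-memory "files" and Digest::MD5's md5_hex function so it runs without touching the disk; with real files you would compute the digest with addfile, as in the question's code.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical in-memory "files" (name => contents); file_a and file_c match.
my %contents_of = (
    file_a => "hello\n",
    file_b => "world\n",
    file_c => "hello\n",
);

# Hash of arrays: digest => [ every filename with that digest ].
my %files_by_digest;
for my $file (sort keys %contents_of) {
    my $digest = md5_hex($contents_of{$file});
    push @{ $files_by_digest{$digest} }, $file;
}

# Print in digest order, listing all filenames that share each digest.
for my $digest (sort keys %files_by_digest) {
    print "$digest @{ $files_by_digest{$digest} }\n";
}
```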

    (Edit: Removed brainfart at the end.)

      (By the way: You could make the key the filename, depending on what else you were doing in the code. Just use 'values' instead of 'keys' above.)

      Not really. You can get the value from the key, but not the other way around. To sort by value you need something like this:
      foreach my $hash ( sort {$file_hashes{$a} cmp $file_hashes{$b}} keys %file_hashes ) {
          print "$hash $file_hashes{$hash}\n";
      }
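      For the filename-keyed variant mentioned above, ordering by digest likewise means sorting the keys by their values. A minimal sketch, with the digest/filename pairs from the results posted in this thread hard-coded instead of read from files:

```perl
use strict;
use warnings;

# Filename-keyed hash: filename => digest (pairs from the results in this thread).
my %digest_of = (
    'file1.txt' => '32e3d09e0c2ff94316410b1444fbbb37a',
    'file2.txt' => '123d087078b62487c1d4c02f4c943af09',
    'file3.txt' => '3ddbadc770e1c25a91aa186d3b0595945',
    'file4.txt' => 'a3ff6417e3b703604c400965330ea6612',
    'file5.txt' => 'b78c8fafdb9a5d4df6b36dcd35c56f6aa',
);

# Sort the keys (filenames) by their values (digests).
my @by_digest = sort { $digest_of{$a} cmp $digest_of{$b} } keys %digest_of;

print "$digest_of{$_} $_\n" for @by_digest;
```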

        Good point. I must have twisted my brain by the time I got around to writing that.

      Thanks for all the replies. Here is the code I have now, but it still prints out the hashes unsorted. Not sure why?

      foreach my $file ( $fns ) {
          open(FH, $file) or die "Can't open '$file': $!";
          binmode(FH);
          my $hash = Digest::MD5->new->addfile(*FH)->hexdigest;
          $file_hashes{$hash} = $file;
      }
      foreach my $hash ( sort {$file_hashes{$a} cmp $file_hashes{$b}} keys %file_hashes ) {
          print "$hash $file_hashes{$hash}\n";
      }
      }

      ------------Results---------------
      32e3d09e0c2ff94316410b1444fbbb37a file1.txt
      123d087078b62487c1d4c02f4c943af09 file2.txt
      3ddbadc770e1c25a91aa186d3b0595945 file3.txt
      a3ff6417e3b703604c400965330ea6612 file4.txt
      b78c8fafdb9a5d4df6b36dcd35c56f6aa file5.txt

        That sorted it perfectly, exactly as you told it to.

        By the hash values, not the keys. The values of the hash (as you've constructed it) are the filenames, and if you'll notice, those are precisely in order.

        Take out the {$file_hashes{$a} cmp $file_hashes{$b}}. It isn't what you want. (If you really want to put something there, put in {$a cmp $b}, but that's the default.)
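        A minimal sketch of the difference, using the digest/filename pairs from the results above hard-coded into a digest-keyed hash (as the posted loop builds it). Sorting keys by value reproduces the filename order that was printed; a plain sort on the keys gives digest order.

```perl
use strict;
use warnings;

# Digest-keyed hash, as built in the posted loop: digest => filename.
my %file_hashes = (
    '32e3d09e0c2ff94316410b1444fbbb37a' => 'file1.txt',
    '123d087078b62487c1d4c02f4c943af09' => 'file2.txt',
    '3ddbadc770e1c25a91aa186d3b0595945' => 'file3.txt',
    'a3ff6417e3b703604c400965330ea6612' => 'file4.txt',
    'b78c8fafdb9a5d4df6b36dcd35c56f6aa' => 'file5.txt',
);

# What the posted code does: sort the keys by their values (the filenames).
my @by_value = sort { $file_hashes{$a} cmp $file_hashes{$b} } keys %file_hashes;

# What was wanted: a plain sort on the keys (the digests), same as { $a cmp $b }.
my @by_key = sort keys %file_hashes;

print "by filename:\n";
print "  $_ $file_hashes{$_}\n" for @by_value;
print "by digest:\n";
print "  $_ $file_hashes{$_}\n" for @by_key;
```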

Re: Sorting an array of hashes and filenames
by talexb (Chancellor) on Jan 14, 2009 at 15:13 UTC

    It looks like what you're trying to do is compare files based on their MD5s -- I do that once in a while, and I just use a Linux command line for that:

    find $dir -type f | grep -v svn | xargs md5sum | sort

    The part about ignoring 'svn' files is so that the Subversion files don't get included. And if you've got ack installed, even better:

    ack -f $dir | xargs md5sum | sort
    since it ignores Subversion files automatically.

    There's no point in building a Perl script to do something the shell is much better at, unless this is part of a larger program. But that's just my guess, based on the incomplete information available to me. :)
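    If it is part of a larger program, a rough Perl counterpart of the pipeline might look like the sketch below, using the core File::Find and Digest::MD5 modules. It is an approximation, not an exact equivalent: it prunes only directories named .svn (rather than grepping out every path containing "svn"), and -f follows symlinks, unlike find -type f. The sub name digest_tree is hypothetical.

```perl
use strict;
use warnings;
use File::Find;
use Digest::MD5;

# Collect "digest  path" lines for every regular file under $dir and return
# them sorted, roughly like: find $dir -type f | xargs md5sum | sort
sub digest_tree {
    my ($dir) = @_;
    my @lines;
    find(
        sub {
            # File::Find chdirs into each directory, so $_ is the bare name.
            if (-d $_ && $_ eq '.svn') {
                $File::Find::prune = 1;    # do not descend into .svn
                return;
            }
            return unless -f $_;           # regular files only
            open my $fh, '<', $_ or die "Can't open '$File::Find::name': $!";
            binmode $fh;
            push @lines, Digest::MD5->new->addfile($fh)->hexdigest . "  $File::Find::name";
            close $fh;
        },
        $dir,
    );
    return sort @lines;
}

# Usage: pass the directory to scan as the first command-line argument.
if (@ARGV) {
    print "$_\n" for digest_tree($ARGV[0]);
}
```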

    Alex / talexb / Toronto

    "Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds

      Actually, I was trying to sort on the first column only (the hash values), not the filenames. How do I sort on just the first column, but still print both the hash and the filename, ordered by the hashes? Thanks again for all the help.

        Since you're fairly new here, I'm not going to rag on you too much, but you need to understand something -- this site welcomes questions about Perl, but it's important to be able to explain:

        • What you're trying to do
        • What result you're expecting
        • What result you're actually getting
        • What code you're using to get that result -- a complete script would be ideal.
        In your case, we have a fragment of Perl that doesn't compile. This isn't helpful to us, so our answers may not be very useful to you. A complete working example of what isn't working is way more useful to us.

        Now, please note the copious (and probably unnecessary) comments and the logical variable names in my script. Your original code had an array called 'hashes'. That's only confusing at first, but it's not a great name; I would have used 'digests' or something like that. But the biggest problem with your code is that there were no comments.

        Once more, with feeling:

        • What you're trying to do (sort an array of hashes, except it's not an AoH but an array of MD5 digests, except that they also have filenames)
        • What result you're expecting (a list, sorted by hash value)
        • What result you're actually getting (we aren't told, except that it's not sorted)
        • What code you're using to get that result. (We got a code fragment that doesn't compile even when we clean up the braces, and the note that @array contains some data looks useful until you observe that this variable never appears in the code fragment.)

        Alex / talexb / Toronto

Re: Sorting an array of hashes and filenames
by Gangabass (Vicar) on Jan 14, 2009 at 15:14 UTC

    I think you need to change your code to something like this:

    use strict;
    use warnings;
    use Digest::MD5;

    my @filenames = @ARGV;    # the files to digest
    my @hashes;
    foreach my $filename (@filenames) {
        open my $fh, "<", $filename or die "Can't open $filename: $!";
        binmode $fh;
        push @hashes, [ Digest::MD5->new->addfile($fh)->hexdigest, $filename ];
        close $fh;
    }
    my @sorted = sort { $a->[0] cmp $b->[0] } @hashes;
    print "$_->[0] $_->[1]\n" for @sorted;    # each element is a [digest, filename] pair

    Update: Fixed error with array vs. arrayref (thanks to oshalla).
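    Since the question says ascending or descending is fine, here is a minimal sketch of both directions on an array of [digest, filename] pairs like the ones pushed in the loop above. The values are the question's sample pairs, deliberately listed out of order so the sort has something to do; swapping $a and $b in the comparator reverses the order.

```perl
use strict;
use warnings;

# Array of [digest, filename] pairs (the question's sample data, shuffled).
my @hashes = (
    [ '389k34d46hj3k493843kjj', 'testfile2' ],
    [ 'lj3l4o342u423see3u43u4', 'testfile3' ],
    [ '3343df3ffdkj34j3k34j3k', 'testfile1' ],
);

# Compare only element 0 (the digest); the filename just travels along.
my @ascending  = sort { $a->[0] cmp $b->[0] } @hashes;
my @descending = sort { $b->[0] cmp $a->[0] } @hashes;

print "$_->[0] $_->[1]\n" for @ascending;
```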