coldfingertips has asked for the wisdom of the Perl Monks concerning the following question:

I have a hash that contains the data below
index.html => Mon Oct 11 00:08:11 2004|12963
screenshot.jpg => Sun Oct 10 13:18:30 2004|234997
legal.html => Mon Oct 11 12:57:03 2004|13448
stylesheet.css => Mon Oct 11 13:57:28 2004|697
about.html => Mon Oct 11 00:08:08 2004|13225
archive.html => Mon Oct 11 00:08:09 2004|12872
postinfo.html => Fri Oct 1 23:49:15 2004|2457
contact.shtml => Mon Oct 11 00:09:48 2004|11366
services.html => Mon Oct 11 00:08:17 2004|14256
metatags.pl => Mon Oct 11 14:05:44 2004|28668
tools.html => Mon Oct 11 15:35:47 2004|14632
robots.txt => Sat Oct 9 03:35:15 2004|73
_vti_inf.html => Fri Oct 1 23:49:15 2004|1754
report.shtml => Mon Oct 11 00:07:03 2004|11686
As you can see, it goes KEY => value1|value2. How can I sort the hash by the first split value? You can see that it's a date, so I really need the newest date on top.

Any pointers?

Replies are listed 'Best First'.
Re: sorting a split hash
by Velaki (Chaplain) on Oct 12, 2004 at 04:03 UTC

    This is a perfect candidate for an ST sort!

    #!/usr/bin/perl
    use strict;
    use warnings;
    use HTTP::Date;    # needed for str2time

    my %hash;
    while (<DATA>) {
        my ($key, $value) = /(\S+) => (.*)$/;
        $hash{$key} = $value;
    }

    my @sorted_keys =
        map  $_->[0] =>
        sort { $a->[1] cmp $b->[1] }
        map  [ $_,
               # This is where we extract the value1
               # -- [0] in the array returned from
               # the split --, and convert it to
               # a time value to be used in the sort.
               str2time((split(/\|/, $hash{$_}))[0])
             ] => keys %hash;

    for my $key (@sorted_keys) {
        print "$key => $hash{$key}\n";
    }

    __DATA__
    index.html => Mon Oct 11 00:08:11 2004|12963
    screenshot.jpg => Sun Oct 10 13:18:30 2004|234997
    legal.html => Mon Oct 11 12:57:03 2004|13448
    stylesheet.css => Mon Oct 11 13:57:28 2004|697
    about.html => Mon Oct 11 00:08:08 2004|13225
    archive.html => Mon Oct 11 00:08:09 2004|12872
    postinfo.html => Fri Oct 1 23:49:15 2004|2457
    contact.shtml => Mon Oct 11 00:09:48 2004|11366
    services.html => Mon Oct 11 00:08:17 2004|14256
    metatags.pl => Mon Oct 11 14:05:44 2004|28668
    tools.html => Mon Oct 11 15:35:47 2004|14632
    robots.txt => Sat Oct 9 03:35:15 2004|73
    _vti_inf.html => Fri Oct 1 23:49:15 2004|1754
    report.shtml => Mon Oct 11 00:07:03 2004|11686

    Hope this helped,
    -v
    "Perl. There is no substitute."
      As long as we have the nit combs out: coldfingers didn't mention it, but you might want to add a specific ordering for keys whose values happen to share a timestamp:
      sort { $b->[1] <=> $a->[1] || $a->[0] cmp $b->[0] }
      coldfingers will have to decide whether they want to swap the $a and $b, now that sulfericacid has mentioned it matters. Oh, and I really think that should be a <=> first as the value returned from str2time (thank you for pointing it out!) seems to be a number. Don't want any problems with dates before September 2001.
      Wouldn't you have to reverse sort the array? I tried this and it actually prints in chronological order (oldest on top, newest on bottom).


      "Age is nothing more than an inaccurate number bestowed upon us at birth as just another means for others to judge and classify us"

      sulfericacid
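The `<=>` versus `cmp` point raised in the replies above can be checked with two epoch values on either side of the moment Unix time grew from 9 to 10 digits (1,000,000,000 seconds, in September 2001). This is only a minimal sketch; the two sample values are made up for illustration and do not come from the thread's data.

```perl
use strict;
use warnings;

# Hypothetical epoch values: one just after and one just before the
# point where Unix time grew from 9 to 10 digits (September 2001).
my @epochs = (1_000_000_000, 999_999_999);

# String comparison works character by character, so "1000000000"
# sorts before "999999999" because "1" lt "9".
my @by_string = sort { $a cmp $b } @epochs;

# Numeric comparison orders the values by magnitude.
my @by_number = sort { $a <=> $b } @epochs;

print "cmp: @by_string\n";
print "<=>: @by_number\n";
```

With all timestamps from 2004 the two operators happen to agree, which is why the `cmp` version still appears to work on this data.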
Re: sorting a split hash
by davido (Cardinal) on Oct 12, 2004 at 04:47 UTC

    Here's how I might do it:

    use strict;
    use warnings;
    use Date::Manip;

    my %files;
    while ( my $line = <DATA> ) {
        chomp $line;
        my( $key, $value ) = split /\s*=>\s*/, $line;
        $files{ $key } = $value;
    }

    my @sorted_keys =
        map  { $_->[0] }
        sort { Date_Cmp( $a->[1], $b->[1] ) }
        map  { [ $_, ParseDate( ( split /\|/, $files{$_} )[0] ) ] }
        keys %files;

    print "$_ => $files{$_}\n" foreach @sorted_keys;

    __DATA__
    index.html => Mon Oct 11 00:08:11 2004|12963
    screenshot.jpg => Sun Oct 10 13:18:30 2004|234997
    legal.html => Mon Oct 11 12:57:03 2004|13448
    stylesheet.css => Mon Oct 11 13:57:28 2004|697
    about.html => Mon Oct 11 00:08:08 2004|13225
    archive.html => Mon Oct 11 00:08:09 2004|12872
    postinfo.html => Fri Oct 1 23:49:15 2004|2457
    contact.shtml => Mon Oct 11 00:09:48 2004|11366
    services.html => Mon Oct 11 00:08:17 2004|14256
    metatags.pl => Mon Oct 11 14:05:44 2004|28668
    tools.html => Mon Oct 11 15:35:47 2004|14632
    robots.txt => Sat Oct 9 03:35:15 2004|73
    _vti_inf.html => Fri Oct 1 23:49:15 2004|1754
    report.shtml => Mon Oct 11 00:07:03 2004|11686

    If the dataset is small, you may not care to bother with the Schwartzian Transform that my solution uses... its performance gain might not be worth the extra effort. For larger datasets, it can prove beneficial though. If you've already split the date / size info into anonymous arrays held in the hash, there'll be no need for the transform at all, except perhaps to parse the date with ParseDate().

    I chose to use Date::Manip to transform your dates into something that can be easily compared (and sorted). I also used that module's Date_Cmp() function to perform the comparison within the sort routine. I could have used the cmp operator instead, but the Date_Cmp() function is also timezone friendly.


    Dave
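The remark above about already-split data can be sketched as follows. The layout (file name => [ epoch seconds, size ]) and the epoch values are assumptions made for illustration; only the file names and sizes are taken from the question.

```perl
use strict;
use warnings;

# Assumed layout, not from the thread: file name => [ epoch seconds, size ].
# The epoch values are illustrative.
my %files = (
    'index.html'     => [ 1097478491, 12963  ],
    'about.html'     => [ 1097478488, 13225  ],
    'screenshot.jpg' => [ 1097439510, 234997 ],
);

# No transform needed: the sort key is already sitting in the hash.
# Putting $b before $a lists the newest file first.
my @newest_first = sort { $files{$b}[0] <=> $files{$a}[0] } keys %files;

printf "%-15s %d %d\n", $_, @{ $files{$_} } for @newest_first;
```

Because each comparison is a plain array lookup, there is nothing expensive to cache, which is the situation in which the transform stops paying for itself.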

Re: sorting a split hash
by NetWallah (Canon) on Oct 12, 2004 at 04:05 UTC
    Here is a snippet to get you started on analyzing the Date:
    use Date::Parse;
    my ($x, $y) = split /\|/, 'Mon Oct 11 00:08:11 2004|12963';
    print qq(x=$x\n);
    my $t = str2time($x);
    print qq(T=$t\n);
    # OUTPUT ##############
    # x=Mon Oct 11 00:08:11 2004
    # T=1097478491  ## in perl/unix time() format
    For the sorting, please look at

    How do I sort a hash by its values?

        Earth first! (We'll rob the other planets later)
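Putting the two hints above together, one way to sort the hash by its date values is to parse each date once into a lookup hash and then sort the keys by that lookup. This is a rough sketch, not NetWallah's code: it swaps in the core Time::Piece module for Date::Parse so that it runs without extra installs, and it uses only three of the entries from the question.

```perl
use strict;
use warnings;
use Time::Piece;    # core module, swapped in here for Date::Parse

my %hash = (
    'index.html'     => 'Mon Oct 11 00:08:11 2004|12963',
    'screenshot.jpg' => 'Sun Oct 10 13:18:30 2004|234997',
    'tools.html'     => 'Mon Oct 11 15:35:47 2004|14632',
);

# Parse each date once and remember the epoch value per file name.
my %epoch;
for my $file (keys %hash) {
    my ($date) = split /\|/, $hash{$file};
    $epoch{$file} =
        Time::Piece->strptime($date, '%a %b %d %H:%M:%S %Y')->epoch;
}

# Sort the keys by the cached value, newest first.
my @newest_first = sort { $epoch{$b} <=> $epoch{$a} } keys %epoch;

print "$_ => $hash{$_}\n" for @newest_first;
```

Note that strptime treats the string as UTC; that does not affect the ordering as long as every timestamp comes from the same zone.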

Re: sorting a split hash
by pg (Canon) on Oct 12, 2004 at 03:40 UTC

    One solution is to also keep a second hash, a hash of array refs. For this hash, the keys are the dates and the values are array refs of file names. You need an array because multiple files may share the same date. Then sort the keys of this second hash.

    If the sole purpose of the first hash is to print this report, then forget about your first hash, and just use the hash I suggested here.

    Also create the second hash, when you are creating the first one (by doing this, most likely you don't need to split '|').

      Whoa there. I've never used hash refs or array refs before so I have zero clue what you were even saying.

      Any hints at where to go from here?

        This gives you the hash I talked about (it's more or less a one-liner):

        use Data::Dumper;
        use strict;
        use warnings;

        my %h1 = (
            "index.html"     => "Mon Oct 11 00:08:11 2004|12963",
            "screenshot.jpg" => "Sun Oct 10 13:18:30 2004|234997",
            "legal.html"     => "Mon Oct 11 12:57:03 2004|13448",
            "stylesheet.css" => "Mon Oct 11 13:57:28 2004|697",
            "about.html"     => "Mon Oct 11 00:08:08 2004|13225",
            "archive.html"   => "Mon Oct 11 00:08:09 2004|12872",
            "postinfo.html"  => "Fri Oct 1 23:49:15 2004|2457",
            "contact.shtml"  => "Mon Oct 11 00:09:48 2004|11366",
            "services.html"  => "Mon Oct 11 00:08:17 2004|14256",
            "metatags.pl"    => "Mon Oct 11 14:05:44 2004|28668",
            "tools.html"     => "Mon Oct 11 15:35:47 2004|14632",
            "robots.txt"     => "Sat Oct 9 03:35:15 2004|73",
            "_vti_inf.html"  => "Fri Oct 1 23:49:15 2004|1754",
            "report.shtml"   => "Mon Oct 11 00:07:03 2004|11686"
        );

        my $h2;
        push @{$h2->{(split(/\|/, $h1{$_}))[0]}}, $_ for (keys(%h1));

        print Dumper($h2);

        Output:

        $VAR1 = {
            'Sat Oct 9 03:35:15 2004'  => [ 'robots.txt' ],
            'Mon Oct 11 14:05:44 2004' => [ 'metatags.pl' ],
            'Mon Oct 11 00:09:48 2004' => [ 'contact.shtml' ],
            'Mon Oct 11 00:08:08 2004' => [ 'about.html' ],
            'Mon Oct 11 12:57:03 2004' => [ 'legal.html' ],
            'Mon Oct 11 15:35:47 2004' => [ 'tools.html' ],
            'Fri Oct 1 23:49:15 2004'  => [ '_vti_inf.html', 'postinfo.html' ],
            'Mon Oct 11 00:08:09 2004' => [ 'archive.html' ],
            'Mon Oct 11 00:08:17 2004' => [ 'services.html' ],
            'Mon Oct 11 00:07:03 2004' => [ 'report.shtml' ],
            'Mon Oct 11 00:08:11 2004' => [ 'index.html' ],
            'Sun Oct 10 13:18:30 2004' => [ 'screenshot.jpg' ],
            'Mon Oct 11 13:57:28 2004' => [ 'stylesheet.css' ]
        };
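The remaining step in pg's suggestion, sorting the keys of the second hash, might look like the sketch below. The date strings do not sort chronologically as plain text, so the keys are compared by their parsed value. The helper name to_epoch is hypothetical, the use of the core Time::Piece module is an assumption (the thread uses other date parsers), and only a small slice of the hash is shown.

```perl
use strict;
use warnings;
use Time::Piece;    # core module; an assumption, not what the thread used

# A small slice of the date => [file names] hash shown in the output above.
my $h2 = {
    'Fri Oct 1 23:49:15 2004'  => [ '_vti_inf.html', 'postinfo.html' ],
    'Mon Oct 11 15:35:47 2004' => [ 'tools.html' ],
    'Sat Oct 9 03:35:15 2004'  => [ 'robots.txt' ],
};

# Hypothetical helper: turn a date string into epoch seconds.
sub to_epoch {
    my ($date) = @_;
    return Time::Piece->strptime($date, '%a %b %d %H:%M:%S %Y')->epoch;
}

# Compare the keys by their parsed value, newest first.
my @dates = sort { to_epoch($b) <=> to_epoch($a) } keys %$h2;

my @report;
for my $date (@dates) {
    # Files sharing a timestamp are listed alphabetically.
    push @report, "$date  $_" for sort @{ $h2->{$date} };
}
print "$_\n" for @report;
```

This re-parses each date on every comparison, which is fine for a handful of keys; for a large hash the parsed values could be cached first, as in the Schwartzian Transform replies.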
Re: sorting a split hash
by TedPride (Priest) on Oct 12, 2004 at 09:40 UTC
    The only hard part is converting the date to a format that can be easily sorted. This can be done with a date conversion module (see above) or directly with a regex or split:
    use strict;

    my %data;
    foreach (<DATA>) {
        # Split each line into file name and "date|size" value
        # (the trailing newline stays on the value and is printed later).
        my ($key, $value) = split(/ => /, $_, 2);
        $data{$key} = $value;
    }

    #######

    my %months = ('Jan' => '01', 'Feb' => '02', 'Mar' => '03', 'Apr' => '04',
                  'May' => '05', 'Jun' => '06', 'Jul' => '07', 'Aug' => '08',
                  'Sep' => '09', 'Oct' => '10', 'Nov' => '11', 'Dec' => '12');

    foreach (keys %data) {
        $data{$_} =~ /\w+ (\w+) (\d+) (\d+):(\d+):(\d+) (\d+)/;
        # Build a numeric YYYYMMDDhhmmss sort key and keep the original value.
        $data{$_} = [$6.$months{$1}.sprintf('%02d', $2).$3.$4.$5, $data{$_}];
    }

    # Newest first: compare $b against $a.
    foreach (sort {$data{$b}->[0] <=> $data{$a}->[0]} keys %data) {
        print $_ . ' => ' . $data{$_}->[1];
    }

    #######

    __DATA__
    index.html => Mon Oct 11 00:08:11 2004|12963
    screenshot.jpg => Sun Oct 10 13:18:30 2004|234997
    legal.html => Mon Oct 11 12:57:03 2004|13448
    stylesheet.css => Mon Oct 11 13:57:28 2004|697
    about.html => Mon Oct 11 00:08:08 2004|13225
    archive.html => Mon Oct 11 00:08:09 2004|12872
    postinfo.html => Fri Oct 1 23:49:15 2004|2457
    contact.shtml => Mon Oct 11 00:09:48 2004|11366
    services.html => Mon Oct 11 00:08:17 2004|14256
    metatags.pl => Mon Oct 11 14:05:44 2004|28668
    tools.html => Mon Oct 11 15:35:47 2004|14632
    robots.txt => Sat Oct 9 03:35:15 2004|73
    _vti_inf.html => Fri Oct 1 23:49:15 2004|1754
    report.shtml => Mon Oct 11 00:07:03 2004|11686