Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

How can you sort an array by a splitted value? For example, I has a hash that looks something like

my %hash( "bill nye" => "39||Somehere in Cali||Cali||12345", "homer simpson" => "36||Springfield||unknown||23456", "barney rubble" => "31||Bedrock||cartoon location||3456"; );
The hash key doesn't matter, but I split many different bits of information on || delimiters. Pretend the hash value is age||city||state||zip. How can I sort the hash by any of those delimiters? I can sort $key, but that doesn't do the job. I need to be able to print the sorted hash by ANY bit of information. (like I would want to sort the hash by City or Zip Code instead of alphabetically). Any help would be very much appreciated.

Replies are listed 'Best First'.
Re: Complex hash sorting
by Zaxo (Archbishop) on Feb 01, 2004 at 03:09 UTC

    You might as well make the hash values refer to an array of the data,

    # $hash{$_} = [ split '||', $hash{$_} ] for keys %hash; $hash{$_} = [ split /\|\|/, $hash{$_} ] for keys %hash;
    You can now sort the keys any way you like by
    my @by_zip = sort {$hash($a}[-1] <=> $hash{$b}[-1] } keys %hash; my @by_age = sort {$hash($a}[0] <=> $hash{$b}[0] } keys %hash; my @by_surname = sort { substr($a, rindex($a,' ')) cmp substr($b, rindex($b,' ')) } keys %hash;
    Notice the difference in comparison operator between numeric and dictionary data.

    Update: japhy++ is correct. Repaired without obscuring the damage.

    After Compline,
    Zaxo

      Your '||' regex needs to be '\|\|'. And, as far as my preference goes, I'd do: $_ = [ split '\|\|' ] for values %hash;
      _____________________________________________________
      Jeff[japhy]Pinyan: Perl, regex, and perl hacker, who'd like a job (NYC-area)
      s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;
Re: Complex hash sorting
by stvn (Monsignor) on Feb 01, 2004 at 03:12 UTC

    To start off, I would really recommend a RDMS of some kind, there are alot fo very nice Open Source alternatives out there such as MySQL. It would make this whole thing alot easier.

    But if you must, then this is how you want to approach this.

    # sort by zip-code my @sorted_by_zip_code = sort { (split /\|\|/ => $hash{$a})[3] <=> (split /\|\|/ => $hash{$b})[3] } keys %hash; # sort by city alphabetically my @sorted_by_city = sort { lc((split /\|\|/ => $hash{$a})[1]) cmp lc((split /\|\|/ => $hash{$ +b})[1]) } keys %hash;
    Just changing the index of the split array to whatever item you want to sort by. Be sure to use the appropriate operator (<=> for numbers and cmp for letters), and if you want true alphabetical sorting you will need to do it case-insensitive so you should lowercase the values you are sorting.

    Your results will be an array of keys sorted in the order you desire. You can then use them to extract your values and do whatever you need from there.

    -stvn
Re: Complex hash sorting
by allolex (Curate) on Feb 01, 2004 at 03:30 UTC
    How can I sort the hash by any of those delimiters? I can sort $key, but that doesn't do the job. I need to be able to print the sorted hash by ANY bit of information.

    Then a hash is probably not the data structure you want. Try an array of arrays instead. You know what is supposed to be stored in each field, right?

    So the idea is that you put everything into a data structure like this (using the split function):

    #!/usr/bin/perl use strict; use warnings; # an array: 0=name, 1=number, 2=city, 3=state, 4=zip my @customers = ( [ 'nye, bill','39','Somehere in Cali','Colombia','12345' ], [ 'simpson, homer','36','Springfield', 'OR', '23456' ], [ 'rubble, barney','31','Bedrock','cartoon location','33456' ] ); # set to the array field you want my $field = '0'; my @sorted = sort { $a->[$field] cmp $b->[$field] } @customers; map { print $_->[$field] . "\n" } @sorted; # OUTPUT nye, bill rubble, barney simpson, homer

    Of course you can tweak this to print out what ever you want. This would be best as a subroutine to which you pass the sort field number, IMO.

    --
    Allolex

      Also, OP, consider using constants so that you would not have to think about array order more than once. And create a hash that would store mapping from type of elements to perl comparison operators...

      #!/usr/local/bin/perl use warnings; use strict; use constant { NAME => [ 0 , 'cmp' ] , NUMBER => [ 1 , '<=>' ] , CITY => [ 2 , 'cmp' ] , STATE => [ 3 , 'cmp' ] , ZIP => [ 4 , '<=>' ] # Assuming zip is a number } ; my @customers = ( [ 'nye, bill','39','Somehere in Cali','Colombia','12345' ] , [ 'simpson, homer','36','Springfield', 'OR', '23456' ] , [ 'rubble, barney','31','Bedrock','cartoon location','33456' ] ); # Get user input for sort order my @criteria = (NAME , NUMBER , ZIP , STATE); my $sort = make_sort( \@criteria ); my @sorted = sort $sort @customers; #my @control = # sort # { $a->[NAME ->[0]] cmp $b->[NAME ->[0]] # || $a->[NUMBER->[0]] <=> $b->[NUMBER->[0]] # || $a->[ZIP ->[0]] <=> $b->[ZIP ->[0]] # || $a->[STATE ->[0]] cmp $b->[STATE ->[0]] # } @customers; use Data::Dumper; $Data::Dumper::Deepcopy = 1; print Dumper( \@sorted #, \@control ) ; sub make_sort { my @how = @{ $_[0] }; my $sort = join ' || ' , map join( $_->[1] , q/ $a->[/ . $_->[0] . ']' , q/ $b->[/ . $_->[0] . ']' ) , @how ; return eval "sub { $sort ; }"; }

      Above can be retrofitted for other similar structures.

      Update, Feb 1, 2004: Above is another take of Fun with complex sorting... by Dave the davido.

Re: Complex hash sorting
by davido (Cardinal) on Feb 01, 2004 at 06:18 UTC
    This isn't really as hard as it seems. I'll present two versions that return a list of hash keys in sorted order, sorted by whichever of the sub-values you wish. The first method is going to be a straight sort. If efficiency is an issue, it would probably be best to turn this one into a Schwartzian Transform. But that's not absolutely necessary.

    Keep in mind that as others have mentioned, turning your hash values into array-refs is probably a better solution, but I'm going to just work with the assumption that you really do want to keep that '||' delimited list as the value to each hash element.

    First, the straight sort:

    my $criteria = 3; my @sorted = sort { (split /\|\|/, $hash{$a} )[$criteria] cmp (split /\|\|/, $hash{$b} )[$criteria] } keys %hash +;

    The method listed above indexes into the split list, using the specified sub-element as the sort criteria.

    This next method will allow any arbitrary number of sort criteria, in the order you choose.

    my @criteria = (0, 2); my @sorted = map { $_->[0] } sort { $a->[1] cmp $b->[1] } map { [ $_, join( "", ( split /\|\|/, $hash{$_} )[@sorted +] ] } keys %hash;

    This second method uses a Schwartzian Transform in conjunction with join to create a single sort criteria comprised of the slice created by taking (possibly multiple) indexes into the split results.

    The shortcoming here is that we're doing all string comparison, which sort of wreks havok on strings that have numeric components. But the advantage here is that you can actually specify multiple sort criteria in one search easily.

    There are an infinate number of other ways to do this. Here are some thoughts:

    You might use eval and build your sort routine dymanically (once) so that it can be of arbitrary complexity, chained together with the logical short-circuit 'or' operator. This will allow the user to specify that he/she needs to sort by the 0th, 2nd, and 1st criteria today, but the 1st and 4th later. Also, by building it up dynamically, you can at runtime decide whether to do a 'cmp' or an '<=>' comparison. I give a simple example of this technique in Fun with complex sorting on arbitrary criteria list.. My example implementation has a few shortcomings too, but most could be overcome without too much extra work.


    Dave

      The problem with some of the other solutions is, this isn't a static hash I'm trying to sort. This is a tied hash to an SDBM database, so just using an array isn't an option as the program has been up and running with data stored for quite some time already.

      Your first method seems to be easier to understand/follow, though I've never seen things done like this before...so I'll try going with that. Can you explain what/why $criteria is and why it's 3? And after it's in an array, how would you split everything back into separate variables for printing?

      This is an example of my actual coding

      tie %db, "DB_File", "$db", O_CREAT | O_RDWR, 0644, $DB_BTREE or die "Cannot open file 'db': $!\n"; print "<table>"; foreach (keys %db) { my ($count, $ip, $time) = split(/\|\|/, $db{$_}); print "<td>$count</td><td>$ip</td><td>$time</td>"; } </table>
      There are four values (including the key itself), I want to make three of these sortable: the key, $count and $time). My method of doing this is calling a url_param defining the choice of sort. Ie. script.cgi?sort_by=count or script.cgi?sort_by=key and depending on what they chose, a different hash sort method will be used.

      Thanks for your time.

        Just an observation, but if tied hashes are slow, and sorting hashes based on complex operations are slow, and regexes are kind of slow... this code must really break down with a large number of records.

        This seems to be a classic case where taking the time to load the records into a (SQL) database and then query the database would produce cleaner code with (potentially) a speed savings of (who knows) 10x? Use temp tables or temp views if you must.

Re: Complex hash sorting
by Limbic~Region (Chancellor) on Feb 01, 2004 at 03:45 UTC
    Anonymous Monk,
    Being able to sort the hash as you have it could be done with the Schwartzian Transform. The trouble is that you may want to change your mind at any point - or perhaps you want to add new keys and values without concerning yourself with the sort order. Take a look at the following: Cheers - L~R
Re: Complex hash sorting
by sulfericacid (Deacon) on Feb 01, 2004 at 03:07 UTC
    This is just an idea, so don't take my word for it..you may want to wait to see what other monks have to say. I've stumbed across this problem a few times (never had the initiative to find a solution though).

    You COULD try sorting by the KEY of the hash (for the first value of age), then rebuild your hash so the value you are wanting to split on is first. An example, if you wanted to search for their zip code, just SPLIT your hash and JOIN it again with the zip code being the first thing before your delimiter: value = 12345||39||Somehere in Cali||Cali.

    This is untested and just an idea, but hope it helps.



    "Age is nothing more than an inaccurate number bestowed upon us at birth as just another means for others to judge and classify us"

    sulfericacid
      sulfericacid,
      I tried something similar once. I reordered the data and then just sorted the string.

      However, the problem is that you have to sort ASCII which means that 999 is greater than 1000. The next thing I did was to sprintf my data so that I was comparing 0999 to 1000. That worked. That didn't really sit well though.

      It seems to be a lot of mess. Then I discovered LoLs and HoLs and other complex data types. They make it much easier so I'd go with any of the methods below.

      Personally I'd store the data as a LoH, but that's just me.

      my @data = ( { name => 'Bill Nye', age => 39, town => 'Somewhere in Cali', state => 'Cali', zip => '12345' }, { name => 'Homer Simpson', age => 36, town => 'Springfield', state => undef, zip => '23456' }, { name => 'Barney Rubble', age => 31, town => 'Bedrock', state => 'Cartoon Location', zip => '3456' }, ); print Dumper( sort { $a->{name} cmp $b->{name} } @data );
      Other monks will probably tell you that the above is not a good thing because you're storing your hash keys over and over again. However I think it makes it much easier to track your data. It would, of course, be possible to create your own data structure parsing module that had objects and methods for creating, editing and deleting data but there's a ton already out there.

Re: Complex hash sorting
by Roger (Parson) on Feb 01, 2004 at 13:18 UTC
    You could also use the Sort::Fields module...
    use strict; use warnings; use Sort::Fields; use Data::Dumper; my %hash = ( "bill nye" => "39||Somehere in Cali||Cali||12345", "homer simpson" => "36||Springfield||unknown||23456", "barney rubble" => "31||Bedrock||cartoon location||3456", ); my @sorted = map { /^([^\|]+)/ } fieldsort '||', [1, -1], map { $_ . '||' . $hash{$_} } keys %hash; print Dumper(\@sorted);

    And the output -
    $VAR1 = [ 'barney rubble', 'bill nye', 'homer simpson' ];

    Update: just realized that solution will not work... doing soming debugging at the moment.