Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Good evening,

I need some advice from the monks.

I have several hundred lines, each with 5 unique items. What is the best way to store them (i.e. an array or a hash) and to sort on several of the columns in each line? Should I use a hash that references an array with 5 elements, or is there a more efficient way of doing it?

Thanks

Replies are listed 'Best First'.
Re: Data Structure advice
by shemp (Deacon) on Nov 23, 2002 at 00:01 UTC
    I think this is a completely general way to do this. It will need some cleanup before it is used in a real app, though, because of the globals. (I was struggling with getting bizarre_sort() to take parameters properly. :( )
    #!/usr/bin/perl -w
    use strict;
    use Carp;    # provides confess()

    my @order_array;
    my %types = (
        0 => 'alpha',
        1 => 'alpha',
        2 => 'numeric',
    );

    {
        my @data = (
            ["sean",   "peters", 29],
            ["matt",   "horner", 23],
            ["dorthy", "radley", 49],
            ["tom",    "curtis", 52],
            ["sean",   "jones",  38],
        );
        print "By first name:\n";
        sorter([0], @data);
        print "\nBy last name:\n";
        sorter([1], @data);
        print "\nby age:\n";
        sorter([2], @data);
        print "\nby first then last:\n";
        sorter([0,1], @data);
        print "\nby first then age:\n";
        sorter([0,2], @data);
    }

    sub sorter {
        my ($order, @data) = @_;
        @order_array = @$order;
        foreach my $person (sort bizarre_sort @data) {
            for my $i (0 .. 2) {
                print $person->[$i] . " ";
            }
            print "\n";
        }
    }

    # sort comparator: checks each key in @order_array until one differs
    sub bizarre_sort {
        foreach my $field_num (@order_array) {
            my $result = low_checker($field_num, $types{$field_num}, $a, $b);
            return $result if $result;
        }
        # in case there was equality on all keys
        return 0;
    }

    # compares one field of two rows according to its type
    sub low_checker {
        my ($field_num, $type, $left, $right) = @_;
        if ( $type eq "alpha" ) {
            return $left->[$field_num] cmp $right->[$field_num];
        }
        elsif ( $type eq "numeric" ) {
            return $left->[$field_num] <=> $right->[$field_num];
        }
        confess("unknown field type");
    }
    Long-winded; someone can probably make this cleaner.
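    One way around the globals mentioned above is to build the comparator as a closure, so the sort order travels with the sub instead of living in @order_array. This is only a sketch: make_comparator() and its [column, type] argument format are made up for illustration and are not part of the code above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): takes a list of
# [column, type] pairs and returns a comparator sub.
# 'alpha' columns are compared with cmp, 'numeric' columns with <=>.
sub make_comparator {
    my @keys = @_;
    for my $key (@keys) {
        die "unknown type '$key->[1]'\n"
            unless $key->[1] eq 'alpha' || $key->[1] eq 'numeric';
    }
    return sub {
        my ($left, $right) = @_;
        for my $key (@keys) {
            my ($col, $type) = @$key;
            my $result = $type eq 'numeric'
                ? $left->[$col] <=> $right->[$col]
                : $left->[$col] cmp $right->[$col];
            return $result if $result;    # first key that differs decides
        }
        return 0;                         # equal on every key
    };
}

my @data = (
    ["sean",   "peters", 29],
    ["matt",   "horner", 23],
    ["dorthy", "radley", 49],
    ["tom",    "curtis", 52],
    ["sean",   "jones",  38],
);

# by first name, then age
my $by_first_then_age = make_comparator([0, 'alpha'], [2, 'numeric']);
my @sorted = sort { $by_first_then_age->($a, $b) } @data;
print join(" ", @$_), "\n" for @sorted;
```

    Because the comparator is an ordinary code ref, several of them (by last name, by age, ...) can exist at the same time without stepping on each other.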
Re: Data Structure advice
by waswas-fng (Curate) on Nov 22, 2002 at 23:04 UTC
    Do you mean:
    col1 col2 col3 col4 col5
    c    k    b    e    d
    h    i    j    a    l
    where the sorted outcome is:  b,c,d,e,k  a,h,i,j,l

    -Waswas
      Columns 1 and 3 are of concern. I first want to consolidate column 3, and then sort again by column 1.
      BEFORE SORT:
      col1 col2 col3 col4 col5
      12   z    d    r    4
      13   z    c    r    4
      14   z    d    r    4
      15   z    c    r    4
      AFTER SORT:
      col1 col2 col3 col4 col5
      12   z    d    r    4
      14   z    d    r    4
      13   z    c    r    4
      15   z    c    r    4
Re: Data Structure advice
by shemp (Deacon) on Nov 22, 2002 at 23:09 UTC
    One question regarding your post: what were you thinking of using for the hash key?
    In any case, here is a proposal for a solution to your problem. I'll leave initializing the structure up to you. I propose just storing the data in an array of arrayrefs. Then, sorting could go something like this:
    ...
    # assume that @data is your array of arrayrefs.
    # get the data sorted on field 2 (zero based)
    @data = numeric_sorter(2, @data);

    sub numeric_sorter {
        my ($field_num, @data) = @_;
        my @sorted_items = sort { $a->[$field_num] <=> $b->[$field_num] } @data;
        return @sorted_items;
    }
    If you wanted an alphabetic sorter, replace "<=>" with "cmp".
    I chose to use an array of arrayrefs for memory efficiency, but it could be an array of hashrefs if you wanted to reference the columns by name instead of field number:
    @data = alpha_sorter('col3', @data);    # 'col3' is a placeholder key name

    sub alpha_sorter {
        my ($field_name, @data) = @_;
        my @sorted_items = sort { $a->{$field_name} cmp $b->{$field_name} } @data;
        return @sorted_items;
    }
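    For the initialization step left open above, something along these lines might do. It is a rough sketch that assumes the input is whitespace-separated text, one record per line; parse_lines() is a hypothetical helper, and the sample rows are the ones posted earlier in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: turn whitespace-separated lines into an
# array of arrayrefs, one arrayref per non-blank line.
sub parse_lines {
    my @lines = @_;
    my @rows;
    for my $line (@lines) {
        next unless $line =~ /\S/;          # skip blank lines
        push @rows, [ split ' ', $line ];   # ' ' also drops leading whitespace
    }
    return @rows;
}

my @data = parse_lines(
    "12 z d r 4",
    "13 z c r 4",
    "14 z d r 4",
    "15 z c r 4",
);

# sort alphabetically on field 2, then numerically on field 0 (both zero based)
my @sorted = sort { $a->[2] cmp $b->[2] || $a->[0] <=> $b->[0] } @data;
print join(" ", @$_), "\n" for @sorted;
```

    When reading from a file, the same helper can be fed the lines directly, e.g. parse_lines(<$fh>) for an already opened filehandle $fh.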
      I think we both misread his question -- here is what I missed: items are only unique per line, and he wants to sort on one column and then, if a "tie" happens, break it with a second sort on another column.

      I am trying to think of a good way to do this.
      Edited: this should work as long as both columns are the same type:
      @data = numeric_sorter(2, 4, @data);

      sub numeric_sorter {
          my ($field_num, $field_num_secondary, @data) = @_;
          my @sorted_items = sort {
              if ($a->[$field_num] != $b->[$field_num]) {
                  $a->[$field_num] <=> $b->[$field_num]
              }
              else {
                  $a->[$field_num_secondary] <=> $b->[$field_num_secondary]
              }
          } @data;
          return @sorted_items;
      }


      -Waswas
        Using my array of arrayrefs implementation above, if you want to sort on column 3 and break ties with column 1, do this:
        @data = sort { $a->[3] <=> $b->[3] || $a->[1] <=> $b->[1] } @data;
        That can be extended indefinitely with more "||" clauses, but completely generalizing it is much harder.
        Actually, the first column is a timestamp, and the third column is a hex number.
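        With a hex column, neither "<=>" nor "cmp" on the raw strings sorts numerically ('1f' sorts before 'a' as a string), so the values would need to go through hex() first. A rough sketch, under some assumptions the thread does not confirm: the timestamp is a plain number such as epoch seconds, the columns are indexed from zero (so column 3 is index 2 and column 1 is index 0), and the rows below are invented sample data. by_hex_then_time() is a made-up name.

```perl
use strict;
use warnings;

# Hypothetical comparator: hex value of the third column first,
# numeric timestamp in the first column as the tie breaker.
sub by_hex_then_time {
    my ($left, $right) = @_;
    return hex($left->[2]) <=> hex($right->[2])
        || $left->[0] <=> $right->[0];
}

# Invented sample rows: [timestamp, col2, hex, col4, col5]
my @data = (
    [1038006000, 'z', '1f', 'r', 4],
    [1038005000, 'z', 'a',  'r', 4],
    [1038004000, 'z', '1f', 'r', 4],
    [1038003000, 'z', '0A', 'r', 4],
);

my @sorted = sort { by_hex_then_time($a, $b) } @data;
print join(" ", @$_), "\n" for @sorted;
```

        If the timestamps are formatted strings rather than numbers, the second comparison would have to change (for example to "cmp" for a fixed-width YYYY-MM-DD style format).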
Re: Data Structure advice
by chromatic (Archbishop) on Nov 22, 2002 at 23:09 UTC

    It depends. It depends on the nature of your data and on what you need to do with the data. It's hard to answer this question.

Re: Data Structure advice
by gjb (Vicar) on Nov 23, 2002 at 21:16 UTC

    If you're familiar with SQL, it might help to transform the data to CSV format and use DBD::CSV to sort (or group) using a SELECT statement.

    Hope this helps, -gjb-