ihperlbeg has asked for the wisdom of the Perl Monks concerning the following question:

I have the following problem:

DATA

    xhahhxha 60 3 hahaghagah 10 1 101
    xhahhxha 60 3 jrthtahtat 8 1 110
    xhahhxha 60 3 shdgehsh 8 1 150
    hsghtahs 100 19 hahaghagah 10 20 200
    hsghtahs 100 19 jrthtahtat 10 20 300
    hsghtahs 100 19 shdgehsh 10 20 400

I want to sort this data so that it outputs each unique string from the first column (ordered by the second column's value, lowest first) together with its best possible match. The best match is the row with the lowest value in the fifth column and, if the fifth-column values are equal, the highest value in the seventh column. So the output for the above example should look like this:

    xhahhxha 60 3 shdgehsh 8 1 150
    hsghtahs 100 19 shdgehsh 10 20 400

Any help? Suggestions? Thanks!

Replies are listed 'Best First'.
Re: sorting a table columns using hashes
by Marshall (Canon) on May 05, 2011 at 15:56 UTC
    I am also a bit fuzzy on the requirements as the data doesn't appear to exercise all of the limits. But it appears that the data can be sorted in some order and then a filter (grep) operation run to select which lines are of relevance. See attached code.

    If I don't have it exactly right, I think you will be able to modify this pattern to do what you want. I'm not quite sure about the relationship between col 1 and col 2. The example data had only "matching sets" of those two columns. Matching up the lowest col 5 with the highest col 7 is accomplished by reversing the sort order of col 7, as shown below.

    #!/usr/bin/perl -w
    use strict;
    use Data::Dump qw(pp);

    my @AoA;
    while (<DATA>) {
        my @cols = split;
        push @AoA, [@cols];
    }

    @AoA = sort {
        $a->[1] <=> $b->[1]
            or $a->[0] cmp $b->[0]
            or $a->[4] <=> $b->[4]
            or $b->[6] <=> $a->[6]
    } @AoA;

    pp \@AoA;

    =prints
    [
      ["xhahhxha", 60, 3, "shdgehsh", 8, 1, 150],    #this one
      ["xhahhxha", 60, 3, "jrthtahtat", 8, 1, 110],
      ["xhahhxha", 60, 3, "hahaghagah", 10, 1, 101],
      ["hsghtahs", 100, 19, "shdgehsh", 10, 20, 400],    #this one
      ["hsghtahs", 100, 19, "jrthtahtat", 10, 20, 300],
      ["hsghtahs", 100, 19, "hahaghagah", 10, 20, 200],
    ]
    =cut

    my %seen;
    @AoA = grep { !$seen{"$_->[1]"."$_->[0]"}++ } @AoA;    #first of new col1,2 combo
    pp \@AoA;

    =prints
    [
      ["xhahhxha", 60, 3, "shdgehsh", 8, 1, 150],
      ["hsghtahs", 100, 19, "shdgehsh", 10, 20, 400],
    ]
    =cut

    __DATA__
    xhahhxha 60 3 hahaghagah 10 1 101
    xhahhxha 60 3 jrthtahtat 8 1 110
    xhahhxha 60 3 shdgehsh 8 1 150
    hsghtahs 100 19 hahaghagah 10 20 200
    hsghtahs 100 19 jrthtahtat 10 20 300
    hsghtahs 100 19 shdgehsh 10 20 400
      I like the way you are weaving the test output as POD into your code.

      I wonder if there is already a CPAN module allowing to automate this, since $. __LINE__ and caller give the current line of source code.

      Might be handy for PM posts

      Cheers Rolf

      PS: talking about fuzzy requirements, why does the second column have precedence over the first in your sort?

      UPDATE: corrected $. (which is only INPUT_LINE_NUMBER)

      UPDATE: see weaving output into code for a proof of concept

        Glad you like my little POD trick. I don't know of any automated modules for this - interesting idea!

        I'm still unsure about the col 1 and col 2 stuff. I guess I keyed in on this phrase "(sorted based on second column..." and thought that was the primary key. My brain had a bit of trouble interpreting the spec. The test data doesn't have enough cases to unambiguously demonstrate the desired behavior. I hope the code is clear enough that the OP can make these precedence tweaks or other desired changes.

        cheers, Marshall
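The precedence tweak discussed above could be made explicit with a small table-driven comparator. This is only a sketch: `make_cmp` and its `[index, type, direction]` key format are names invented here for illustration, not part of any module, and putting col 1 ahead of col 2 is just one possible reading of the spec.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build a comparator from a list of keys,
# most significant first. Each key is [index, 'num'|'str', 'asc'|'desc'].
sub make_cmp {
    my @keys = @_;
    return sub {
        my ($x, $y) = @_;
        for my $k (@keys) {
            my ($i, $type, $dir) = @$k;
            my $c = ($type eq 'num')
                  ? ($x->[$i] <=> $y->[$i])
                  : ($x->[$i] cmp $y->[$i]);
            $c = -$c if $dir eq 'desc';
            return $c if $c;       # first key that differs decides
        }
        return 0;                  # all keys equal
    };
}

my @rows = map { [split ' ', $_] } (
    'xhahhxha 60 3 hahaghagah 10 1 101',
    'xhahhxha 60 3 jrthtahtat 8 1 110',
    'xhahhxha 60 3 shdgehsh 8 1 150',
    'hsghtahs 100 19 hahaghagah 10 20 200',
    'hsghtahs 100 19 jrthtahtat 10 20 300',
    'hsghtahs 100 19 shdgehsh 10 20 400',
);

# Assumption: col 1 (string) now outranks col 2; col 7 stays descending.
my $by_col1_first = make_cmp(
    [0, 'str', 'asc'],
    [1, 'num', 'asc'],
    [4, 'num', 'asc'],
    [6, 'num', 'desc'],
);

my @sorted = sort { $by_col1_first->($a, $b) } @rows;
print join(' ', @$_), "\n" for @sorted;
```

Swapping the first two keys in the list restores the original ordering, so the same `%seen` filter can be applied to either result.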

Re: sorting a table columns using hashes
by LanX (Saint) on May 05, 2011 at 15:25 UTC
    > suggestions?

    sure: the perldocs for while, split, sort

    Searching for "Hash of Array", "Array of Array" and "Schwartzian transform" might help.

    Showing us your attempts, instead of just a fuzzy requirement hidden in the code area, will help us give you more constructive advice.

    Cheers Rolf

    UPDATE: If I understand your data, you should:

  • Parse your data into a hash (first column = key) of arrays of arrays (the split lines).
  • Sort the arrays of arrays, combining the differently weighted criteria with or (search the sort examples for ||).
  • Output the top entry for every key of the hash.
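The three steps above might look roughly like this. It is a sketch under assumptions, not a definitive answer: fields are taken to be whitespace-separated, the best row in a group is the one with the lowest fifth column (ties broken by the highest seventh column), and groups are printed by ascending second column. The helper name `best_per_key` is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns one "best" row per first-column key.
sub best_per_key {
    my @lines = @_;

    # Step 1: hash of arrays of arrays, keyed on the first column.
    my %rows_for;
    for my $line (@lines) {
        my @cols = split ' ', $line;
        next unless @cols;                     # skip blank lines
        push @{ $rows_for{ $cols[0] } }, \@cols;
    }

    # Step 2: sort each group, lowest col 5 first, then highest col 7.
    my @best;
    for my $key (keys %rows_for) {
        my @sorted = sort {
            $a->[4] <=> $b->[4] || $b->[6] <=> $a->[6]
        } @{ $rows_for{$key} };
        push @best, $sorted[0];                # Step 3: keep the top entry
    }

    # Assumption: groups are ordered by ascending second column.
    my @ordered = sort { $a->[1] <=> $b->[1] || $a->[0] cmp $b->[0] } @best;
    return @ordered;
}

my @input = (
    'xhahhxha 60 3 hahaghagah 10 1 101',
    'xhahhxha 60 3 jrthtahtat 8 1 110',
    'xhahhxha 60 3 shdgehsh 8 1 150',
    'hsghtahs 100 19 hahaghagah 10 20 200',
    'hsghtahs 100 19 jrthtahtat 10 20 300',
    'hsghtahs 100 19 shdgehsh 10 20 400',
);

print join(' ', @$_), "\n" for best_per_key(@input);
```

Sorting every group is more work than strictly needed when only the top entry is wanted, but it keeps the code close to the description and leaves the full ranking available if it is needed later.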
Re: sorting a table columns using hashes
by Utilitarian (Vicar) on May 05, 2011 at 15:39 UTC
    Does the lowest value in column 5 override the highest value in column 7, or vice versa?
    I.e., with the data

        xhahhxha 60 3 hahaghagah 7 1 101
        xhahhxha 60 3 jrthtahtat 8 1 110
        xhahhxha 60 3 shdgehsh 10 1 150

    What do you output?

    print "Good ",qw(night morning afternoon evening)[(localtime)[2]/6]," fellow monks."
      With this data:

          xhahhxha 60 3 hahaghagah 7 1 101
          xhahhxha 60 3 jrthtahtat 8 1 110
          xhahhxha 60 3 shdgehsh 10 1 150

      Output will be:

          xhahhxha 60 3 hahaghagah 7 1 101
        #!/usr/bin/perl
        while (<DATA>){
            chomp;
            @record=split(/\s+/,$_);
            if (defined $records{$record[0]}){
                # We've seen it before and need to compare the data
                if($record[1] < $records{$record[0]}->[1]){
                    # we have a smaller value and so should use this
                    $records{$record[0]}->[1] = $record[1];
                }
                if ( ($record[4] < $records{$record[0]}->[4]) ){
                    $records{$record[0]}->[3] = $record[3];
                    $records{$record[0]}->[4] = $record[4];
                    $records{$record[0]}->[6] = $record[6];
                }elsif ( ( $record[4] == $records{$record[0]}->[4]) && ($record[6] > $records{$record[0]}->[6]) ){
                    $records{$record[0]}->[3] = $record[3];
                    $records{$record[0]}->[6] = $record[6];
                }
            }
            else{
                @{$records{$record[0]}}=@record;
            }
        }
        for $key (reverse sort keys %records){
            print join ("\t", @{$records{$key}}),"\n";
        }
        __DATA__
        xhahhxha 60 3 hahaghagah 10 1 101
        xhahhxha 60 3 jrthtahtat 8 1 110
        xhahhxha 60 3 shdgehsh 8 1 150
        hsghtahs 100 19 hahaghagah 10 20 200
        hsghtahs 100 19 jrthtahtat 10 20 300
        hsghtahs 100 19 shdgehsh 10 20 400
        __END__
        xhahhxha 60 3 shdgehsh 8 1 150
        hsghtahs 100 19 shdgehsh 10 20 400
        Ugly but functional

        print "Good ",qw(night morning afternoon evening)[(localtime)[2]/6]," fellow monks."
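For comparison, the single-pass idea above could also be written under `use strict`, replacing the whole stored row whenever a better one turns up. This is a sketch with assumptions spelled out: `is_better` and `pick_best` are hypothetical names, the second and third columns are assumed to be identical for every row of a group (as in the sample data), and the output is ordered by the second column rather than by reversed key order.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical comparator: true if row $x beats row $y, i.e. it has a
# lower fifth column, or an equal fifth and a higher seventh column.
sub is_better {
    my ($x, $y) = @_;
    return 1 if $x->[4] < $y->[4];
    return 0 if $x->[4] > $y->[4];
    return $x->[6] > $y->[6] ? 1 : 0;
}

# Hypothetical helper: one pass, remembering the best row per key.
sub pick_best {
    my %best;
    for my $line (@_) {
        my @rec = split ' ', $line;
        next unless @rec;                      # skip blank lines
        my $key = $rec[0];
        $best{$key} = \@rec
            if !exists $best{$key} || is_better(\@rec, $best{$key});
    }
    # Assumption: order the output by the second column, ascending.
    my @ordered = sort { $a->[1] <=> $b->[1] } values %best;
    return @ordered;
}

# The alternative data set from the question earlier in this sub-thread.
my @sample = (
    'xhahhxha 60 3 hahaghagah 7 1 101',
    'xhahhxha 60 3 jrthtahtat 8 1 110',
    'xhahhxha 60 3 shdgehsh 10 1 150',
);

print join("\t", @$_), "\n" for pick_best(@sample);
```

With the alternative data this keeps the `hahaghagah 7 1 101` row, which agrees with the answer given above: the lowest fifth column wins and the seventh column only breaks ties.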