Sandy_Bio_Perl has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks. I have a number of arrays containing nine-character elements (peptides). Here are a few of those arrays:

my @array1 = qw(TTVVRRRDR KFRQLLWFH RSQSPRRRR CSPHHTALR WIRTPPAYR VVRRRDRGR HISCLTFGR RTPSPRRRR NTNMGLKFR LVSFGVWIR);
my @array2 = qw(AILCWGELM TVLEYLVSF GLKFRQLLW LSFLPSDFF LLSFLPSDF LLWFHISCL MQLFHLCLI QLFHLCLII ATVELLSFL ALRQAILCW);
my @array3 = qw(TTVVRRRDR CSPHHTALR WIRTPPAYR NVNMGLKIR LVSFGVWIR HISCLTFGR);
my @array4 = qw(CSPHHTALR YVNTNMGLK WIRTPPAYR TLPETTVVR HISCLTFGR NTNMGLKIR WGMDIDPYK LVSFGVWIR);

I would like to sort the elements of these arrays according to the frequency with which each peptide occurs. I am storing that frequency in a hash, which, when I print part of it, looks like this:

MQLFHLCLI = 36
FLPSDFFPS = 32
YLVSFGVWI = 32
LLWFHISCL = 31
QLFHLCLII = 28
IISCSCPTV = 23
ELMNLATWV = 18
HISCLTFGR = 16
ATVELLSFL = 15

E.g., if the peptide 'MQLFHLCLI' appears in an array, it should be shown as the first element.

I would be very grateful if any of the brethren have a solution to this. Many thanks in advance.

Re: Sort array using a ranking system from separate hash
by LanX (Saint) on Aug 07, 2016 at 23:53 UTC
    Hello Sandy,

    What did you try? Isn't it straightforward after reading sort (sic)?

    Like modifying this snippet from perldoc accordingly?

    # this sorts the %age hash by value instead of key
    # using an in-line function
    my @eldest = sort { $age{$b} <=> $age{$a} } keys %age;

    Or am I missing something?

    Cheers Rolf
    (addicted to the Perl Programming Language and ☆☆☆☆ :)
    Je suis Charlie!

    Update:

    PS: Sorry for not providing ready-to-use code; I prefer to teach fishing rather than give away fish.
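A minimal sketch of that adaptation, assuming the frequencies live in a hash called %freq (an illustrative name, filled here with a few of the values from the question). The only change from the perldoc snippet is that the list being sorted is the array itself rather than keys %age:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical frequency hash, filled with a few values from the question.
my %freq = (
    MQLFHLCLI => 36,
    YLVSFGVWI => 32,
    QLFHLCLII => 28,
    ATVELLSFL => 15,
);

my @array = qw( ATVELLSFL QLFHLCLII YLVSFGVWI MQLFHLCLI );

# Same shape as the perldoc example, but the elements of @array are
# compared by their value in %freq; $b before $a gives highest first.
my @ordered = sort { $freq{$b} <=> $freq{$a} } @array;

print "@ordered\n";    # MQLFHLCLI YLVSFGVWI QLFHLCLII ATVELLSFL
```

Swapping $a and $b in the block is all it takes to switch between ascending and descending order.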

      Thank you, Rolf. I am looking to order the elements of, say, a single array according to an ordering held separately in a list. So, if my original array were ordered like this:

      my @array = qw(ATVELLSFL QLFHLCLII YLVSFGVWI MQLFHLCLI);

      and my order preference is:

      MQLFHLCLI FLPSDFFPS YLVSFGVWI LLWFHISCL QLFHLCLII IISCSCPTV ELMNLATWV HISCLTFGR ATVELLSFL

      then my array should be reordered as:

      my @orderedArray = qw(MQLFHLCLI YLVSFGVWI QLFHLCLII ATVELLSFL);

      The array to be ordered will always be shorter than my order preference list, because this list is used with a large number of different arrays.

      Apologies for not explaining my question properly in the first place.
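If the ordering is held as a list rather than as frequencies, one option is to turn list positions into ranks with a hash slice and sort ascending by rank. This is a sketch rather than code from the thread: %rank and $unranked are illustrative names, and sending peptides absent from the list to the end (alphabetically) is an assumption about what you would want. The // operator needs Perl 5.10 or later.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Order preference list from the post, most preferred first.
my @order = qw( MQLFHLCLI FLPSDFFPS YLVSFGVWI LLWFHISCL QLFHLCLII
                IISCSCPTV ELMNLATWV HISCLTFGR ATVELLSFL );

# Map each peptide to its position in the list (0 = most preferred)
# using a hash slice.
my %rank;
@rank{@order} = 0 .. $#order;

my @array = qw( ATVELLSFL QLFHLCLII YLVSFGVWI MQLFHLCLI );

# Peptides missing from the list get a rank past the end, so they sort
# last. // (not ||) is used so that a genuine rank of 0 is kept.
my $unranked = scalar @order;
my @orderedArray = sort {
    ( $rank{$a} // $unranked ) <=> ( $rank{$b} // $unranked )
        || $a cmp $b    # tie-break: alphabetical among unranked peptides
} @array;

print "@orderedArray\n";    # MQLFHLCLI YLVSFGVWI QLFHLCLII ATVELLSFL
```

Building %rank once and reusing it for every array avoids searching the preference list inside the sort comparator.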

        Hi Sandy_Bio_Perl,

        Apologies for not explaining my question properly in the first place

        I recommend you read How do I post a question effectively? and Short, Self Contained, Correct Example. Showing your own coding efforts not only gives people trying to help a basis to start from, it also shows that you're not just trying to abuse PerlMonks as a code writing service. Not only that, boiling down your code into a SSCCE and trying to explain the problem will often help you yourself to figure out the issue at hand.

        Anyway, BrowserUk and Marshall have already answered your question and provided code; the only difference is that, instead of populating their hash (%freq and %histo, respectively) from the data, you simply need to pre-populate it.

        Hope this helps,
        -- Hauke D
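Following that suggestion, here is a sketch with the hash pre-populated instead of counted from the data. The values and the two sample arrays are taken from the question (the arrays trimmed for brevity). Defaulting missing peptides to a frequency of 0 is an assumption, added so that peptides not in %freq don't trigger "uninitialized value" warnings under use warnings; the alphabetical tie-break follows BrowserUk's update further down.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pre-populated frequency hash (values from the question); in a real
# script this would be loaded once from wherever it is stored.
my %freq = (
    MQLFHLCLI => 36, FLPSDFFPS => 32, YLVSFGVWI => 32,
    LLWFHISCL => 31, QLFHLCLII => 28, IISCSCPTV => 23,
    ELMNLATWV => 18, HISCLTFGR => 16, ATVELLSFL => 15,
);

# Subsets of the question's @array2 and @array3.
my @AoA = (
    [ qw( AILCWGELM LLWFHISCL MQLFHLCLI QLFHLCLII ATVELLSFL ) ],
    [ qw( TTVVRRRDR CSPHHTALR HISCLTFGR ) ],
);

# Sort each array in place: highest frequency first, peptides absent
# from %freq treated as 0, ties broken alphabetically.
for my $row (@AoA) {
    @$row = sort {
        ( $freq{$b} // 0 ) <=> ( $freq{$a} // 0 ) || $a cmp $b
    } @$row;
}

print "@$_\n" for @AoA;
# MQLFHLCLI LLWFHISCL QLFHLCLII ATVELLSFL AILCWGELM
# HISCLTFGR CSPHHTALR TTVVRRRDR
```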

        Isn't it straightforward after reading sort?

        Like modifying this snippet from perldoc accordingly?

        # this sorts the %age hash by value instead of key
        # using an in-line function
        my @eldest = sort { $age{$b} <=> $age{$a} } keys %age;

        Cheers Rolf

Re: Sort array using a ranking system from separate hash
by BrowserUk (Patriarch) on Aug 08, 2016 at 04:33 UTC

    Assuming you have your "number of arrays" stored as an array of arrays:

    #! perl -slw
    use strict;
    use Data::Dump qw[ pp ];

    my @AoAs = (
        [ qw[ TTVVRRRDR KFRQLLWFH RSQSPRRRR CSPHHTALR WIRTPPAYR VVRRRDRGR HISCLTFGR RTPSPRRRR NTNMGLKFR LVSFGVWIR ] ],
        [ qw[ AILCWGELM TVLEYLVSF GLKFRQLLW LSFLPSDFF LLSFLPSDF LLWFHISCL MQLFHLCLI QLFHLCLII ATVELLSFL ALRQAILCW ] ],
        [ qw[ TTVVRRRDR CSPHHTALR WIRTPPAYR NVNMGLKIR LVSFGVWIR HISCLTFGR ] ],
        [ qw[ CSPHHTALR YVNTNMGLK WIRTPPAYR TLPETTVVR HISCLTFGR NTNMGLKIR WGMDIDPYK LVSFGVWIR ] ],
    );

    my %freq;
    map ++$freq{ $_ }, @$_ for @AoAs;

    @$_ = sort{ $freq{ $b } <=> $freq{ $a } } @$_ for @AoAs;

    pp \@AoAs

    __END__
    C:\test>1169316
    [
      ["CSPHHTALR", "WIRTPPAYR", "HISCLTFGR", "LVSFGVWIR", "TTVVRRRDR", "KFRQLLWFH", "RSQSPRRRR", "VVRRRDRGR", "RTPSPRRRR", "NTNMGLKFR"],
      ["AILCWGELM", "TVLEYLVSF", "GLKFRQLLW", "LSFLPSDFF", "LLSFLPSDF", "LLWFHISCL", "MQLFHLCLI", "QLFHLCLII", "ATVELLSFL", "ALRQAILCW"],
      ["CSPHHTALR", "WIRTPPAYR", "LVSFGVWIR", "HISCLTFGR", "TTVVRRRDR", "NVNMGLKIR"],
      ["CSPHHTALR", "WIRTPPAYR", "HISCLTFGR", "LVSFGVWIR", "YVNTNMGLK", "TLPETTVVR", "NTNMGLKIR", "WGMDIDPYK"],
    ]

    Update: you might want to include a tie-break for elements with equal frequencies to ensure consistency. E.g., here I've used increasing alphabetical ordering as the tie-break, which makes the output more consistent:

    #! perl -slw
    use strict;
    use Data::Dump qw[ pp ]; $Data::Dump::WIDTH = 200;

    my @AoAs = (
        [ qw[ TTVVRRRDR KFRQLLWFH RSQSPRRRR CSPHHTALR WIRTPPAYR VVRRRDRGR HISCLTFGR RTPSPRRRR NTNMGLKFR LVSFGVWIR ] ],
        [ qw[ AILCWGELM TVLEYLVSF GLKFRQLLW LSFLPSDFF LLSFLPSDF LLWFHISCL MQLFHLCLI QLFHLCLII ATVELLSFL ALRQAILCW ] ],
        [ qw[ TTVVRRRDR CSPHHTALR WIRTPPAYR NVNMGLKIR LVSFGVWIR HISCLTFGR ] ],
        [ qw[ CSPHHTALR YVNTNMGLK WIRTPPAYR TLPETTVVR HISCLTFGR NTNMGLKIR WGMDIDPYK LVSFGVWIR ] ],
    );

    my %freq;
    map ++$freq{ $_ }, @$_ for @AoAs;

    @$_ = sort{ $freq{ $b } <=> $freq{ $a } || $a cmp $b } @$_ for @AoAs;

    pp \@AoAs

    __END__
    C:\test>1169316
    [
      ["CSPHHTALR", "HISCLTFGR", "LVSFGVWIR", "WIRTPPAYR", "TTVVRRRDR", "KFRQLLWFH", "NTNMGLKFR", "RSQSPRRRR", "RTPSPRRRR", "VVRRRDRGR"],
      ["AILCWGELM", "ALRQAILCW", "ATVELLSFL", "GLKFRQLLW", "LLSFLPSDF", "LLWFHISCL", "LSFLPSDFF", "MQLFHLCLI", "QLFHLCLII", "TVLEYLVSF"],
      ["CSPHHTALR", "HISCLTFGR", "LVSFGVWIR", "WIRTPPAYR", "TTVVRRRDR", "NVNMGLKIR"],
      ["CSPHHTALR", "HISCLTFGR", "LVSFGVWIR", "WIRTPPAYR", "NTNMGLKIR", "TLPETTVVR", "WGMDIDPYK", "YVNTNMGLK"],
    ]

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". I knew I was on the right track :)
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Thank you, BrowserUk. I suspect I explained my problem rather poorly. I have tried to clarify it in my response to Rolf elsewhere in this node.

        I'm pretty sure that my code will do what you want -- order a small array by the ordering contained in a hash.

        My misunderstanding was that I thought you meant you had many arrays in a single run, rather than one array per run.

        However, the basic process of sorting an array according to a frequency hash remains the same:

        my %freq = ...;
        my @array = qw[ data items here ];
        my @ordered = sort{ $freq{ $b } <=> $freq{ $a } } @array;  ## for highest frequency first.


      Thanks for the tie-break; that works really well. Much appreciated.

Re: Sort array using a ranking system from separate hash
by Marshall (Canon) on Aug 08, 2016 at 00:43 UTC
    Your example doesn't seem to be a good one, because the histogram count for most peptides is "1". Is this what you intend?
    #!/usr/bin/perl
    use warnings;
    use strict;
    use Data::Dumper;

    my @AoA =(
        [qw(TTVVRRRDR KFRQLLWFH RSQSPRRRR CSPHHTALR WIRTPPAYR VVRRRDRGR HISCLTFGR RTPSPRRRR NTNMGLKFR LVSFGVWIR)],
        [qw(AILCWGELM TVLEYLVSF GLKFRQLLW LSFLPSDFF LLSFLPSDF LLWFHISCL MQLFHLCLI QLFHLCLII ATVELLSFL ALRQAILCW)],
        [qw(TTVVRRRDR CSPHHTALR WIRTPPAYR NVNMGLKIR LVSFGVWIR HISCLTFGR)],
        [qw(CSPHHTALR YVNTNMGLK WIRTPPAYR TLPETTVVR HISCLTFGR NTNMGLKIR WGMDIDPYK LVSFGVWIR)],
    );

    #calculate histogram
    my %histo;
    foreach my $rowref (@AoA)
    {
        foreach my $elem (@$rowref)
        {
            $histo{$elem}++;
        }
    }
    print Dumper \%histo;

    #sort each array according to the histogram
    foreach my $rowref (@AoA)
    {
        @$rowref = sort{my $A = $histo{$a};
                        my $B = $histo{$b};
                        $A <=> $B} @$rowref;  #edit guess should be $B<=>$A
    }
    print Dumper \@AoA;

    __END__
    $VAR1 = {
              'WGMDIDPYK' => 1,
              'YVNTNMGLK' => 1,
              'MQLFHLCLI' => 1,
              'KFRQLLWFH' => 1,
              'HISCLTFGR' => 3,
              'NTNMGLKIR' => 1,
              'LVSFGVWIR' => 3,
              'QLFHLCLII' => 1,
              'VVRRRDRGR' => 1,
              'WIRTPPAYR' => 3,
              'LLSFLPSDF' => 1,
              'TVLEYLVSF' => 1,
              'CSPHHTALR' => 3,
              'NTNMGLKFR' => 1,
              'LLWFHISCL' => 1,
              'LSFLPSDFF' => 1,
              'AILCWGELM' => 1,
              'TTVVRRRDR' => 2,
              'NVNMGLKIR' => 1,
              'RSQSPRRRR' => 1,
              'ALRQAILCW' => 1,
              'GLKFRQLLW' => 1,
              'ATVELLSFL' => 1,
              'TLPETTVVR' => 1,
              'RTPSPRRRR' => 1
            };
    $VAR1 = [
              [ 'KFRQLLWFH', 'RSQSPRRRR', 'VVRRRDRGR', 'RTPSPRRRR', 'NTNMGLKFR', 'TTVVRRRDR', 'CSPHHTALR', 'WIRTPPAYR', 'HISCLTFGR', 'LVSFGVWIR' ],
              [ 'AILCWGELM', 'TVLEYLVSF', 'GLKFRQLLW', 'LSFLPSDFF', 'LLSFLPSDF', 'LLWFHISCL', 'MQLFHLCLI', 'QLFHLCLII', 'ATVELLSFL', 'ALRQAILCW' ],
              [ 'NVNMGLKIR', 'TTVVRRRDR', 'CSPHHTALR', 'WIRTPPAYR', 'LVSFGVWIR', 'HISCLTFGR' ],
              [ 'YVNTNMGLK', 'TLPETTVVR', 'NTNMGLKIR', 'WGMDIDPYK', 'CSPHHTALR', 'WIRTPPAYR', 'HISCLTFGR', 'LVSFGVWIR' ]
            ];

      Thank you, Marshall. I suspect I explained my problem rather poorly. I have tried to clarify it in my response to Rolf above.