Hi Aaron,

Thanks for the reply. I tried your suggestion, but it did not work because there are repeats in the second column (frequency) of my input file. So I changed the format of the first column of the output by appending the tag to the end of the header, which makes each header unique. Now I would like to sort on the first column of my output. I created another hash to try to make that work, but so far it has not been successful. The code I changed is below. Any suggestions would be helpful.

while (my $line=<$FILE1>) {
    chomp $line;
    $line=~s/\t/,/g;
    my @columns=split(/,/, $line);
    my $tags=$columns[0];
    #$tag{$columns[0]}=\@columns;
    $tag{$tags}=$line;
}
foreach my $tags (keys %tag){
    my $header;
    my $range=500000;
    my @columns=split(/,/,$tag{$tags});
    $tags=$columns[0];
    my $freq=$columns[1];
    my $random_number=int(rand($range));
    $header=">HWTI_".$freq."_".$random_number.$tags;
    $header=~tr/"//d;
    my $printline=$tag{$tags};
    $printline=$header.",".$printline;
    print $FILE2 "$printline\n";
}

Re^3: Sorting issue
by aaron_baugher (Curate) on Nov 05, 2011 at 02:54 UTC

    You didn't show how you tried my suggestions, so I'm not sure why it didn't work for you. Here's a more complete example, which takes your sample input and sorts it by the frequencies (largest to smallest), outputting with a header built to your latest spec. Make sure you understand what's going on in the sort {block}: what $tags{$a} means, for instance. I'm sorting on the values, not the keys. The keys go into $a and $b, and I'm using those as keys into the hash to sort on the values.

    #!/usr/bin/perl
    use warnings;
    use strict;

    my %tags;                        # hash to store tags/freqs
    while(<DATA>){
        chomp;
        my($tag, $freq) = split;     # split the line on whitespace
        $tags{$tag} = $freq;         # save the tag and freq in the hash
    }

    # sort the hash numerically on its values, descending
    for my $tag ( sort { $tags{$b} <=> $tags{$a} } keys %tags ){
        my $freq = $tags{$tag};                  # put the freq for $tag in $freq
        my $header = make_header($tag, $freq);   # make the header
        print ">$header\t$tag\t$freq\n";         # print it out
    }

    sub make_header {
        my $tag = shift;                 # get parameters
        my $freq = shift;
        my $r = int(rand(500000));       # pick a random number
        return "HWTI_${freq}_$r$tag";    # build the header
    }

    #input data
    __DATA__
    CCCDEDFFFES 45
    EEBBBBGGGBB 1700
    BBBCDDERFGG 850

    #output
    >HWTI_1700_494932EEBBBBGGGBB EEBBBBGGGBB 1700
    >HWTI_850_10814BBBCDDERFGG BBBCDDERFGG 850
    >HWTI_45_187939CCCDEDFFFES CCCDEDFFFES 45
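
Since the original question mentions repeated frequencies, here is a minimal sketch of how the same sort block could break ties. The hash is keyed on the tag, so two tags with the same frequency do not collide; the only open question is their relative order. The sample data (including the extra tag AAACDDERFGG) and the choice of the tag as a secondary sort key are assumptions for illustration, not something taken from the real input file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data: two tags share the frequency 850.
my %tags = (
    CCCDEDFFFES => 45,
    EEBBBBGGGBB => 1700,
    BBBCDDERFGG => 850,
    AAACDDERFGG => 850,
);

# Sort by frequency, descending. When two frequencies are equal,
# <=> returns 0, so "or" falls through to a string comparison
# of the tags themselves (ascending).
my @sorted = sort { $tags{$b} <=> $tags{$a} or $a cmp $b } keys %tags;

print "$_\t$tags{$_}\n" for @sorted;
```

Without the secondary comparison, tags with equal frequencies can come out in a different relative order from run to run, because the order returned by keys is not guaranteed.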
      There is no need for the sort and print loop after the while loop. His input file is already sorted by frequency in descending order (your sample data would be sorted in descending order). So, the make_header() call and print routine could be done within the while loop.
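
A rough sketch of the single-pass version described above, assuming the input really is already sorted by frequency in descending order. The in-memory $input string stands in for the real file, and the @out array is only there so the result can be inspected; both are assumptions for the sake of a self-contained example. make_header follows the one in the parent post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input, already sorted by frequency (descending).
my $input = join "\n",
    "EEBBBBGGGBB 1700",
    "BBBCDDERFGG 850",
    "CCCDEDFFFES 45";

# Read from the string as if it were a file.
open my $in, '<', \$input or die "Cannot open in-memory input: $!";

my @out;
while (my $line = <$in>) {
    chomp $line;
    next unless $line =~ /\S/;              # skip blank lines
    my ($tag, $freq) = split ' ', $line;    # split on whitespace
    my $header = make_header($tag, $freq);
    push @out, ">$header\t$tag\t$freq";     # no hash, no sort needed
}
close $in;

print "$_\n" for @out;

sub make_header {
    my ($tag, $freq) = @_;
    my $r = int(rand(500000));              # pick a random number
    return "HWTI_${freq}_$r$tag";           # build the header
}
```

This only preserves the order of the input; if the file turns out not to be sorted, the hash-and-sort approach is still needed.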
        True, his sample data was already sorted that way; but it was only three lines, and he said he still needed to sort on that, so I assumed that was coincidence.
      Hi Aaron,

      Thanks! It worked with:

      split(' ', $line).

        Glad that helped. If you're having trouble using $line in place of $_, that's probably because if you give split a variable, you have to give it a pattern first. So you could do split ' ', $line; to use split's special case where giving it a single space as the pattern argument makes it split on whitespace like it does when you give it no arguments.
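
To make the difference concrete, here is a small sketch comparing the special-case ' ' pattern with a literal single-space regex. The sample $line, with its leading spaces and embedded tab, is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical line with leading/trailing spaces and a tab in the middle.
my $line = "  EEBBBBGGGBB \t 1700  ";

# Special case: a single-space string splits on runs of whitespace
# and ignores leading whitespace, just like split with no arguments
# does on $_.
my @special = split ' ', $line;

# A regex matching one literal space splits at every single space,
# so leading spaces produce empty leading fields and the tab is kept.
my @literal = split / /, $line;

print "special: [", join('][', @special), "]\n";
print "literal has an empty first field\n" if $literal[0] eq '';
```

With the special case, @special holds just the tag and the frequency, which is usually what you want for whitespace-separated columns.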