in reply to Sorting issue

You're splitting the line on commas (after changing tabs to commas, which is puzzling), then saving each line in a hash with the key being the first element from your split. So the hash key that you're sorting by is that first column. If you want to sort by something else, you have to tell the sort function that.

To make another column easily available to sort, and to avoid duplicating work you've already done, save @columns in your hash instead of the original line. Then you'll have a hash of arrays, so you can sort on whichever element of the array you'd like:

        $tag{$columns[0]} = \@columns;
    }
    foreach my $tags ( sort { $tag{$a}[1] <=> $tag{$b}[1] } keys %tag ){

In this case, I'm using <=> to sort numerically, based on the second element of the array pointed to by each hash key's value. To sort alphabetically, change <=> to cmp. Now you can get your array back into @columns with the dereference @{$tag{$tags}}, so you don't have to re-split your line.
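
For reference, here is a minimal self-contained sketch of the hash-of-arrays approach. The sample rows and the helper name sorted_keys_by_column are made up for illustration; your real input would come from your file, and you may want a different column or comparison operator.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up sample rows (tab-separated); real data would be read from a file.
my @lines = ("CCCDEDFFFES\t45", "EEBBBBGGGBB\t1700", "BBBCDDERFGG\t850");

my %tag;
for my $line (@lines) {
    my @columns = split /\t/, $line;
    $tag{ $columns[0] } = \@columns;    # store a reference to this row's columns
}

# Hypothetical helper: keys of the hash ordered numerically by column $col.
sub sorted_keys_by_column {
    my ($h, $col) = @_;
    return sort { $h->{$a}[$col] <=> $h->{$b}[$col] } keys %$h;
}

for my $tags ( sorted_keys_by_column(\%tag, 1) ) {
    my @columns = @{ $tag{$tags} };     # dereference; no re-split needed
    print join("\t", @columns), "\n";
}
```

Swapping $a and $b in the sort block reverses the order, and replacing <=> with cmp compares the column as text.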

One concern: you said you're trying to come up with a unique key for each line, but you're using the first column alone as the key when you put them in the hash. If the values from the first column aren't already unique, you'll be overwriting values there, so lines will already be missing by the time you sort and start adding your other parts. If you need to add the frequency and a random number to get a unique key (and I have a feeling there's a better way to do that than with random numbers, which could repeat), you should do that before you save the key in your hash.
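
One possible alternative to random numbers, sketched under the assumption that any suffix will do as long as it makes the key unique, is a running counter per base key. The helper name unique_key, the key layout, and the sample rows are hypothetical, not something from your script.

```perl
use strict;
use warnings;

my %seen;   # how many times each base key has been used so far

# Hypothetical helper: build a key from frequency and tag, appending a
# per-key counter so repeated (tag, freq) pairs still get distinct keys.
sub unique_key {
    my ($tag, $freq) = @_;
    my $base = "HWTI_${freq}_${tag}";
    my $n    = ++$seen{$base};          # 1 for the first occurrence, 2 for the next, ...
    return "${base}_$n";
}

my %tag;
for my $row ( ["AAA", 10], ["AAA", 10], ["BBB", 10] ) {   # made-up rows with a duplicate
    my ($tag, $freq) = @$row;
    $tag{ unique_key($tag, $freq) } = [ $tag, $freq ];    # build the key before storing
}

print "$_\n" for sort keys %tag;
```

Because the key is made unique before it goes into the hash, the duplicate row is kept instead of overwriting the first one.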

Re^2: Sorting issue
by bluray (Sexton) on Nov 05, 2011 at 01:45 UTC
    Hi Aaron,

    Thanks for the reply. I tried your suggestion, but it didn't work, because I found that there are repeats in the second column (frequency) of my input file. So I changed the format of the first column of the output by appending the tag to the end, so that it looks unique. Now I would like to sort on the first column of my output. I created another hash to try to make that work, but so far without success. The code I changed is below. Any suggestions would be helpful.

    while (my $line=<$FILE1>) {
        chomp $line;
        $line=~s/\t/,/g;
        my @columns=split(/,/, $line);
        my $tags=$columns[0];
        #$tag{$columns[0]}=\@columns;
        $tag{$tags}=$line;
    }
    foreach my $tags (keys %tag){
        my $header;
        my $range=500000;
        my @columns=split(/,/,$tag{$tags});
        $tags=$columns[0];
        my $freq=$columns[1];
        my $random_number=int(rand($range));
        $header=">HWTI_".$freq."_".$random_number.$tags;
        $header=~tr/"//d;
        my $printline=$tag{$tags};
        $printline=$header.",".$printline;
        print $FILE2 "$printline\n";
    }

      You didn't show how you tried my suggestions, so I'm not sure why it didn't work for you. Here's a more complete example, which takes your sample input and sorts it by the frequencies (largest to smallest), outputting with a header built to your latest spec. Make sure you understand what's going on in the sort {block}: what $tags{$a} means, for instance. I'm sorting on the values, not the keys. The keys go into $a and $b, and I'm using those as keys into the hash to sort on the values.

      #!/usr/bin/perl
      use warnings;
      use strict;

      my %tags;                        # hash to store tags/freqs
      while(<DATA>){
          chomp;
          my($tag, $freq) = split;     # split the line on whitespace
          $tags{$tag} = $freq;         # save the tag and freq in the hash
      }

      # sort the hash numerically on its values, descending
      for my $tag ( sort { $tags{$b} <=> $tags{$a} } keys %tags ){
          my $freq = $tags{$tag};                  # put the freq for $tag in $freq
          my $header = make_header($tag, $freq);   # make the header
          print ">$header\t$tag\t$freq\n";         # print it out
      }

      sub make_header {
          my $tag = shift;                 # get parameters
          my $freq = shift;
          my $r = int(rand(500000));       # pick a random number
          return "HWTI_${freq}_$r$tag";    # build the header
      }

      #input data
      __DATA__
      CCCDEDFFFES 45
      EEBBBBGGGBB 1700
      BBBCDDERFGG 850

      #output
      >HWTI_1700_494932EEBBBBGGGBB    EEBBBBGGGBB    1700
      >HWTI_850_10814BBBCDDERFGG      BBBCDDERFGG    850
      >HWTI_45_187939CCCDEDFFFES      CCCDEDFFFES    45
        There is no need for the sort and print loop after the while loop. His input file is already sorted by frequency in descending order (your sample data would be sorted in descending order). So, the make_header() call and print routine could be done within the while loop.
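
        A rough sketch of that single-pass version, assuming the input really is already in the order you want. The helper name format_record and the hard-coded input lines are only for illustration; make_header() follows the header layout from the parent post.

        ```perl
        use strict;
        use warnings;

        # Same header layout as make_header() in the parent post.
        sub make_header {
            my ($tag, $freq) = @_;
            my $r = int(rand(500000));          # pick a random number
            return "HWTI_${freq}_$r$tag";
        }

        # Hypothetical helper: turn one input line into one output record.
        sub format_record {
            my ($line) = @_;
            my ($tag, $freq) = split ' ', $line;    # split on whitespace
            return join("\t", ">" . make_header($tag, $freq), $tag, $freq);
        }

        # Made-up input, already sorted by frequency in descending order,
        # so no hash and no sort are needed: each line is printed as it is read.
        my @input = ("EEBBBBGGGBB 1700", "BBBCDDERFGG 850", "CCCDEDFFFES 45");
        for my $line (@input) {
            print format_record($line), "\n";
        }
        ```

        This only preserves whatever order the input has; if the file is not sorted, the hash and sort from the parent post are still needed.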
        Hi Aaron,

        Thanks! It worked with:

        split(' ', $line).