Dear Monks,

I have some data that I want to filter. It is in a table format where the first column is an ID and the other columns contain information. Unfortunately, the IDs are not always unique (there may be more than one row per ID). I want to get rid of all redundant rows and keep only the first row for each ID (the row nearest the top).
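The usual Perl idiom for "keep the first row per key" is a `%seen` hash inside `grep`. Here is a minimal sketch. It assumes the rows are already in an array, one line per element, and that the ID is the first tab-separated field. `first_per_id` is a hypothetical name, and the sample rows are shortened versions of the data shown in this post.

```perl
use strict;
use warnings;

# Return the input lines in their original order, keeping only the first
# line seen for each ID. Assumption: the ID is the first tab-separated field.
sub first_per_id {
    my @lines = @_;
    my %seen;    # ID => number of times seen so far
    return grep {
        my ($id) = split /\t/, $_, 2;
        !$seen{$id}++;    # true only the first time this ID appears
    } @lines;
}

# Shortened sample rows (tab separation is assumed)
my @rows = (
    "1374:1-202\tgb|AE000516.2|\t34.5",
    "1374:1-202\tgb|AE000516.2|\t34.1",
    "1450:1-202\temb|BX248345.1|\t70.3",
);
print "$_\n" for first_per_id(@rows);    # prints the first and third rows
```

Because `grep` walks the list from top to bottom, "first seen" is the same as "nearest the top".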

This is what the data looks like; the IDs in this case are 1374, 1374, and 1450:

1374:1-202 gb|AE000516.2| Mycobacterium tuberculosis CDC1551, complete genome 34.5 69 3.6 202 4403837 14/48 29% 25/48 52% 38 181 1895058 1895201
1374:1-202 gb|AE000516.2| Mycobacterium tuberculosis CDC1551, complete genome 34.1 68 5.0 2
1450:1-202 emb|BX248345.1| Mycobacterium bovis subsp. bovis AF2122/97 complete genome; segment 12/14 70.3 147 6e-11 202 308050 28/59 47% 43/59 72% 17 193 168681 168505
In this case I would want to get rid of the second row and keep only the first and third rows.
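The IDs are given as 1374, 1374 and 1450, but the first column actually reads `1374:1-202`. If the ID is only the number before the colon, the key has to be extracted before rows can be compared. Comparing the whole field with `==` makes Perl numify `"1374:1-202"` to 1374, with an "isn't numeric" warning under `use warnings`. A minimal sketch, assuming each row starts with the ID digits; `id_of` is a hypothetical helper:

```perl
use strict;
use warnings;

# Hypothetical helper: extract the numeric ID from the start of a row.
# Assumes the row begins with the ID digits, optionally followed by a
# ":start-end" range, as in "1374:1-202".
sub id_of {
    my ($line) = @_;
    my ($id) = $line =~ /^(\d+)/;
    return $id;    # undef if the line does not start with digits
}

print id_of("1374:1-202\tgb|AE000516.2|"), "\n";     # prints 1374
print id_of("1450:1-202\temb|BX248345.1|"), "\n";    # prints 1450
```

Keying on the extracted number rather than the whole field matters only if the same ID can appear with different ranges (for example `1374:1-202` and `1374:5-300`).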

I thought this code would work, but instead it prints all rows, not just one row per ID. Can anyone help?

Here is my code:

for (my $i=0; $i<@parsed_file; $i++) {
    my @record = $parsed_file[$i];
    my $record = join ('', @record);
    @record = split (/\t/, $record);
    $num = $freq{$record[0]}{"freq"}++;
    $freq{$array[0]}{"value"}[$num] = $_;
    my @id;
    push (@id, $record[0]);
}

# i sort based on id to extract unique id's
my @sorted_array = sort {$freq{$b}{"freq"} <=> $freq{$a}{"freq"}} keys %freq;
##print "$sorted_array[0]\n";

for (my $i=0; $i<@parsed_file; $i++) {
    my @hit = $parsed_file[$i];
    my $hit = join ('', @record);
    @hit = split (/\t/, $record);
    my $c=0;
    my $id2 = $hit[0];
    foreach my $id (@sorted_array) {
        if ($id == $id2) {
            ++$c;
        }
        # try to match unique id's to the file and print the first instance found, but it prints everything
        if ($c == 1) {
            print "$parsed_file[$i]\n";
        }
    }
}
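Reading the posted loops, a few things look likely to cause the problem. The first loop stores `$_`, which a C-style `for` never sets, under `$freq{$array[0]}` rather than `$freq{$record[0]}`. The second loop joins `@record` and splits `$record`, which belong to the first loop, instead of the current row. And the `if ($c == 1)` test sits inside the `foreach`, so once `$c` reaches 1 it prints again on every remaining iteration. Here is a sketch that keeps the `%freq` structure but fixes those points. It assumes `@parsed_file` holds one tab-separated line per element with the ID in the first field; `first_rows` and `@order` are hypothetical names.

```perl
use strict;
use warnings;

# Two-pass idea from the post with the apparent bugs fixed.
sub first_rows {
    my @parsed_file = @_;
    my (%freq, @order);

    for my $row (@parsed_file) {
        my ($id) = split /\t/, $row;                  # ID of the current row
        push @order, $id unless exists $freq{$id};    # remember first-seen order
        my $num = $freq{$id}{freq}++;                 # per-ID count, as before
        $freq{$id}{value}[$num] = $row;               # store the row itself
    }

    # $freq{$id}{value}[0] is the first (topmost) row for each ID
    return map { $freq{$_}{value}[0] } @order;
}
```

Called as `print "$_\n" for first_rows(@parsed_file);`, this prints one row per ID in file order, and `$freq{$id}{freq}` still holds the per-ID counts if the frequency sort is needed for something else.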

In reply to sorting arrays by Anonymous Monk
