
Dear Monks,

I have some data that I want to sort. It is in a table format where the first column is an ID and the other columns contain information. Unfortunately, the IDs are not always unique (there may be more than one row per ID). I simply want to get rid of all redundant rows and keep only the first row for each ID (the row nearest the top).

This is what the data looks like; the IDs in this case are 1374, 1374, and 1450:

1374:1-202      gb|AE000516.2| Mycobacterium tuberculosis CDC1551, complete genome      34.5    69      3.6 202 4403837      14/48           29%     25/48           52%     38 181 1895058 1895201
1374:1-202      gb|AE000516.2| Mycobacterium tuberculosis CDC1551, complete genome      34.1    68      5.0 2
1450:1-202      emb|BX248345.1| Mycobacterium bovis subsp. bovis AF2122/97 complete genome; segment 12/14   70.3     147     6e-11   202 308050      28/59           47%     43/59           72%     17 193 168681 168505

In this case I would want to get rid of the second row and keep only the first and third rows.

I thought that this code would work, but it prints all rows instead of just one row per ID. Can anyone please help out?

Here is my code:

for (my $i=0; $i<@parsed_file; $i++)    {
        my @record = $parsed_file[$i];
        my $record = join ('', @record);
        @record = split (/\t/, $record);
        $num = $freq{$record[0]}{"freq"}++;
        $freq{$array[0]}{"value"}[$num] = $_;

        my @id;
        push (@id, $record[0]);
}

# i sort based on id to extract unique id's

my @sorted_array = sort {$freq{$b}{"freq"} <=> $freq{$a}{"freq"}} keys %freq;
##print "$sorted_array[0]\n";

for (my $i=0; $i<@parsed_file; $i++)    {
        my @hit = $parsed_file[$i];
        my $hit = join ('', @record);
        @hit = split (/\t/, $record);
        my $c=0;
        my $id2 = $hit[0];

        foreach my $id (@sorted_array)  {
                if ($id == $id2)        {
                        ++$c;
                }
# try to match unique id's to the file and print the first instance found, but it prints everything
                if ($c == 1) {
                        print "$parsed_file[$i]\n";
                }
        }
}
[download]
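To make the goal concrete, here is a minimal sketch of the "keep only the first row per ID" behaviour using the common %seen hash idiom. It is not a corrected version of the script above. It assumes that @parsed_file holds one tab-separated row per element and that the whole first column (e.g. 1374:1-202) is the ID, compared as a string. The helper name first_row_per_id and the shortened sample rows are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return only the first row seen for each id.
# Assumption: the id is everything before the first tab in a row.
sub first_row_per_id {
    my @rows = @_;
    my %seen;    # ids that have already been kept
    my @kept;
    for my $row (@rows) {
        my ($id) = split /\t/, $row, 2;
        $id = '' unless defined $id;    # an empty row has no id field
        next if $seen{$id}++;           # skip later rows with the same id
        push @kept, $row;
    }
    return @kept;
}

# Made-up, shortened sample rows (tab-separated), for illustration only
my @parsed_file = (
    "1374:1-202\tfirst hit\t34.5",
    "1374:1-202\tsecond hit\t34.1",
    "1450:1-202\tthird hit\t70.3",
);

print "$_\n" for first_row_per_id(@parsed_file);
```

Because the hash is keyed on the ID string, the rows come out in their original order, with no sorting step and no numeric == comparison on values such as 1374:1-202.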

In reply to sorting arrays by Anonymous Monk
