Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

I have some data that I want to sort. It is in a table format where the first column is an id and the other columns contain information. Unfortunately, the ids are not always unique (there may be more than one row per id). I want to simply get rid of all redundant rows and keep only the first row for each id (the row nearest the top).

This is what the data looks like; the ids in this case are 1374, 1374, and 1450:

1374:1-202 gb|AE000516.2| Mycobacterium tuberculosis CDC1551, complete genome 34.5 69 3.6 202 4403837 14/48 29% 25/48 52% 38 181 1895058 1895201
1374:1-202 gb|AE000516.2| Mycobacterium tuberculosis CDC1551, complete genome 34.1 68 5.0 2
1450:1-202 emb|BX248345.1| Mycobacterium bovis subsp. bovis AF2122/97 complete genome; segment 12/14 70.3 147 6e-11 202 308050 28/59 47% 43/59 72% 17 193 168681 168505
In this case I would want to get rid of the second row and keep only the 1st and 3rd rows.

I thought that this code would work, but instead it is printing all rows, not just one row per id. Can anyone please help out?

Here is my code:

for (my $i=0; $i<@parsed_file; $i++) {
    my @record = $parsed_file[$i];
    my $record = join ('', @record);
    @record = split (/\t/, $record);
    $num = $freq{$record[0]}{"freq"}++;
    $freq{$array[0]}{"value"}[$num] = $_;
    my @id;
    push (@id, $record[0]);
}

# i sort based on id to extract unique id's
my @sorted_array = sort {$freq{$b}{"freq"} <=> $freq{$a}{"freq"}} keys %freq;
##print "$sorted_array[0]\n";

for (my $i=0; $i<@parsed_file; $i++) {
    my @hit = $parsed_file[$i];
    my $hit = join ('', @record);
    @hit = split (/\t/, $record);
    my $c=0;
    my $id2 = $hit[0];

    foreach my $id (@sorted_array) {
        if ($id == $id2) {
            ++$c;
        }
        # try to match unique id's to the file and print the first instance found, but it prints everything
        if ($c == 1) {
            print "$parsed_file[$i]\n";
        }
    }
}

Re: sorting arrays
by gam3 (Curate) on Apr 12, 2005 at 17:41 UTC
    my %unique;
    my @unique = grep { !$unique{$_->[0]}++ } @parsed_file;
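    That assumes each element of @parsed_file is an array reference whose first element is the id. If @parsed_file holds plain tab-separated lines instead, the same grep idea works by splitting out the first field (a sketch, assuming tab-delimited rows):

    my %unique;
    my @unique = grep { !$unique{ (split /\t/, $_)[0] }++ } @parsed_file;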
    -- gam3
    A picture is worth a thousand words, but takes 200K.
Re: sorting arrays
by sasikumar (Monk) on Apr 12, 2005 at 17:41 UTC
    Hi

    Use a hash with column 1 as your key. Before adding a row to the hash, check whether the key already exists; if it does, do not overwrite it, just ignore the row.
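    A minimal sketch of that approach, assuming @parsed_file holds tab-separated lines with the id in the first column:

    my %seen;
    my @kept;
    for my $row (@parsed_file) {
        my ($id) = split /\t/, $row;                  # column 1 is the id
        push @kept, $row unless $seen{$id}++;         # keep only the first row per id
    }
    print "$_\n" for @kept;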

    Thanks
    SasiKumar
Re: sorting arrays
by gaal (Parson) on Apr 12, 2005 at 17:45 UTC
    If the data fits easily in memory, go over each row, extract the id from it, and insert ($id => $original_line) into a hash if the id is new. Then print the hash values, sorted by key.

    my %seen;
    for my $row (@data) {
        my ($id) = $row =~ /^(\d+):/
            or die "bad line: [$row]";
        $seen{$id} ||= $row;
    }
    print $seen{$_}, "\n" for sort { $a <=> $b } keys %seen;
Re: sorting arrays
by tlm (Prior) on Apr 12, 2005 at 18:47 UTC

    On Unix a simple alternative to perl is to use the system's sort command:

    % sort -uk 1,1 datafile > sorted_datafile
    That will sort strictly (i.e. no duplicates) on the first field; the first instance encountered is the one kept. See man sort (or info sort if your system uses GNU's sort).
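    If the columns are tab-separated, you may want to set the field separator explicitly so the key is exactly the first column (an assumption about GNU sort invoked from bash or zsh):

    % sort -t $'\t' -uk 1,1 datafile > sorted_datafile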

    the lowliest monk

      A hash slice would work too.
      my %hash;
      @hash{@array} = @array;
      my @sorted = sort keys %hash;