in reply to concatenating identical sequences
```perl
my %data;

# fetch_record() is not shown here; it returns a hashref with
# 'id' and 'sequence' keys, or a false value at end of file.
while (my $rec = fetch_record($fh)) {
    my $id = $rec->{id};
    push @{$data{$id}}, $rec->{sequence};   # group sequences by ID
}

for my $id (keys %data) {
    print "$id ";
    print "$_\n" for @{$data{$id}};
}
```

Alternatively, if you can't afford to fit the entire file in memory, you could still use this technique by storing the file offset of each sequence rather than the sequence itself. This requires more I/O, with `tell` and `seek`, but should allow the same simplicity in the code.
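A rough sketch of the offset idea, in case it helps. The record layout here is purely an assumption on my part (a `>ID` header line followed by sequence rows), and `index_offsets` and `print_merged` are made-up names, so adapt both to the real file:

```perl
use strict;
use warnings;

# ASSUMPTION: each record is a ">ID" header line followed by one or
# more sequence rows. The real layout from the thread may differ.

# Pass 1: remember where each record's rows start instead of storing them.
sub index_offsets {
    my ($fh) = @_;
    my (%offsets, @order);
    while (defined(my $line = <$fh>)) {
        next unless $line =~ /^>(\S+)/;
        push @order, $1 unless exists $offsets{$1};   # keep first-seen order
        push @{ $offsets{$1} }, tell($fh);            # position just after the header
    }
    return (\%offsets, \@order);
}

# Pass 2: seek back to each stored offset and copy rows until the next header.
sub print_merged {
    my ($fh, $out, $offsets, $order) = @_;
    for my $id (@$order) {
        print {$out} ">$id\n";
        for my $pos (@{ $offsets->{$id} }) {
            seek($fh, $pos, 0) or die "seek failed: $!";
            while (defined(my $line = <$fh>)) {
                last if $line =~ /^>/;
                print {$out} $line;
            }
        }
    }
}
```

Both passes use the same read handle: call `index_offsets($fh)` first, then hand its two return values to `print_merged($fh, \*STDOUT, $offsets, $order)`. Only the IDs and a list of integers per ID are held in memory.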
One last alternative would be to rewrite the file so that all the rows for a record are merged onto one line. Then sort the file so that duplicate IDs are adjacent, at which point merging them should be straightforward. Since each row appears to be fixed length, recreating the original structure from a single line should also be straightforward.
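Something along these lines might do for the flatten and merge steps. Again, the `>ID` header layout, the tab-separated intermediate format, and the names `flatten` and `merge_sorted` are my own assumptions for illustration. The sort in between is left to whatever external sort you have handy (the system `sort` utility, for instance), since that is the step that has to cope with the file size:

```perl
use strict;
use warnings;

# Step 1: write each record as a single "ID<TAB>SEQUENCE" line.
# ASSUMPTION: records are a ">ID" header line followed by sequence rows.
sub flatten {
    my ($in, $out) = @_;
    my $started = 0;
    while (defined(my $line = <$in>)) {
        chomp $line;
        if ($line =~ /^>(\S+)/) {
            print {$out} "\n" if $started++;   # close the previous record
            print {$out} "$1\t";
        }
        elsif ($started) {
            print {$out} $line;                # append row to the current record
        }
    }
    print {$out} "\n" if $started;
}

# Step 3 (after sorting): join adjacent lines that share an ID and
# re-wrap each merged sequence into rows of $width characters.
sub merge_sorted {
    my ($in, $out, $width) = @_;
    my ($cur_id, $cur_seq);
    my $flush = sub {
        return unless defined $cur_id;
        print {$out} ">$cur_id\n";
        print {$out} "$_\n" for unpack "(a$width)*", $cur_seq;
    };
    while (defined(my $line = <$in>)) {
        chomp $line;
        my ($id, $seq) = split /\t/, $line, 2;
        $seq = '' unless defined $seq;
        if (defined $cur_id && $id eq $cur_id) {
            $cur_seq .= $seq;                  # same ID as the previous line
        }
        else {
            $flush->();
            ($cur_id, $cur_seq) = ($id, $seq);
        }
    }
    $flush->();
}
```

Only one record is held in memory at a time in either step. If the order of fragments within an ID matters, the sort needs to be stable and keyed on the ID field only.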
Cheers - L~R