in reply to How to get non-redundant DNA sequences from a FASTA file?
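#!/usr/bin/perl
use warnings;
use strict;

my $fasta = << '__FASTA__';
>gi1 cds
ATG fun
>gi2 cds
ATG fun
>gi3 cds
GGG fun
__FASTA__

# Split into records at each ">" that starts a line, then drop the
# leading ">" left on the first record.
my @seq_with_hdr = split /\n>/, $fasta;
$seq_with_hdr[0] =~ s/^>//;

# Key the hash by sequence so duplicates collapse into one entry.
# A later header overwrites an earlier one for the same sequence.
my %hdr_by_seq;
for (@seq_with_hdr) {
    my ($hdr, $seq) = split /\n/;
    $hdr_by_seq{$seq} = $hdr;
}

# Hash keys come back in no particular order.
for my $seq (keys %hdr_by_seq) {
    print ">$hdr_by_seq{$seq}\n$seq\n"
}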
Note that whitespace in the data is not ignored. In your sample, one of the "ATG fun" sequences had a trailing space, which makes it a different hash key from the same sequence without the space, so the two would not be treated as duplicates. I removed the space in my code.
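If you would rather not clean the input by hand, the trailing whitespace can be stripped before the sequence is used as a hash key. The following is only a sketch, not a drop-in replacement: the helper name `unique_records` is made up for illustration, it keeps the first header seen for each sequence (the code above keeps the last), and it assumes that joining multi-line sequences into one string is what you want.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical helper (not part of the original code): returns the unique
# records as [header, sequence] pairs, in first-seen order.
sub unique_records {
    my ($fasta) = @_;
    my (%seen, @unique);
    for my $record (split /^>/m, $fasta) {
        next unless length $record;          # skip the empty field before the first ">"
        my ($hdr, @lines) = split /\n/, $record;
        s/\s+$// for @lines;                 # drop trailing whitespace on each sequence line
        my $seq = join '', @lines;           # assumption: multi-line sequences are joined
        next if $seen{$seq}++;               # keep only the first record per sequence
        push @unique, [$hdr, $seq];
    }
    return @unique;
}

# Sample data with a trailing space after the first "ATG fun".
my $fasta = ">gi1 cds\nATG fun \n>gi2 cds\nATG fun\n>gi3 cds\nGGG fun\n";
print ">$_->[0]\n$_->[1]\n" for unique_records($fasta);
```

With this input the gi1 and gi3 records are printed, because gi2 has the same sequence as gi1 once the trailing space is gone.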
Re^2: How to get non-redundant DNA sequences from a FASTA file?
by supriyoch_2008 (Monk) on Sep 13, 2014 at 12:48 UTC