in reply to Converting Uniprot File to a Fasta File in Perl
I was considering posting another example of a parser, but having researched the question a bit more, I can only second what erix wrote!
Also, my understanding is that there are existing modules to handle these things, such as BioPerl, which appears to have several parsers for this format, including Bio::SeqIO::swiss. This is my first time trying out BioPerl, so there's probably an even better way, but this seems to work:
    use warnings;
    use strict;
    use HTTP::Tiny;
    use Bio::SeqIO;

    my $filename = '/tmp/Q94650.txt';
    HTTP::Tiny->new->mirror(
        'http://www.uniprot.org/uniprot/Q94650.txt', $filename
    )->{success} or die "Failed to fetch";

    my $stream = Bio::SeqIO->new(
        -file => $filename, -format => 'swiss');
    while ( my $seq = $stream->next_seq() ) {
        # just some examples...
        print $seq->accession_number, "\n";
        print $seq->species->species, "\n";
        my ($gene) = $seq->annotation->get_Annotations('gene_name');
        print $gene->findval('Name'), "\n";
    }
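Since the original question was about producing FASTA, here is a minimal sketch of doing the conversion with Bio::SeqIO itself. It assumes the entry has already been saved to /tmp/Q94650.txt as in the snippet above, and that writing to STDOUT is what you want; I haven't checked how the FASTA description line is built for every kind of entry, so treat it as a starting point rather than a definitive solution.

```perl
use warnings;
use strict;
use Bio::SeqIO;

# Assumption: the Swiss-Prot/UniProt flat file was already downloaded here.
my $filename = '/tmp/Q94650.txt';

# Reader for the Swiss-Prot flat-file format.
my $in = Bio::SeqIO->new(
    -file => $filename, -format => 'swiss');

# Writer that emits FASTA records on STDOUT.
my $out = Bio::SeqIO->new(
    -fh => \*STDOUT, -format => 'fasta');

# Copy every record from the input stream to the output stream.
while ( my $seq = $in->next_seq() ) {
    $out->write_seq($seq);
}
```

To write to a file instead, the writer can be given `-file => '>out.fasta'` in place of `-fh`.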
Or using the included bp_seqconvert utility:
    $ wget -q http://www.uniprot.org/uniprot/Q94650.txt
    $ bp_seqconvert.pl --from swiss --to fasta <Q94650.txt
    >ARF1_PLAFA RecName: Full=ADP-ribosylation factor 1; Short=plARF;
    MGLYVSRLFNRLFQKKDVRILMVGLDAAGKTTILYKVKLGEVVTTIPTIGFNVETVEFRN
    ISFTVWDVGGQDKIRPLWRHYYSNTDGLIFVVDSNDRERIDDAREELHRMINEEELKDAI
    ILVFANKQDLPNAMSAAEVTEKLHLNTIRERNWFIQSTCATRGDGLYEGFDWLTTHLNNA
    K