Your regex uses the alternation operator | to match one of several patterns, so on any given line only one of the five capture groups captures something. The others are undef, which is why you're getting that warning. Here's one way to get closer to what you want:
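To make the undef behavior concrete, here is a minimal sketch. The pattern is a hypothetical stand-in with one capture group per alternative (I'm assuming your regex looks roughly like this; it only uses three alternatives to keep it short), and the input is the AC line from the sample output:

```perl
use strict;
use warnings;

# Hypothetical pattern: each alternative has its own capture group.
my $re = qr/^ID\s+(.*)|^AC\s+(.*)|^OS\s+(.*)/;

my $line = "AC   Q94650; O02502; O02593;";

# In list context a match returns every capture group,
# with undef for the groups that did not take part in the match.
my @groups = $line =~ $re;

my @report = map { defined $groups[$_] ? "defined" : "undef" } 0 .. $#groups;
print "group ", $_ + 1, ": $report[$_]\n" for 0 .. $#report;
```

Only the group belonging to the alternative that matched is defined, so interpolating all of the groups into one print statement triggers the "uninitialized value" warning for the rest.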
    while (defined( my $line = <UNIPROT> )) {
        if ($line =~ /^(AC|OS|OX|ID|GN)\s+(.*)/) {
            print "<$1> $2\n";
        }
    }
    __END__
    <ID> ARF1_PLAFA Reviewed; 181 AA.
    <AC> Q94650; O02502; O02593;
    <GN> Name=ARF1; Synonyms=ARF, PLARF;
    <OS> Plasmodium falciparum.
    <OX> NCBI_TaxID=5833;
Since the file is processed line-by-line, I've renamed your variable from $lines to $line. If I were writing this code, here's how I might have written it:
    #!/usr/bin/env perl
    use warnings;
    use strict;

    my $filename = "uniprotfile";
    open my $ufh, "<", $filename or die "open $filename: $!";
    while (<$ufh>) {
        chomp;
        my ($id,$content) = /^(AC|OS|OX|ID|GN)\s+(.*)/ or next;
        if ($id eq 'AC') {
            my ($first) = $content=~/^([^;]+)/ or die "couldn't parse '$content'";
            print "AC: $first\n";
            ...
        }
        elsif ($id eq 'OS') {
            ...
        }
        ...
    }
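The `my ($id,$content) = /.../ or next;` line relies on a list assignment evaluating to the number of values the match returned, which is zero when the match fails. A small self-contained sketch of just that idiom follows. The in-memory `@lines` array replaces the file handle, and the DT line is a made-up placeholder standing in for any line type the regex does not accept:

```perl
use strict;
use warnings;

# Sample input held in memory instead of read from a file.
my @lines = (
    "ID   ARF1_PLAFA Reviewed; 181 AA.",
    "DT   placeholder line that should be skipped",
    "AC   Q94650; O02502; O02593;",
);

my @seen;
for (@lines) {
    # A failed match returns an empty list, so the assignment
    # is false in scalar context and "or next" skips the line.
    my ($id, $content) = /^(AC|OS|OX|ID|GN)\s+(.*)/ or next;
    push @seen, "$id=$content";
}
print "$_\n" for @seen;
```

Because unwanted lines are skipped up front, the rest of the loop body can assume that both `$id` and `$content` are defined.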
In reply to Re: Converting Uniprot File to a Fasta File in Perl
by haukex
in thread Converting Uniprot File to a Fasta File in Perl
by pearllearner315