in reply to how to parse a UniProt Flat file
Your regex is using greedy matching, so your first match term is 'Name=ARF1; Synonyms'. You can make it less greedy using '+?'.
However, making the match non-greedy won't fix your problem by itself, because your format requires multiple passes per line and you are only performing one. Perhaps something like this?
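    # Keep only the DE, GN and ID lines of the record held in $_.
    my @lines = grep { /^(?:DE|GN|ID)/ } split /\n/, $_;
    foreach my $line (@lines) {
        if ($line =~ /^(?:DE|GN)/ && $line !~ /Putative uncharacterized protein/) {
            # Strip one "key=value;" pair per pass; stop as soon as nothing
            # matches, so a line without '=' cannot loop forever.
            while ($line =~ s/.+?=(.+?);//) {
                print lc($1), "\n";
            }
        }
        elsif ($line =~ /^ID/) {
            print " \n";
        }
    }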
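
If it helps to see the greedy versus non-greedy difference in isolation, here is a small sketch. The sample GN line and both patterns are made up for illustration (I'm guessing at the shape of your regex), so treat it as a demonstration rather than a drop-in fix.

```perl
use strict;
use warnings;

# Hypothetical GN line, invented for this example.
my $line = 'GN   Name=ARF1; Synonyms=ARF-1;';

# Greedy: .+ runs on to the LAST '=' on the line.
my ($greedy) = $line =~ /^GN\s+(.+)=/;

# Non-greedy: .+? stops at the FIRST '='.
my ($lazy) = $line =~ /^GN\s+(.+?)=/;

# Multiple passes over one line: with /g in list context,
# the match returns every captured value, not just the first.
my @values = $line =~ /=(.+?);/g;

print "greedy: $greedy\n";   # greedy: Name=ARF1; Synonyms
print "lazy:   $lazy\n";     # lazy:   Name
print "values: @values\n";   # values: ARF1 ARF-1
```

The /g list-context match is an alternative to the s/// loop above: it leaves the line intact, which may be handier if you need the original text afterwards.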