in reply to Help to build a REGEXP
I'm assuming $line7 contains the excessive amount of data that you've posted. In the script below, I've used a representative sample. For future posts, please do the same.
You haven't shown how you've extracted that data. Ensure $line7 actually contains the data you think it does (e.g. print "$line7\n";).
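If a plain print doesn't make the problem obvious, one option is to dump the variable with control characters made visible. This is only a sketch: the value assigned to $line7 here is a shortened stand-in, not your real data, and Data::Dumper is just one of several ways to inspect a string.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;

# Stand-in value (an assumption for this sketch); in the real script
# $line7 is populated by your own extraction code.
my $line7 = qq{   /db_xref="GI:2735715"\n   /translation="MLSF\n   VDTR"\n};

# Useqq shows newlines, tabs and other control characters as escapes;
# Terse drops the leading '$VAR1 = ' from the dump.
$Data::Dumper::Useqq = 1;
$Data::Dumper::Terse = 1;

my $dump = Dumper($line7);
print $dump;
print 'length: ', length($line7), "\n";
```

An unexpectedly short length, or a dump showing only one line of the record, would suggest the problem lies in the extraction rather than in the regex.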
In the script below, I've simply captured everything that isn't a double-quote between '/translation="' and '"' then removed all the extraneous whitespace.
    #!/usr/bin/env perl -l

    use strict;
    use warnings;

    my $line7 = '
    ...
                         /db_xref="GI:2735715"
                         /translation="MLSFVDTRTLLLLAVTLCLATCQSLQEETVRKGPAGDRGPRGER
                         GPPGPPGRDGEDGPTGPPGPPGPPGPPGLGGNFAAQYDGKGVGLGPGPMGLMGPRGPP
                         YASQNITYHCKNSIAYMDEETGNLKKAVILQGSNDVELVAEGNSRFTYTVLVDGCSKK
                         TNEWGKTIIEYKTNKPSRLPFLDIAPLDIGGADHEFFVDIGPVCFK"
         exon            2432..2501
    ...
    ';

    my $re = qr{/translation="([^"]+)"};

    my ($extract) = $line7 =~ $re;
    $extract =~ s/\s+//g;

    print $extract;
Output:
MLSFVDTRTLLLLAVTLCLATCQSLQEETVRKGPAGDRGPRGERGPPGPPGRDGEDGPTGPPGPPGPPGPPGLGGNFAAQYDGKGVGLGPGPMGLMGPRGPPYASQNITYHCKNSIAYMDEETGNLKKAVILQGSNDVELVAEGNSRFTYTVLVDGCSKKTNEWGKTIIEYKTNKPSRLPFLDIAPLDIGGADHEFFVDIGPVCFK
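The script above assumes the match succeeds and that there is only one /translation= qualifier. If either assumption might not hold for your data, a guarded variant along these lines may help. The helper name extract_translations and the short sample string are mine, for illustration only.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not part of the script above): returns every
# /translation="..." value found in $text, with all whitespace removed.
sub extract_translations {
    my ($text) = @_;
    return unless defined $text;

    my @found;
    while ($text =~ m{/translation="([^"]+)"}g) {
        (my $seq = $1) =~ s/\s+//g;    # copy the capture, then strip whitespace
        push @found, $seq;
    }
    return @found;
}

# Shortened stand-in data with two qualifiers.
my $sample = qq{/translation="MLSF\n   VDTR" exon 1..2 /translation="GPPG"};
my @translations = extract_translations($sample);

if (@translations) {
    print "$_\n" for @translations;
}
else {
    warn "No /translation qualifier found\n";
}
```

Checking for an empty result avoids the "uninitialized value" warnings you'd otherwise get from running s/\s+//g on an undefined $extract.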
Update: From looking at other posts in this thread, it would seem possible that your initial problem (i.e. before you even start performing any matching) could be extracting the data you want. If that's the case, open a filehandle to your data file and populate $line7 as I've shown below. As you'll see, once you've done that, the rest of the code hasn't changed and the output is identical.
By the way, is there some significance to the $line7 variable name? If not, I'd pick something more meaningful.
    #!/usr/bin/env perl -l

    use strict;
    use warnings;

    my $line7 = '';
    my $re = qr{/translation="([^"]+)"};

    while (<DATA>) {
        if (/^\s+\/translation=/ .. /^\s+exon/) {
            $line7 .= $_;
        }
        else {
            $line7 ? last : next;
        }
    }

    my ($extract) = $line7 =~ $re;
    $extract =~ s/\s+//g;

    print $extract;

    __DATA__
    ...
                         /db_xref="GI:2735715"
                         /translation="MLSFVDTRTLLLLAVTLCLATCQSLQEETVRKGPAGDRGPRGER
                         GPPGPPGRDGEDGPTGPPGPPGPPGPPGLGGNFAAQYDGKGVGLGPGPMGLMGPRGPP
                         YASQNITYHCKNSIAYMDEETGNLKKAVILQGSNDVELVAEGNSRFTYTVLVDGCSKK
                         TNEWGKTIIEYKTNKPSRLPFLDIAPLDIGGADHEFFVDIGPVCFK"
         exon            2432..2501
    ...
Output:
MLSFVDTRTLLLLAVTLCLATCQSLQEETVRKGPAGDRGPRGERGPPGPPGRDGEDGPTGPPGPPGPPGPPGLGGNFAAQYDGKGVGLGPGPMGLMGPRGPPYASQNITYHCKNSIAYMDEETGNLKKAVILQGSNDVELVAEGNSRFTYTVLVDGCSKKTNEWGKTIIEYKTNKPSRLPFLDIAPLDIGGADHEFFVDIGPVCFK
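The script above reads from DATA so that it's self-contained. With a real data file, the loop would read from a lexical filehandle instead. Here's a rough sketch of that: the temporary file and its shortened contents are stand-ins so the example runs on its own, and $record is simply a more descriptive name I've picked in place of $line7.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small throwaway file so this sketch is runnable as-is;
# with real data, skip this part and set $path to your own file.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
print {$tmp_fh} <<'END_DATA';
                     /db_xref="GI:2735715"
                     /translation="MLSFVDTRTLLLLAV
                     GPPGPPGRDGEDGPTG"
     exon            2432..2501
END_DATA
close $tmp_fh or die "Cannot close '$path': $!";

open my $fh, '<', $path or die "Cannot open '$path': $!";

my $record = '';
while (my $line = <$fh>) {
    # Range (flip-flop) operator: true from the /translation= line
    # through the following exon line, inclusive.
    if ($line =~ m{^\s+/translation=} .. $line =~ /^\s+exon/) {
        $record .= $line;
    }
    elsif (length $record) {
        last;    # past the block we wanted, so stop reading
    }
}
close $fh;

my ($extract) = $record =~ m{/translation="([^"]+)"};
die "No /translation qualifier found in '$path'\n" unless defined $extract;
$extract =~ s/\s+//g;

print "$extract\n";
```

The three-argument open with a lexical filehandle and an explicit die on failure means a wrong path is reported straight away, rather than showing up later as an empty $record.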
-- Ken
Re^2: Help to build a REGEXP
by Anonymous Monk on Mar 12, 2014 at 10:30 UTC
by Anonymous Monk on Mar 12, 2014 at 10:57 UTC
by Anonymous Monk on Mar 12, 2014 at 11:15 UTC
by Anonymous Monk on Mar 12, 2014 at 11:46 UTC
by kcott (Archbishop) on Mar 12, 2014 at 20:18 UTC