in reply to Help to build a REGEXP

I'm assuming $line7 contains the excessive amount of data that you've posted. In the script below, I've used a representative sample. For future posts, please do the same.

You haven't shown how you've extracted that data. Ensure $line7 actually contains the data you think it does (i.e. print "$line7\n";).
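One way to make that check more revealing is to dump the variable with escapes visible, so embedded newlines and leading whitespace can't hide. This is only a minimal sketch: the sample value assigned to $line7 here is a hypothetical stand-in, not your real data.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;

# Show newlines and tabs as escapes (\n, \t) instead of printing them raw.
$Data::Dumper::Useqq = 1;
# Emit the bare value rather than a "$VAR1 = ...;" assignment.
$Data::Dumper::Terse = 1;

# Hypothetical stand-in for whatever $line7 holds in the real script.
my $line7 = qq{   /translation="MLSF\n   VDTR"\n};

my $dump = Dumper($line7);
print $dump;
```

If the dump shows only one line of the entry, the problem is in how the data is being read, not in the regex.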

In the script below, I've simply captured everything that isn't a double-quote between '/translation="' and '"' then removed all the extraneous whitespace.

    #!/usr/bin/env perl -l

    use strict;
    use warnings;

    my $line7 = '
    ...
                         /db_xref="GI:2735715"
                         /translation="MLSFVDTRTLLLLAVTLCLATCQSLQEETVRKGPAGDRGPRGER
                         GPPGPPGRDGEDGPTGPPGPPGPPGPPGLGGNFAAQYDGKGVGLGPGPMGLMGPRGPP
                         YASQNITYHCKNSIAYMDEETGNLKKAVILQGSNDVELVAEGNSRFTYTVLVDGCSKK
                         TNEWGKTIIEYKTNKPSRLPFLDIAPLDIGGADHEFFVDIGPVCFK"
         exon            2432..2501
    ...
    ';

    my $re = qr{/translation="([^"]+)"};
    my ($extract) = $line7 =~ $re;
    $extract =~ s/\s+//g;

    print $extract;

Output:

MLSFVDTRTLLLLAVTLCLATCQSLQEETVRKGPAGDRGPRGERGPPGPPGRDGEDGPTGPPGPPGPPGPPGLGGNFAAQYDGKGVGLGPGPMGLMGPRGPPYASQNITYHCKNSIAYMDEETGNLKKAVILQGSNDVELVAEGNSRFTYTVLVDGCSKKTNEWGKTIIEYKTNKPSRLPFLDIAPLDIGGADHEFFVDIGPVCFK

Update: From looking at other posts in this thread, it would seem possible that your initial problem (i.e. before you even start performing any matching) could be extracting the data you want. If that's the case, open a filehandle to your data file and populate $line7 as I've shown below. As you'll see, once you've done that, the rest of the code hasn't changed and the output is identical.

By the way, is there some significance to the $line7 variable name? If not, I'd pick something more meaningful.

    #!/usr/bin/env perl -l

    use strict;
    use warnings;

    my $line7 = '';
    my $re = qr{/translation="([^"]+)"};

    while (<DATA>) {
        if (/^\s+\/translation=/ .. /^\s+exon/) {
            $line7 .= $_;
        }
        else {
            $line7 ? last : next;
        }
    }

    my ($extract) = $line7 =~ $re;
    $extract =~ s/\s+//g;

    print $extract;

    __DATA__
    ...
                         /db_xref="GI:2735715"
                         /translation="MLSFVDTRTLLLLAVTLCLATCQSLQEETVRKGPAGDRGPRGER
                         GPPGPPGRDGEDGPTGPPGPPGPPGPPGLGGNFAAQYDGKGVGLGPGPMGLMGPRGPP
                         YASQNITYHCKNSIAYMDEETGNLKKAVILQGSNDVELVAEGNSRFTYTVLVDGCSKK
                         TNEWGKTIIEYKTNKPSRLPFLDIAPLDIGGADHEFFVDIGPVCFK"
         exon            2432..2501
    ...

Output:

MLSFVDTRTLLLLAVTLCLATCQSLQEETVRKGPAGDRGPRGERGPPGPPGRDGEDGPTGPPGPPGPPGPPGLGGNFAAQYDGKGVGLGPGPMGLMGPRGPPYASQNITYHCKNSIAYMDEETGNLKKAVILQGSNDVELVAEGNSRFTYTVLVDGCSKKTNEWGKTIIEYKTNKPSRLPFLDIAPLDIGGADHEFFVDIGPVCFK

-- Ken

Re^2: Help to build a REGEXP
by Anonymous Monk on Mar 12, 2014 at 10:30 UTC
    Hi, I tried:
    if($_=~ m/^\s+\/translation\=\"(.*?)\"/ms) { $wanted_part=$1; }

    but got nothing. Why doesn't it work?
        I can easily match the wanted part if I slurp the whole entry into one variable by changing the $/ variable...
        But the problem is that the teacher is a bit weird and thinks this is "non-pedagogical stuff"...
        $_ holds only the current line of the entry, nothing more...

      It doesn't work because your regex doesn't match whatever is in $_. Of course, as you've refused to advise us what $_ contains, you can't possibly expect any further information on what was happening in that isolated code fragment.
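As an illustration only, here is a minimal sketch of one likely cause, assuming $_ really does hold a single line of the entry at a time. The two-line sample is a shortened, hypothetical stand-in for the real record: the opening quote is on one line and the closing quote on another, so the pattern can never see both at once.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical sample: two lines as a line-by-line reader would deliver them.
my @lines = (
    qq{                     /translation="MLSFVDTRTLLLLAVTLCLATCQSLQEETVRKGPAGDRGPRGER\n},
    qq{                     TNEWGKTIIEYKTNKPSRLPFLDIAPLDIGGADHEFFVDIGPVCFK"\n},
);

my $pattern = qr/^\s+\/translation="(.*?)"/ms;

# Tested one line at a time: no single line holds both quotes, so nothing matches.
my $matched_per_line = grep { $_ =~ $pattern } @lines;

# Tested against the accumulated text, the same pattern matches.
my $joined = join '', @lines;
my ($wanted_part) = $joined =~ $pattern;
$wanted_part =~ s/\s+//g;

print "per-line matches: $matched_per_line\n";
print "joined match: $wanted_part\n";
```

The /m and /s modifiers change how ^ and . behave, but they cannot make a pattern see text that isn't in the string being matched. Accumulating the relevant lines first avoids touching $/ at all.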

      I provided you with a solution. Your response says you tried something completely different. Why did you reply to my post telling me that?

      Did you try my solution? Did it do what you wanted? If not, what did it do differently? Was it unsuitable for your class exercise? If so, in what way was it unsuitable?

      You've failed to tell us what data you're actually trying to match against: first with $line7 and more recently with $_. Why? I even gave you the specific code in my earlier response (i.e. print "$line7\n";). Did you do this? If you did, what was the output? If you didn't, why not?

      You've received a lot of advice from people who've freely given their time to try to help you. I think it's about time you put in some effort yourself: answer questions, provide output, try solutions and so on.

      -- Ken