PerlMonks  

Combining Regex

by neversaint (Deacon)
on Jul 23, 2013 at 09:05 UTC [id://1045801]

neversaint has asked for the wisdom of the Perl Monks concerning the following question:

Dear Masters,
I want a single regex that matches all of these lines except the last one, which contains <EXP-N-\d+>.
    <MIR-1><EXP-V-3><VACCVIRUS-PROP-1>
    <MIR-1><ASSC-PHRASE-1><VACCVIRUS-PROP-1><PATTERN-1>
    <MIR-1><EXP-V-0><ART-0><VACCVIRUS-PROP-1>
    <MIR-1><EXP-V-0><ART-0><BE-V><VACCVIRUS-PROP-1>
    <MIR-1><EXP-V-0><EXP-N-0><VACCVIRUS-PROP-1>
I am stuck with this code (it also reflects the core pattern and the ordering I want for the match):
    <MIR-\d+>(?:<EXP-V-\d+>|<ASSC-PHRASE-\d+>).+<VACCVIRUS-PROP-\d+>.+
http://rubular.com/r/Z5sZ0nv7n1

What's the right way to do it?

---
neversaint and everlastingly indebted.......

Re: Combining Regex
by BrowserUk (Patriarch) on Jul 23, 2013 at 09:30 UTC

        @l = ...;
        m[
            <MIR-\d+>
            (?:<EXP-V-\d+>)?
            (?:<ASSC-PHRASE-\d+>|<ART-\d+>)?
            (?:<BE-V>)?
            <VACCVIRUS-PROP-\d+>
            (?:<PATTERN-\d+>)?
        ]x and print for @l;;

        <MIR-1><EXP-V-3><VACCVIRUS-PROP-1>
        <MIR-1><ASSC-PHRASE-1><VACCVIRUS-PROP-1><PATTERN-1>
        <MIR-1><EXP-V-0><ART-0><VACCVIRUS-PROP-1>
        <MIR-1><EXP-V-0><ART-0><BE-V><VACCVIRUS-PROP-1>
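    Outside BrowserUk's interactive shell, the same pattern can be run as an ordinary script. A minimal sketch, assuming @l holds the OP's five sample lines (standing in for the elided "..."):

```perl
use strict;
use warnings;

# The OP's five sample lines (assumed contents of @l).
my @l = (
    '<MIR-1><EXP-V-3><VACCVIRUS-PROP-1>',
    '<MIR-1><ASSC-PHRASE-1><VACCVIRUS-PROP-1><PATTERN-1>',
    '<MIR-1><EXP-V-0><ART-0><VACCVIRUS-PROP-1>',
    '<MIR-1><EXP-V-0><ART-0><BE-V><VACCVIRUS-PROP-1>',
    '<MIR-1><EXP-V-0><EXP-N-0><VACCVIRUS-PROP-1>',
);

# BrowserUk's pattern: each tag that may sit between <MIR-n> and
# <VACCVIRUS-PROP-n> is listed explicitly and made optional, so any
# unlisted tag in that gap (such as <EXP-N-0>) prevents a match.
my $re = qr[
    <MIR-\d+>
    (?:<EXP-V-\d+>)?
    (?:<ASSC-PHRASE-\d+>|<ART-\d+>)?
    (?:<BE-V>)?
    <VACCVIRUS-PROP-\d+>
    (?:<PATTERN-\d+>)?
]x;

my @matched = grep { $_ =~ $re } @l;
print "$_\n" for @matched;
```

    Because every element is optional, a hypothetical line such as <MIR-1><VACCVIRUS-PROP-1> would match as well, although it lacks the <EXP-V-n> or <ASSC-PHRASE-n> part the OP later describes as core.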

    Or more simply:

        $_ !~ m[<EXP-N-\d+>] and print for @l;;

        <MIR-1><EXP-V-3><VACCVIRUS-PROP-1>
        <MIR-1><ASSC-PHRASE-1><VACCVIRUS-PROP-1><PATTERN-1>
        <MIR-1><EXP-V-0><ART-0><VACCVIRUS-PROP-1>
        <MIR-1><EXP-V-0><ART-0><BE-V><VACCVIRUS-PROP-1>
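    The one-line exclusion can also be paired with a loose check for the core tags, giving two simple tests instead of one complicated regex. A sketch in which $core and $exclude are illustrative names, and the lazy .*? gap is an assumption about what may appear between the tags:

```perl
use strict;
use warnings;

my @l = (
    '<MIR-1><EXP-V-3><VACCVIRUS-PROP-1>',
    '<MIR-1><ASSC-PHRASE-1><VACCVIRUS-PROP-1><PATTERN-1>',
    '<MIR-1><EXP-V-0><ART-0><VACCVIRUS-PROP-1>',
    '<MIR-1><EXP-V-0><ART-0><BE-V><VACCVIRUS-PROP-1>',
    '<MIR-1><EXP-V-0><EXP-N-0><VACCVIRUS-PROP-1>',
);

# Loose positive check: the core tags in order, anything (or nothing) between.
my $core    = qr/<MIR-\d+>(?:<EXP-V-\d+>|<ASSC-PHRASE-\d+>).*?<VACCVIRUS-PROP-\d+>/;

# BrowserUk's exclusion: reject any line containing an <EXP-N-digits> tag.
my $exclude = qr/<EXP-N-\d+>/;

# A line is kept only if it passes the positive check AND fails the exclusion.
my @kept = grep { $_ =~ $core && $_ !~ $exclude } @l;
print "$_\n" for @kept;
```

    On its own, $core accepts all five sample lines; the second test is what drops the <EXP-N-0> line.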

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Combining Regex
by tobyink (Canon) on Jul 23, 2013 at 09:30 UTC

    Is there something wrong with the answer you received here?

    package Cow { use Moo; has name => (is => 'lazy', default => sub { 'Mooington' }) } say Cow->new->name
Re: Combining Regex
by Happy-the-monk (Canon) on Jul 23, 2013 at 09:32 UTC

    .+

    Both of the ".+" parts of your regex require at least one character at that spot, and in your example data there isn't anything there. Take 'em out.

    Cheers, Sören

    Creator of mobile bugs - let loose once, run everywhere.
    (hooked on the Perl Programming language)
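    Happy-the-monk's point is easy to check. A small sketch that tests each sample line as a separate string (an assumption: rubular applies the pattern to the whole pasted block) against three versions: the question's regex, the same regex with both .+ removed, and a hypothetical variant with a lazy .*? in the middle and nothing at the end:

```perl
use strict;
use warnings;

# The five sample lines from the question, one string per line.
my @lines = (
    '<MIR-1><EXP-V-3><VACCVIRUS-PROP-1>',
    '<MIR-1><ASSC-PHRASE-1><VACCVIRUS-PROP-1><PATTERN-1>',
    '<MIR-1><EXP-V-0><ART-0><VACCVIRUS-PROP-1>',
    '<MIR-1><EXP-V-0><ART-0><BE-V><VACCVIRUS-PROP-1>',
    '<MIR-1><EXP-V-0><EXP-N-0><VACCVIRUS-PROP-1>',
);

my %patterns = (
    # The regex from the question: each .+ demands at least one character.
    original  => qr/<MIR-\d+>(?:<EXP-V-\d+>|<ASSC-PHRASE-\d+>).+<VACCVIRUS-PROP-\d+>.+/,
    # Both .+ removed, as suggested above.
    no_dots   => qr/<MIR-\d+>(?:<EXP-V-\d+>|<ASSC-PHRASE-\d+>)<VACCVIRUS-PROP-\d+>/,
    # Hypothetical variant: .*? allows "nothing" between the tags.
    lazy_star => qr/<MIR-\d+>(?:<EXP-V-\d+>|<ASSC-PHRASE-\d+>).*?<VACCVIRUS-PROP-\d+>/,
);

my %count;
for my $name (sort keys %patterns) {
    my @hits = grep { $_ =~ $patterns{$name} } @lines;
    $count{$name} = scalar @hits;
    printf "%-9s matches %d line(s)\n", $name, scalar @hits;
}
```

    Run as written, the original pattern matches none of the lines: each line either has nothing between the tags where the first .+ needs a character, or nothing after <VACCVIRUS-PROP-n> where the second one does. Dropping both .+ matches only the first two lines, since the <ART-n> and <BE-V> tags then have nowhere to go, while the .*? variant matches all five, including the <EXP-N-n> line the OP wants to reject.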

Re: Combining Regex
by Loops (Curate) on Jul 23, 2013 at 09:33 UTC

    Okay, you changed the question a few times while I was composing this reply ;o). I have to say I'm still left guessing what the ordering rules are for the angle-bracket segments. I picked one ordering that gives the results you're requesting, but you may still have to tweak it a bit. The main idea is to use the /x modifier, which makes the regex ignore whitespace, so that you can format it for readability:

        use strict;
        use warnings;

        while (<DATA>) {
            print if /
                <MIR-\d+>
                (
                    <EXP-V-\d+> (<ART-\d+>)* (<BE-V>)*
                  | <ASSC-PHRASE-\d+>
                )
                <VACCVIRUS-PROP-\d+>
            /x;
        }

        __DATA__
        <MIR-1><EXP-V-3><VACCVIRUS-PROP-1>
        <MIR-1><ASSC-PHRASE-1><VACCVIRUS-PROP-1><PATTERN-1>
        <MIR-1><EXP-V-0><ART-0><VACCVIRUS-PROP-1>
        <MIR-1><EXP-V-0><ART-0><BE-V><VACCVIRUS-PROP-1>
        <MIR-1><EXP-V-0><EXP-N-0><VACCVIRUS-PROP-1>
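    To illustrate the /x point: the sketch below writes Loops' pattern twice, once squeezed onto one line and once spread out with comments, and checks that both accept the same sample lines. Under /x, unescaped whitespace (outside character classes) and #-comments inside the pattern are ignored. The variable names are only for illustration.

```perl
use strict;
use warnings;

my @lines = (
    '<MIR-1><EXP-V-3><VACCVIRUS-PROP-1>',
    '<MIR-1><ASSC-PHRASE-1><VACCVIRUS-PROP-1><PATTERN-1>',
    '<MIR-1><EXP-V-0><ART-0><VACCVIRUS-PROP-1>',
    '<MIR-1><EXP-V-0><ART-0><BE-V><VACCVIRUS-PROP-1>',
    '<MIR-1><EXP-V-0><EXP-N-0><VACCVIRUS-PROP-1>',
);

# Loops' pattern on one line, without /x.
my $compact = qr/<MIR-\d+>(<EXP-V-\d+>(<ART-\d+>)*(<BE-V>)*|<ASSC-PHRASE-\d+>)<VACCVIRUS-PROP-\d+>/;

# The same pattern under /x: whitespace and #-comments are ignored.
my $spread = qr/
    <MIR-\d+>
    (                        # group 1, either:
        <EXP-V-\d+>          #   an EXP-V tag,
        (<ART-\d+>)*         #   any number of ART tags,
        (<BE-V>)*            #   any number of BE-V tags,
      | <ASSC-PHRASE-\d+>    # or an ASSC-PHRASE tag
    )
    <VACCVIRUS-PROP-\d+>
/x;

my @compact_hits = grep { $_ =~ $compact } @lines;
my @spread_hits  = grep { $_ =~ $spread }  @lines;

print 'compact: ', scalar(@compact_hits), " match(es)\n";
print 'spread:  ', scalar(@spread_hits),  " match(es)\n";
```

    A literal space or # that should actually be matched under /x has to be escaped (as "\ " or "\#") or placed inside a character class.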
      Hi. Thanks. The part
      (<ART-\d+> <BE-V>)
      is optional, and it can be anything. The core pattern is:
      <MIR-\d+> (<EXP-V-\d+>|<ASSC-PHRASE-\d+>) <VACCVIRUS-PROP-\d+>
      ---
      neversaint and everlastingly indebted.......
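    Read literally, the OP's clarification asks for <MIR-n>, then <EXP-V-n> or <ASSC-PHRASE-n>, then any number of optional tags, then <VACCVIRUS-PROP-n>, with <EXP-N-n> never allowed in the gap. One way to say that in a single regex is a "tempered" repetition: zero or more tags, each one first checked with a negative lookahead. This is a sketch assuming every optional piece is a single <...> tag and that lines start with <MIR-n>; the names $any_tag, $between and $core are made up for illustration, and the two extra test strings are hypothetical.

```perl
use strict;
use warnings;

# Illustrative building blocks; the names are hypothetical.
my $any_tag = qr{ < [^>]* > }x;                    # any single <...> tag
my $exp_n   = qr{ <EXP-N-\d+> }x;                  # the tag that must not appear
my $between = qr{ (?: (?! $exp_n ) $any_tag )* }x; # zero or more tags, none of them EXP-N

my $core = qr{
    \A                                  # drop this if lines can have leading text
    <MIR-\d+>
    (?: <EXP-V-\d+> | <ASSC-PHRASE-\d+> )
    $between
    <VACCVIRUS-PROP-\d+>
}x;

my @samples = (
    '<MIR-1><EXP-V-3><VACCVIRUS-PROP-1>',
    '<MIR-1><ASSC-PHRASE-1><VACCVIRUS-PROP-1><PATTERN-1>',
    '<MIR-1><EXP-V-0><ART-0><VACCVIRUS-PROP-1>',
    '<MIR-1><EXP-V-0><ART-0><BE-V><VACCVIRUS-PROP-1>',
    '<MIR-1><EXP-V-0><EXP-N-0><VACCVIRUS-PROP-1>',
);

# Hypothetical extra inputs, not from the OP's data.
my @extra = (
    '<MIR-1><VACCVIRUS-PROP-1>',                          # lacks EXP-V / ASSC-PHRASE
    '<MIR-1><EXP-V-0><ART-0><EXP-N-0><VACCVIRUS-PROP-1>', # EXP-N later in the gap
);

my @hits = grep { $_ =~ $core } @samples, @extra;
print "$_\n" for @hits;
```

    Only the first four sample lines are printed; both hypothetical lines are rejected, one for missing the second core tag and one for the <EXP-N-0> in the gap.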
Re: Combining Regex
by AnomalousMonk (Archbishop) on Jul 23, 2013 at 14:33 UTC

    Your OP node title specifically refers to combining regexes, so here's an approach that decomposes what seem to be the essential elements of your regex and recombines them to form the final matching regex. I find that a decompositional approach makes a regex (especially a complex one) easier to think about when writing it, and easier to maintain later. (Note: Some of your StackOverflow examples have leading characters before the $mir pattern. If this is really the case, remove the \A absolute-beginning-of-string anchor from the matching regex.)

    Another note: If it's just a matter of excluding anything matching <EXP-N-\d+>, then BrowserUk's 'simpler' solution here is by far the best.

        >perl -wMstrict -le
        "my @strs = qw(
          <MIR-1><EXP-V-3><VACCVIRUS-PROP-1>
          <MIR-1><ASSC-PHRASE-1><VACCVIRUS-PROP-1><PATTERN-1>
          <MIR-1><EXP-V-0><ART-0><VACCVIRUS-PROP-1>
          <MIR-1><EXP-V-0><ART-0><BE-V><VACCVIRUS-PROP-1>
          <MIR-1><EXP-V-0><EXP-N-0><VACCVIRUS-PROP-1>
          );
        ;;
        my $tail = qr{ \d+ > }xms;
        ;;
        my $mir      = qr{ < MIR-            $tail }xms;
        my $exp_v    = qr{ < EXP-V-          $tail }xms;
        my $exp_n    = qr{ < EXP-N-          $tail }xms;
        my $assc_phr = qr{ < ASSC-PHRASE-    $tail }xms;
        my $vaccvir  = qr{ < VACCVIRUS-PROP- $tail }xms;
        ;;
        for my $str (@strs) {
          print qq{'$str'} if $str =~ m{
            \A $mir (?: $exp_v (?! $exp_n) | $assc_phr ) .*? $vaccvir
            }xms;
          }
        "
        '<MIR-1><EXP-V-3><VACCVIRUS-PROP-1>'
        '<MIR-1><ASSC-PHRASE-1><VACCVIRUS-PROP-1><PATTERN-1>'
        '<MIR-1><EXP-V-0><ART-0><VACCVIRUS-PROP-1>'
        '<MIR-1><EXP-V-0><ART-0><BE-V><VACCVIRUS-PROP-1>'
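    One detail of the regex above: the negative lookahead (?! $exp_n) only checks the tag directly after <EXP-V-n>, so .*? can still step over an <EXP-N-n> that appears later, and an <ASSC-PHRASE-n> line is not checked at all. The sketch below keeps AnomalousMonk's components and compares his regex with a variant whose gap is "tempered" so that no EXP-N tag can occur anywhere between the leading tags and <VACCVIRUS-PROP-n>. The variant, the names $gap and $tempered, and the extra test string are assumptions about the OP's intent, not something stated in the thread.

```perl
use strict;
use warnings;

# AnomalousMonk's building blocks, unchanged.
my $tail     = qr{ \d+ > }xms;
my $mir      = qr{ < MIR-            $tail }xms;
my $exp_v    = qr{ < EXP-V-          $tail }xms;
my $exp_n    = qr{ < EXP-N-          $tail }xms;
my $assc_phr = qr{ < ASSC-PHRASE-    $tail }xms;
my $vaccvir  = qr{ < VACCVIRUS-PROP- $tail }xms;

# His combined regex: the EXP-N lookahead is tested only at the position
# immediately after <EXP-V-n>; the .*? that follows can still cross an EXP-N tag.
my $lookahead_once = qr{ \A $mir (?: $exp_v (?! $exp_n) | $assc_phr ) .*? $vaccvir }xms;

# Hypothetical variant: a "tempered" gap in which each character consumed
# is first checked not to begin an <EXP-N-n> tag.
my $gap      = qr{ (?: (?! $exp_n ) . )*? }xms;
my $tempered = qr{ \A $mir (?: $exp_v | $assc_phr ) $gap $vaccvir }xms;

my @strs = (
    '<MIR-1><EXP-V-3><VACCVIRUS-PROP-1>',
    '<MIR-1><ASSC-PHRASE-1><VACCVIRUS-PROP-1><PATTERN-1>',
    '<MIR-1><EXP-V-0><ART-0><VACCVIRUS-PROP-1>',
    '<MIR-1><EXP-V-0><ART-0><BE-V><VACCVIRUS-PROP-1>',
    '<MIR-1><EXP-V-0><EXP-N-0><VACCVIRUS-PROP-1>',
);

# Hypothetical input, not from the OP's data: EXP-N not directly after EXP-V.
my $later_exp_n = '<MIR-1><EXP-V-0><ART-0><EXP-N-0><VACCVIRUS-PROP-1>';

my @once_hits     = grep { $_ =~ $lookahead_once } @strs, $later_exp_n;
my @tempered_hits = grep { $_ =~ $tempered }       @strs, $later_exp_n;

print 'lookahead once: ', scalar(@once_hits),     " match(es)\n";
print 'tempered gap:   ', scalar(@tempered_hits), " match(es)\n";
```

    Both versions accept the same four sample lines; they differ only on the made-up string, which the original accepts and the tempered variant rejects. Whether a later EXP-N tag should disqualify a line is not settled in the thread; if it should not, AnomalousMonk's version already does the job.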

Node Type: perlquestion [id://1045801]
Approved by hdb