PerlMonks  

Re: Regex AND

by ady (Deacon)
on Dec 02, 2004 at 14:16 UTC ([id://411761])


in reply to Regex AND

A little more background on the domain of this problem:

I've written a tool (in Perl) for transforming data on enterprise applications (modules & relations) to an input format for graphic display (nodes & arcs).

The node names have the general format:

[A-Z]{2}\d{5}[A-Z]?

Part of the tool lets you enter a regex in a text box; the program compiles that regex and uses it as a filter when parsing the data (e.g. a data line is discarded if node-name !~ node-filter).
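As a rough illustration of that compile-and-filter step, here is a minimal sketch. The helper names `compile_filter` and `filter_lines` are hypothetical, and the assumption that the node name is the first whitespace-separated field of a data line is made up for the example; the actual tool's data format is not shown in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: compile the text typed into the filter box.
# Returns a compiled regex, or undef if the text is not a valid pattern.
sub compile_filter {
    my ($text) = @_;
    my $re = eval { qr/$text/ };    # qr// dies on a syntax error
    return $re;
}

# Hypothetical helper: keep only the lines whose node name passes the filter.
# Assumption: the node name is the first whitespace-separated field.
sub filter_lines {
    my ($re, @lines) = @_;
    return grep {
        my ($name) = split ' ', $_;
        defined $name && $name =~ $re;
    } @lines;
}

my $filter = compile_filter('^(?:CX36[56]|JA30[0-2])')
    or die "invalid filter\n";

my @kept = filter_lines(
    $filter,
    'CX36500 JA30100',
    'JB50000 CX36500',
    'JA30212A JB50000',
);
print "$_\n" for @kept;
```

Wrapping `qr//` in `eval` means a mistyped pattern can be reported back to the user instead of killing the program.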

For instance you can specify the following regex:

(CX36(5|6))|(JA30[0-2])|(JA3(([2-8]\d)|(9[0-4])))|(JA5.*)|(JA6((0\d)|(1[0-3])))|(JA64[7-9])|(JA687.*)|(JA74[0-3])|(JB5.*)|(JY(((1|2)\d\d)|(3[0-3]\d)))|(JY[3-9][5-9]\d)|(JZ51(3|4)00.*)
to indicate that you're only interested in source modules matching the following naming conventions (an example of an actual application domain):
CX365-CX366 JA300-JA302 JA320-JA394 JA5* JA600-JA613 JA647-JA649 JA687* JA740-JA743 JB5* JY100-JY339 JY350-JY999 JZ51300* JZ51400*
Now it's also often relevant to filter on nodes NOT matching a given application domain (in effect, the complement of the domain definition). For the above example, that means all modules which pass a filter combining the following regexes:
^(?!CX36(5|6))
^(?!JA30[0-2])
^(?!JA3(([2-8]\d)|(9[0-4])))
^(?!JA5.*)
^(?!(JA6((0\d)|(1[0-3]))))
^(?!JA64[7-9])
^(?!JA687.*)
^(?!JA74[0-3])
^(?!JB5.*)
^(?!(JY(((1|2)\d\d)|(3[0-3]\d))))
^(?!JY[3-9][5-9]\d)
^(?!JZ51(3|4)00.*)
Thus the need to combine (AND) the "negated" regexes into one big regex and pass that to the parsing/filtering program.
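One way such a combination could be built, sketched under the assumption that every sub-pattern is meant to be tested at the start of the node name: negative lookaheads consume no characters, so several of them can be chained after a single `^`, and the result matches only when none of the sub-patterns matches. The helper name `none_of` is hypothetical and only three of the patterns above are used to keep the example short; this is not necessarily the solution given elsewhere in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: build one regex that matches only when none of
# the given sub-patterns matches at the start of the string.
sub none_of {
    my @patterns = @_;
    # Each (?!...) is zero-width, so chaining them acts as a logical AND.
    my $lookaheads = join '', map { "(?!$_)" } @patterns;
    return qr/^$lookaheads/;
}

my $outside = none_of('CX36(5|6)', 'JA30[0-2]', 'JB5.*');

for my $name (qw(CX36500 JA30100 JA31000 JY10000)) {
    printf "%-8s %s\n", $name, ($name =~ $outside ? 'kept' : 'dropped');
}
```

Here only JA31000 and JY10000 are kept. An equivalent form puts the whole alternation inside a single lookahead, `^(?!(?:A|B|C))`, which may be easier to generate from an existing positive filter.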

Allan

Re^2: Regex AND
by melora (Scribe) on Dec 02, 2004 at 16:23 UTC
    Why can't you negate the first regex to capture all those which don't match? I am assuming that my question is stupid, so please have patience with me. Is the problem that the second regex may be different from the negation of the first?
      Well, i'd have to open the perl program and change the !~ op to the =~ op each time i want filtering on a "negated domain".

      I could do that, but i prefer a way to express the regex complement directly as a new regex (to be fed to the program). -- And the way to do that was shown by Corion above.

      Best regards / allan

      ... then again, yes i could modify the GUI with a checkbox indicating "straight/negated", and switch the perl comparison operator accordingly. In the end i guess i was intrigued by the "how to climb it", as a regex...
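The checkbox alternative mentioned here could look roughly like the sketch below. `passes_filter` is a hypothetical name, and `$negate` stands in for the state of the proposed "straight/negated" checkbox; nothing about the real GUI is known from this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: $negate stands in for the "straight/negated" checkbox.
# Returns 1 if the name should be kept, 0 otherwise.
sub passes_filter {
    my ($name, $re, $negate) = @_;
    my $matched = ( $name =~ $re ) ? 1 : 0;
    return $negate ? 1 - $matched : $matched;
}

my $domain = qr/^(?:JA5|JB5)/;
my @names  = qw(JA50000 JB51234 JY10000);

my @inside  = grep { passes_filter($_, $domain, 0) } @names;
my @outside = grep { passes_filter($_, $domain, 1) } @names;

print "inside:  @inside\n";
print "outside: @outside\n";
```

The trade-off is that the negation lives in the program rather than in the regex, so the filter text alone no longer describes the full selection.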

        (Update: See Re: Ways to implement a closure for more on using closures for this kind of thing.)

        ady wrote:

        Well, i'd have to open the perl program and change the !~ op to the =~ op each time i want filtering on a "negated domain".

        I could do that, but i prefer a way to express the regex complement directly as a new regex (to be fed to the program). -- And the way to do that was shown by Corion above.

        Another option would be to use "regex matchers" instead of hand-coded regex operations. The matchers can be inverted, so you can change the matching logic of your worker code by passing in normal or inverted matchers.

        One possible implementation:

        # The following small library lets us create regex-matchers
        # and inverted regex-matchers.

        sub make_regex_matcher {
            my $regex = shift;
            return sub {
                local $_ = $_[0];
                /$regex/g;
            }
        }

        sub invert_regex_matcher {
            my $matcher = shift;
            sub {
                wantarray
                    ? die "inverted matchers are only for scalar context"
                    : ! $matcher->(@_)
            }
        }

        Then we can parameterize our code's matching behavior by using matchers instead of regex operators:

        # With the above library, we can write our worker code without
        # having to specify whether we are interested in matching (=~)
        # or non-matching (!~). Instead, we can parameterize this
        # behavior by allowing our worker to accept a matcher as an
        # argument:

        my @candidates = map {chomp;$_} <DATA>;

        sub do_work {
            my $matcher = shift;
            foreach (@candidates) {
                if ($matcher->($_)) {  # instead of regex op
                    # do something with candidate in $_
                    print "$_$/";
                }
            }
        }

        Here is a sample run:

        # To demonstrate this approach, let us create a matcher for
        # your example pattern:

        my $matcher = make_regex_matcher('(CX36(5|6))|(JA30[0-2])|(JA3(([2-8]\d)|(9[0-4])))|(JA5.*)|(JA6((0\d)|(1[0-3])))|(JA64[7-9])|(JA687.*)|(JA74[0-3])|(JB5.*)|(JY(((1|2)\d\d)|(3[0-3]\d)))|(JY[3-9][5-9]\d)|(JZ51(3|4)00.*)');

        # Now we can process matching candidates:

        print "Matches:$/";
        do_work($matcher);

        # And we can process non-matching candidates without
        # having to change a line of worker code:

        print "$/Non-matches:$/";
        do_work(invert_regex_matcher($matcher));

        ### OUTPUT:
        ###
        ### Matches:
        ### CX365-CX366
        ### JA300-JA302
        ### JA320-JA394
        ###
        ### Non-matches:
        ### I do not match!
        ### Nor do I match, my non-matching brother!

        __DATA__
        CX365-CX366
        I do not match!
        JA300-JA302
        Nor do I match, my non-matching brother!
        JA320-JA394

        I hope that this helps.

        Cheers,
        Tom
