benizi has asked for the wisdom of the Perl Monks concerning the following question:

Fellow monks,

Does anyone know an easy way to alter regular expressions programmatically?

Ideally, I'd like to find a way to alter the kind of tree that use re qw/debug/; produces. All of this in order to let users input their own regular expressions that a program could alter in a 'nice' way. (removing 'EVAL' nodes, for example)

The types of regular expressions we'll use are likely simple enough that we'll end up using text substitution, but it'd be nice to have something around for RE manipulation in general.
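On the 'EVAL' nodes specifically: perl already refuses to compile a `(?{ ... })` block in a pattern that is interpolated at runtime, unless `use re 'eval'` is in scope. The following is only a minimal sketch, assuming the user's regex arrives as a plain string; `compile_user_regex` is a hypothetical helper name, not part of any module.

```perl
use strict;
use warnings;

# Hypothetical helper (not from any module): compile a user-supplied pattern.
# Without "use re 'eval'" in scope, perl dies on a (?{ ... }) block in a
# pattern that is interpolated at runtime, so eval {} traps both that and
# ordinary syntax errors.
sub compile_user_regex {
    my ($source) = @_;
    my $re = eval { qr/$source/ };
    return $re;    # undef when the pattern was rejected
}

for my $source ('fo+bar', 'foo(?{ print "hi" })bar') {
    my $re     = compile_user_regex($source);
    my $status = defined $re ? 'accepted' : 'rejected';
    print "$status: $source\n";
}
```

This rejects such patterns rather than rewriting them; actually removing the node still needs a parser or careful text substitution.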

Re: Altering regular expression trees
by diotalevi (Canon) on Jul 29, 2003 at 21:30 UTC

    There are two general-purpose solutions to this. japhy's YAPE::Regex will parse the string representation of your regular expression and return an element tree. You could manipulate that tree or pull the information you need from it. The catch is that it's parsing regexes itself, which means it isn't pulling directly from perl's actual parsed regex. For that you need MJD's Rx module. The issue with it is that, when I last looked, it required a patch to the perl interpreter (also supplied). If you need a high-fidelity copy of your regex and altering your interpreter is not impossible, then this is likely the best route.

    Alternatively... you could attempt to capture the output from 'use re "debug"'. I asked about this once at Trapping re 'debug', and all I can come up with is to throw a separate instance of the perl interpreter at the regex, something like `perl -Mre=debug -e 'qr/.../' 1> /dev/null 2> re.debug.output.txt`, and then read the captured STDERR output from that file. I've never been able to capture this data from within a perl process: not through redirecting STDERR, tying it, or anything else.

      You could try trapping the output using IPC::Open3. I do this for various deparse things; something like this will get the data for you. There are obviously neater ways to handle the data coming in, but for this example it works.
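      use strict;
      use IPC::Open3;

      # open3 returns the child's PID; the re=debug output arrives on the child's STDERR
      my $pid = open3(\*WRITE, \*READ, \*ERROR, "perl -Mre=debug -e 'qr/.../'");
      close WRITE;    # the child reads no input

      #old incorrect stuff :)
      #while (<READ>||<ERROR>) {
      #    my (@read, @error) = (<READ>, <ERROR>);
      #    print "READ: @read\n";
      #    print "ERROR: @error\n";
      #}

      print <ERROR>;

      # close takes a single handle, so close each one separately
      close READ;
      close ERROR;
      waitpid($pid, 0);    # reap the child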
      If doing the above is 'naughty' in any way please tell me, I have no idea of the plethora of caveats it possibly entails :)

      UPDATE: Thanks diotalevi, I overlooked that. If you take out the while loop and simply print <ERROR>; it will show what it is meant to for the intended purpose. I had other uses on my mind at the time, one of which was obviously to stuff it up :) Have amended the code accordingly.

        I've never used IPC::OpenFoo before and can't comment. What I do notice is that in your while() loop you call readline() on *READ alone, or also on *ERROR, and then follow up by slurping the rest of both into the @read array. @error will always be empty, because the first array in a list assignment takes everything. Fix the code and you've probably got something there.

        my @read  = <READ>;
        my @error = <ERROR>;
        close READ;
        close WRITE;
        close ERROR;
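Pulling the pieces of this sub-thread together, here is a hedged sketch of the child-process approach as one function. It assumes only core modules (IPC::Open3 and Symbol); `re_debug_dump` is a hypothetical name. Two details differ from the code above: the list form of open3 avoids the shell, so the pattern needs no quoting, and the STDERR handle comes from `gensym`, because open3 merges the child's STDERR into its STDOUT handle when the error argument is false.

```perl
use strict;
use warnings;
use IPC::Open3;
use Symbol 'gensym';

# Hypothetical helper: compile $pattern in a child perl with re=debug
# enabled and return everything the child wrote to STDERR.
sub re_debug_dump {
    my ($pattern) = @_;

    my ($in, $out);      # autovivified by open3
    my $err = gensym;    # must be a real glob, or STDERR is merged into $out

    # $^X is the perl binary running this script. The child compiles the
    # pattern at runtime, which is when re=debug prints the node listing.
    my $pid = open3($in, $out, $err,
        $^X, '-Mre=debug', '-e', 'my $p = shift; qr/$p/', '--', $pattern);
    close $in;           # the child reads no input

    # The child prints nothing on STDOUT, so reading STDERR to EOF first
    # cannot deadlock in this particular case.
    my @dump = <$err>;
    close $out;
    close $err;
    waitpid($pid, 0);    # reap the child

    return join '', @dump;
}

print re_debug_dump('a+b');
```

The exact text of the dump varies between perl versions, so anything that parses it should be treated as fragile.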
Re: Altering regular expression trees
by artist (Parson) on Jul 29, 2003 at 21:24 UTC
    A sample for complicated REs would be helpful here.
Re: Altering regular expression trees
by tzz (Monk) on Jul 31, 2003 at 14:17 UTC
    While Perl's regular expressions are generally hard to access and modify at the level you need, the Parse::RecDescent module can probably do the job pretty well. You can produce a parse tree with P::RD and then evaluate it, skipping the nodes you need to avoid. You can also have actions associated with "good" nodes that execute them, and actions associated with "bad" nodes that print a warning, die, or do whatever else is appropriate. In P::RD nodes are called "rules", and you should be able to pick up basic usage of the module in just a few minutes. It is possible to modify rules while the parser is running, but chances are you'll be happy with a parse tree.
      While I've found P::RD to be very cool and useful in the past (parsing an old machine-readable dictionary with embedded type-setting codes), I've never used it for anything as complicated as perl regular expressions. I'd imagine that would be a non-trivial task.

      For now, I'm going with japhy's YAPE::Regex, as suggested by diotalevi. I like the design, and it allows you to do most of the things tzz describes above, without having to create a regular expression grammar.

        Actually, that's the point. YAPE::Regex has a grammar in it. You'd just be reimplementing Y::R if you wrote your own grammar.

        I was unclear. I didn't mean you should try to parse generic Perl regular expressions with P::RD, but instead that you can give your users a simpler syntax that they may like better. Your original message implied you don't need exact compatibility with the Perl regex syntax. With P::RD you can design a syntax that matches what the users need, rather than trying to teach them Perl regular expressions. I've found the latter to be a difficult battle for most users.
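To illustrate the "simpler syntax" idea without writing a grammar, here is a minimal sketch that uses a hand-rolled translator instead of Parse::RecDescent. The mini-syntax ('*' and '?' wildcards) and the name `wildcard_to_regex` are assumptions made up for this example. Everything other than the two wildcards is escaped with quotemeta, so users cannot smuggle in constructs such as `(?{ ... })`.

```perl
use strict;
use warnings;

# Hypothetical mini-syntax: '*' matches any run of characters, '?' matches
# any single character, and every other character is taken literally.
sub wildcard_to_regex {
    my ($pattern) = @_;
    my %translate = ('*' => '.*', '?' => '.');

    # Translate the two wildcards; escape everything else.
    my $body = join '',
        map { exists $translate{$_} ? $translate{$_} : quotemeta($_) }
        split //, $pattern;

    return qr/\A$body\z/s;    # anchored: the whole string must match
}

my $re = wildcard_to_regex('report-*.t?t');
for my $name ('report-2003.txt', 'report-2003.text') {
    printf "%-18s %s\n", $name, ($name =~ $re ? 'matches' : 'no match');
}
```

A richer user syntax would outgrow this quickly, which is where a P::RD grammar starts to pay off.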