PerlMonks  

Regexp::Common not so common?

by iaw4 (Monk)
on Aug 14, 2008 at 13:28 UTC ( [id://704345]=perlquestion )

iaw4 has asked for the wisdom of the Perl Monks concerning the following question:

for commoners, like myself. My problem is easier to show than to explain (perl 5.8.8):
    #!/usr/bin/perl -w
    use strict;
    use Regexp::Common;

    my $teststring= "teststring: start hello {ab}{cd} end\n";

    my $balancedparens= qr/\s*$RE{balanced}{-parens=>'{ }'}/;
    my $pattern1= $balancedparens . $balancedparens;
    my $pattern2= qr/hello/;
    my $pattern3= $pattern2.$pattern1;

    print "no error yet\n";
    $teststring=~ s/$pattern3/hi$1/g;  ## ERROR, WHY???
    print $teststring;
Can someone please explain to me how to avoid this (and not just suppress the error)?

Replies are listed 'Best First'.
Re: Regexp::Common not so common?
by Tanktalus (Canon) on Aug 14, 2008 at 17:25 UTC

    It'd be nice if you gave the output you were expecting ... what I got to work, without that ugly use re 'eval' workaround is:

        #! /usr/bin/perl -w
        use strict;
        use Regexp::Common;

        my $teststring= "teststring: start hello {ab}{cd} end\n";

        my $balancedparens= qr/\s*$RE{balanced}{-parens=>'{ }'}/;
        my $pattern1= qr/($balancedparens)$balancedparens/;
        my $pattern2= qr/hello/;
        my $pattern3= qr/$pattern2$pattern1/;

        print "no error yet\n";
        $teststring=~ s/$pattern3/hi$1/g;  ## ERROR, WHY???
        print $teststring;
    Note how I'm using the qr operator to combine the patterns instead of just using string concatenation. This allows the Regexp objects to be treated as Regexp objects instead of forcing them to switch back and forth between string representations and regular expression objects. I suspect that this also keeps them from losing any magic - the "re 'eval'" option is probably specified in the Regexp::Common module, the regular expressions returned from there keep that, but when you stringify and re-compile them in a new scope, you need the re 'eval' option again. This way, since I'm not doing that, I don't allow anything else (that may be improperly untainted) to use the dangerous eval option while still getting the power of Abigail's Regexp::Common module (with evals).

    Also, you didn't specify {-keep} anywhere, so there aren't any capturing parentheses for $1 to show. So I added some. Not sure that's what you want to capture...
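A minimal sketch of the difference described above. It assumes a hand-written recursive pattern, `$balanced`, as a hypothetical stand-in for `$RE{balanced}{-parens=>'{ }'}`, so that it runs without Regexp::Common installed; the module's real pattern may be built differently. Joining with "." yields a plain string that trips the safety check when it is recompiled, while combining with qr// does not.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the Regexp::Common pattern: nested { } groups.
# The (??{ }) block appears literally in this qr//, so no "use re 'eval'"
# is needed here. A package variable lets the pattern refer to itself.
our $balanced;
$balanced = qr/\{(?:[^{}]+|(??{ $balanced }))*\}/;

# Joining with "." stringifies both Regexp objects into one plain string.
my $joined = qr/hello/ . qr/\s*$balanced/;

# Recompiling that string at run time is refused, because it now contains
# an eval group that arrived through interpolation.
my $matched = eval { "hello {a{b}}" =~ /$joined/ ? "matched" : "no match" };
print defined $matched ? "concatenated: $matched\n" : "concatenated died: $@";

# Combining with qr// keeps the pieces as Regexp objects, so it works
# without the pragma. The capture group supplies $1.
my $combined = qr/hello\s*($balanced)/;
my $text = "start hello {a{b}} end";
$text =~ s/$combined/hi $1/;
print "$text\n";
```

The capture parentheses go in the outer qr// here, which mirrors the fix in the post above rather than using {-keep}.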

      "This way, since I'm not doing that, I don't allow anything else (that may be improperly untainted) to use the dangerous eval option while still getting the power of Abigail's Regexp::Common module (with evals)."

      I think you're being much too polite, and probably unfairly blaming this insanity on Abigail, rather than the original author, Damian Conway.

      In principle, the Regexp::Common module could be the simplest thing out on CPAN: a library of regexps that you request by name. Instead it has this crazy interface that looks like hashes of hashes but isn't (the order of the keys doesn't matter), and there's something strange about what it returns that I couldn't be bothered to figure out myself. When last I looked if you tried to peek at it with the "x" command in the debugger, the debugger would crash.

      One of my rules of thumb is that a module that's too complicated to work with the debugger is too complicated to use in production. So to answer the question posed in the title: no, I don't think Regexp::Common is all that common. Programmers have quietly voted with their feet and walked away from using it.

        Instead it has this crazy interface that looks like hashes of hashes but isn't (the order of the keys doesn't matter)...

        The order of the keys doesn't matter in most hashes. Regexp::Common uses a tied hash to avoid compiling all of the possible regexps at compile time.

        ... and there's something strange about what it returns that I couldn't be bothered to figure out myself.

        A compiled regular expression? They've been around for most of a decade, if not longer.

        When last I looked if you tried to peek at it with the "x" command in the debugger, the debugger would crash.

        Having read some of the debugger's code, I'm not surprised. Did you file a bug?

        I think you're being much too polite, and probably unfairly blaming this insanity on Abigail, rather than the original author, Damian Conway.

        I think you're misreading me. I prefer to have all my ugly hacks hidden behind nice, neat APIs. Regexp::Common provides a nice, neat API (though how "nice" or "neat" could be debated, but it's still an API). In this case, where we're using re 'eval', it also nicely partitions my tainted code away from evals. That is, I can use those "common" regular expressions (with all of their re-eval trickery), without exposing any of the rest of my code to possible injection attacks. This doesn't absolve me from proper untainting of my input, of course, it merely lowers the risk without reducing the power.

        In principle, the Regexp::Common module could be the simplest thing out on CPAN: a library of regexps that you request by name.

        Yeah, but then it loses a lot of its power, doesn't it? Currently, you can use the "balanced" regex with almost any delimiters you want. If all you could do was request them by name, you'd need a thousand names to get balanced patterns with a thousand different delimiters, and if you needed a thousand-and-first delimiter, you'd be out of luck. What you want isn't much different from wanting subroutines that do not take arguments.

        Instead it has this crazy interface that looks like hashes of hashes but isn't (the order of the keys doesn't matter),

        The order partially matters. It matters for the part that defines the name, but it doesn't matter for the configuration. That's not uncommon for other APIs, where the order of the options doesn't matter, but it does matter for mandatory arguments.

        and there's something strange about what it returns that I couldn't be bothered to figure out myself.

        It returns an overloaded object. Which stringifies to a pattern.

        As for the use re 'eval', there's no way around it if you want to stick to pre-5.10. To do recursion in 5.8.x (or earlier), you need the (??{ }) construct, which you can use without problems if it appears literally. But it will trigger an exception if you interpolate it. The reason is that until (?{ }) and (??{ }) were introduced, interpolating variables in a regex was "safe": it couldn't run Perl code. With the new constructs that was no longer true, so to protect older code, you have to use use re 'eval' if your interpolated variables contain such constructs.

        Now, if you don't trust the patterns from Regexp::Common, you shouldn't run them at all, because they will contain (??{ }) and (?{ }) constructs, and will execute Perl code when evaluating a pattern, regardless of whether you set use re 'eval' or not. You need use re 'eval' if you're going to interpolate the pattern in a larger regexp, because Perl will first stringify the pattern (except in some trivial cases), and then, if it contains (??{ }) or (?{ }), you need the use re 'eval' or you trip the safety mechanism.
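For readers who are not tied to pre-5.10 Perl, here is a sketch of the alternative that the "pre-5.10" remark implies. Perl 5.10 added recursion into capture groups, which involves no Perl code block, so the pattern can be kept as a plain string and concatenated freely. The group name `braces` is a hypothetical name chosen for this example, and the pattern is hand-written rather than taken from Regexp::Common.

```perl
#!/usr/bin/perl
use 5.010;
use strict;
use warnings;

# Balanced { } via named recursion: (?&braces) re-enters the group called
# "braces". No (??{ }) is involved, so no eval safety check applies.
my $balanced = '(?<braces>\{(?:[^{}]++|(?&braces))*\})';

# Plain string concatenation is fine for this pattern.
my $pattern = 'hello\s*' . $balanced;

my $text = "teststring: start hello {a{b}} end";
$text =~ s/$pattern/hi $+{braces}/;
print "$text\n";
```

A named group is used rather than a numbered one such as (?1) because numbered references shift when the fragment is concatenated after other capture groups.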

Re: Regexp::Common not so common?
by FunkyMonk (Chancellor) on Aug 14, 2008 at 13:40 UTC
    I don't believe you can join two regexes together using string concatenation. If you spell it out, it works:
        $teststring =~ s/hello\s*$RE{balanced}{-parens=>'{ }'}\s*$RE{balanced}{-parens=>'{ }'}/hi/;
        #teststring: start hi end

    (Except you didn't have any capture parentheses in your original s///.)


    Unless I state otherwise, all my code runs with strict and warnings
      Yes, you can join two regexes together using ".". What you have to do is obey the error message and add
      use re 'eval';
      to your script. See re.
      []s, HTH, Massa (κς,πμ,πλ)
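A sketch of that suggestion with the pragma confined to the smallest possible block, since use re 'eval' is lexically scoped. As an assumption for illustration, `$balanced` is a hand-written recursive pattern standing in for the Regexp::Common one, so the example runs without the module.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for $RE{balanced}{-parens=>'{ }'}.
our $balanced;
$balanced = qr/\{(?:[^{}]+|(??{ $balanced }))*\}/;

# "." produces a plain string that still contains the (??{ }) block.
my $joined = qr/hello/ . qr/\s*($balanced)/;

my $text = "start hello {a{b}} end";
{
    # Allow eval groups in interpolated patterns inside this block only.
    use re 'eval';
    $text =~ s/$joined/hi $1/;
}
print "$text\n";
```

Only enable the pragma where every interpolated variable is trusted, because any string interpolated into a pattern within that scope may then run Perl code.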
Re: Regexp::Common not so common?
by eosbuddy (Scribe) on Aug 14, 2008 at 14:38 UTC
    You can't concatenate regexps the way you did, but you can look for repetitions. Since you're looking for matching parentheses, you need non-greedy quantifiers. I would use something of this sort (unless you have a specific reason to concatenate them as you did):
        #!/usr/bin/perl
        use strict;
        use warnings;

        my $test = "I am here {there} and {everywhere} but not below and above";
        print "$test\n";
        $_ = $test;
        if (/\{(.*?)\}(.*)\{(.*?)\}/) {
            print "Success $1 $3\n";
        }

Node Type: perlquestion [id://704345]
Approved by toolic
Front-paged by Tanktalus