metachar match

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I have an issue I'm having difficulty fixing. For an experienced Perl programmer this is probably fairly simple.
I just want to extract the contents of the lines that begin opt = and set = from the input. The code below is an example (each "word" just stands in for a real word in the actual data).
Here's my code:

use Data::Dumper; $Data::Dumper::Terse = $Data::Dumper::Indent = 1;

my $test = <<EOT;
  opt = {word|word|word|word|word|word|word|word|word|
            word|word|word|word|word|word|
            word|word|word|word|word|word|word|word|word}
  logn = {alpha|<beta>|<gamma>}
  objn = {blah|blah|blah|blah|blah|blah|<blah>}
    set = [ one | two | three ]
EOT

my $delim = qr{\s*\|\s*};
my $box = qr/\w+\s*=\s*\[\s*([^]]+)\s*\]/;
my $brace = qr/\w+\s*=\s*\{\s*([^}]+)\s*\}/;
my $re = qr/^\s*$brace\s*$brace\s*$brace\s*$box\s*$/;

if ($test =~ $re) {
    my ($opt, $set) = ($1, $4);
    my @opt = split $delim, $opt;
    my @set = split $delim, $set;

    print Dumper({
        opt     => [ @opt ],
        set    => [ @set ]
    }), $/;
}

This works fine. The problem is that the following changes have been introduced and I'm not sure how to deal with them.

my $test = <<EOT;
This is a new line [,set ]
  opt = {word|word|word|word|word|word|word|word|word|
            word|word|word|word|word|word|
            word|word|word|word|word|word|word|word|word}
  logn = {alpha|<beta>|<gamma>}
  objn = {blah|blah{blah|blah|blah}blah|<blah>}
    set = [ one | two | three ]
(this is new also)
EOT

Three changes have been introduced: a start line, a finish line, and braces inside the objn line. All I want is the contents of the lines beginning opt = and set =, as before (as the example demonstrates).
How do I achieve this?


Replies are listed 'Best First'.
Re: metachar match by jonadab (Parson) on Mar 23, 2006 at 11:12 UTC
"All I want is the contents of the lines beginning opt = and set ="

If this is really all you want, then why does the regex detail all that other stuff? If you don't need to validate that the rest of the data adhere to the format, I would simplify the code to just get what you want, ignoring aught else...

my ($opt) = $test =~ $brace;
my ($set) = $test =~ $box;
my @opt = split $delim, $opt;
my @set = split $delim, $set;

Sanity? Oh, yeah, I've got all kinds of sanity. In fact, I've developed whole new kinds of sanity. Why, I've got so much sanity it's driving me crazy.
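The two-match idea above can be sketched against the changed input as follows. This is only an illustration under a few assumptions: the patterns `$opt_re` and `$set_re` are hypothetical variants of `$brace` and `$box` that anchor on the literal field names at the start of a line, and the opt list is shortened for readability.

```perl
use strict;
use warnings;

# Shortened copy of the changed input from the question.
my $test = <<'EOT';
This is a new line [,set ]
  opt = {word|word|word|
            word|word}
  logn = {alpha|<beta>|<gamma>}
  objn = {blah|blah{blah|blah|blah}blah|<blah>}
    set = [ one | two | three ]
(this is new also)
EOT

my $delim = qr{\s*\|\s*};

# Assumption: anchor on the literal names "opt" and "set" at the start of
# a line (/m), so the extra lines and the nested braces in objn are ignored.
my $opt_re = qr/^\s*opt\s*=\s*\{\s*([^}]+?)\s*\}/m;
my $set_re = qr/^\s*set\s*=\s*\[\s*([^\]]+?)\s*\]/m;

my ($opt) = $test =~ $opt_re;
my ($set) = $test =~ $set_re;

# Guard against a failed match before splitting.
my @opt = defined $opt ? split($delim, $opt) : ();
my @set = defined $set ? split($delim, $set) : ();

print "opt: @opt\n";
print "set: @set\n";
```

Because neither pattern looks at the logn or objn lines, the nested braces never have to be parsed; the trade-off is that the rest of the record is not validated.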
Re^2: metachar match by Anonymous Monk on Mar 23, 2006 at 11:25 UTC
OK, point taken. However, does this not mean that the input contents ($test) are scanned twice? In my example it's a small input, but in reality it may be considerably larger (granted, I should have pointed that out).
Re: metachar match by jonadab (Parson) on Mar 25, 2006 at 11:34 UTC
"However does this not mean that the input contents ($test) are scanned twice"

Yes, but that doesn't necessarily mean it will be a performance problem. Indeed, it doesn't necessarily mean it will be slower than the other way. Regular expression performance metrics are complicated by backtracking, so if you wanted to know how the performance of one way compares with the performance of another way, you'd have to actually test them both.

Personally, I've always felt that if you have to do benchmarks to figure out which way is faster, then it doesn't actually matter, because both are fast enough. On the other hand, if you test one way of doing something and have noticeable performance problems, then it's worth looking for a faster way. In the absence of noticeable performance problems, though, any optimization you do is premature.

In this example, although the entire string is pattern-matched twice, each of the two pattern matches starts with a static substring (opt and set), which will not match at most points in the string. This limits the amount of tracking forward and back through the string that the matching engine will have to do for these matches.

I would recommend starting with the way that makes the code simple and easy to maintain, and then only if there are performance problems, look for ways to improve performance. Premature optimization is a root of all kinds of evil for which some have strayed from best practices and pierced themselves through with many sorrows. (Apologies to MJD and Paul.)
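If you do want to "actually test them both", the core Benchmark module is one way to do it. The following is a rough sketch, not a measurement: the sub names `one_pass` and `two_pass` are made up for illustration, the input is a shortened copy of the original (un-nested) sample, and the iteration count is arbitrary. Results will depend on your real data and Perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Shortened copy of the original sample input (no nested braces).
my $test = <<'EOT';
  opt = {word|word|word|
            word|word}
  logn = {alpha|<beta>|<gamma>}
  objn = {blah|blah|<blah>}
    set = [ one | two | three ]
EOT

my $box   = qr/\w+\s*=\s*\[\s*([^\]]+)\s*\]/;
my $brace = qr/\w+\s*=\s*\{\s*([^}]+)\s*\}/;
my $whole = qr/^\s*$brace\s*$brace\s*$brace\s*$box\s*$/;

# One pass: match the whole record, then pick captures 1 and 4.
sub one_pass {
    return $test =~ $whole ? ($1, $4) : ();
}

# Two passes: one independent match per wanted field.
sub two_pass {
    my ($opt) = $test =~ $brace;
    my ($set) = $test =~ $box;
    return ($opt, $set);
}

# Run each sub 20_000 times and print a comparison table.
cmpthese(20_000, {
    one_pass => \&one_pass,
    two_pass => \&two_pass,
});
```

For a meaningful comparison, substitute a representative sample of the real input; on an input this small both variants are likely to be fast enough that the difference does not matter.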
Re: metachar match by borisz (Canon) on Mar 23, 2006 at 11:14 UTC
Here is an example:

$_ = "{a{12}s{ss}}";
my $myre;
$myre = qr/ \{ (?: (?>[^{}]+) | (??{$myre}) )* \}/x;
my ( $match ) = /($myre)/;
print $match;

Boris
Re: metachar match by ysth (Canon) on Mar 23, 2006 at 10:46 UTC
"the objn line now has braces in it"

See perlre on `(??{ code })` for an example of matching stuff in (), including any nested () pairs. You can easily adapt it to matching curly braces.
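As a sketch of that adaptation, the balanced-brace pattern can replace the `[^}]+` part of the original `$brace` so that the single-pass regex copes with the nested objn line. Several things here are assumptions rather than anything stated in the thread: the name `$nested`, the shortened opt list, and the hard-coded expectation of exactly one extra line before and one after the record.

```perl
use strict;
use warnings;

my $test = <<'EOT';
This is a new line [,set ]
  opt = {a|b|
            c}
  logn = {alpha|<beta>|<gamma>}
  objn = {blah|blah{blah|blah|blah}blah|<blah>}
    set = [ one | two | three ]
(this is new also)
EOT

# Balanced braces, in the style of the perlre (??{ code }) example,
# with parentheses swapped for curly braces.
my $nested;
$nested = qr/
    \{
    (?:
        (?> [^{}]+ )      # run of non-brace characters, no backtracking
      |
        (??{ $nested })   # or a nested balanced group
    )*
    \}
/x;

my $delim = qr{\s*\|\s*};
my $brace = qr/\w+\s*=\s*($nested)/;    # capture includes the outer braces
my $box   = qr/\w+\s*=\s*\[\s*([^\]]+?)\s*\]/;

# Assumption: exactly one line before the record and one line after it.
my $re = qr/\A.*\n\s*$brace\s*$brace\s*$brace\s*$box\h*\n.*\n?\z/;

my (@opt, @set);
if ($test =~ $re) {
    my ($opt, $set) = ($1, $4);
    $opt =~ s/\A\{\s*//;    # strip the outer braces captured by $nested
    $opt =~ s/\s*\}\z//;
    @opt = split $delim, $opt;
    @set = split $delim, $set;
}

print "opt: @opt\n";
print "set: @set\n";
```

This keeps the validation of the whole record that the original `$re` performed, at the cost of a more involved pattern; if validation is not needed, matching only the opt and set lines is simpler.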