sliles has asked for the wisdom of the Perl Monks concerning the following question:

I am doing pattern matching by extracting lines using a starting pattern and an ending pattern, e.g.
if (/BEGIN PATTERN/ ... /END PATTERN/)
Why doesn't the following code snippet work? What I want to do is this: Anytime there is an opening and closing curly brace after an "if" followed by an "else", delete both of these curly braces:
Code Snippet:

# Beginning pattern: 'if' followed by 0 or more characters
# followed by the nearest { brace
#
# Ending pattern: 0 or more chars. followed by nearest
# } brace followed by an 'else'
if ( /if.*?{/ .. /.*?}.*?else/ ) {
    $_ =~ s/{//;
    $_ =~ s/}//;
}
-----------------------------------------------------
Here is the text I am reading:

if (c=e) { // delete this curly brace
call pgme;
call pgmd;
} // delete this curly brace
else {call pgmd; // keep these curly braces
call pgmc;}

I want the text to be changed to the following:

if (c=e)
call pgme;
call pgmd;
else {call pgmd;
call pgmc;}
----------------------
Apparently it is not meeting the condition of the "if" statement, because nothing is changed in my text. This seems like it should work. Any ideas?

Thanks for all the suggestions... you folks are great!

Susan

Re: pattern matching
by dmmiller2k (Chaplain) on Jan 10, 2002 at 02:47 UTC

    One reason that jumps out at me is that the pattern character '.' won't normally match a newline.

    If you append the '/s' modifier to your regexes it changes the behavior such that '.' will match newlines (see perlman:perlre). Try this:

    if ( /if.*?\{/s .. /.*?\}.*?else/s ) {
        s/\{//;
        s/\}//;
    }

    This, of course, assumes that you have the entire if-else construct in $_ to begin with.
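The effect of the /s modifier described above can be seen in a minimal sketch. The two-line sample string is made up for illustration and is not taken from the original post:

```perl
use strict;
use warnings;

# Hypothetical two-line sample: the brace sits on the line after the "if".
my $text = "if (c=e)\n{ call pgme; }\n";

# Without /s, '.' stops at the newline, so the pattern cannot reach the brace.
my $without_s = $text =~ /if.*?\{/  ? 1 : 0;

# With /s, '.' also matches "\n", so the match can span both lines.
my $with_s    = $text =~ /if.*?\{/s ? 1 : 0;

print "without /s: $without_s\n";
print "with /s:    $with_s\n";
```

This only matters when the whole construct is held in one string. When the input is read one line at a time, no single string contains a newline in the middle, and /s changes nothing.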

    Update: fiddled with text.

    Update 2: removed extraneous $_ bindings in substitutions and escaped the curly braces (nice catch, Zaxo).

    dmm

    You can give a man a fish and feed him for a day ...
    Or, you can
    teach him to fish and feed him for a lifetime
Re: pattern matching
by mkmcconn (Chaplain) on Jan 10, 2002 at 03:03 UTC
    dmmiller2k is right.
    You also want to be sure that the 'text' you are reading is examined by your matching pattern as a single string (not one line at a time).
    Then the regex is pretty simple (but note that the /s modifier is added, as dmmiller2k advised).

    #!/usr/bin/perl -w
    use strict;

    # Examine each record of DATA as a single string.
    # An empty $/ selects paragraph mode: records are separated by blank lines.
    $/ = '';
    while (<DATA>){
        # '.' will match \n with the /s modifier
        if ( m/if.+else/s ){
            # Braces escaped so they are always read as literals.
            s/\{//;
            s/\}//;
        }
        print;
    }
    __END__
    if (c=e) { #// delete this curly brace
        call pgme;
        call pgmd;
    }
    else {
        call pgmd; #// keep these curly braces
        call pgmc;
    }

    if (c=e) { #// delete this curly brace
        call pgme;
        call pgmd;
    }
    else {
        call pgmd; #// keep these curly braces
        call pgmc;
    }

    mkmcconn
    Update: please note that this regex example is
    for illustration only. It is easily broken, and
    it appears to work only because of the structure of the data
    example.
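As a side note on reading the text "as a single string": the value of `$/` decides how much text each read returns. The sketch below compares paragraph mode, slurp mode, and the default line mode on a made-up input; the `read_records` helper and the sample string are hypothetical names introduced only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical input: two records separated by one blank line.
my $input = "if (a) {\n  x;\n}\n\nif (b) {\n  y;\n}\n";

# Read every record from an in-memory filehandle using the given $/ value.
sub read_records {
    my ($separator) = @_;
    local $/ = $separator;
    open my $fh, '<', \$input or die "cannot open in-memory handle: $!";
    my @records = <$fh>;
    close $fh;
    return @records;
}

my @paragraphs = read_records('');      # paragraph mode: blank lines split records
my @slurped    = read_records(undef);   # slurp mode: the whole input is one record
my @lines      = read_records("\n");    # default: one record per line

printf "paragraph mode: %d records\n", scalar @paragraphs;
printf "slurp mode:     %d records\n", scalar @slurped;
printf "line mode:      %d records\n", scalar @lines;
```

In paragraph mode each blank-line-separated block arrives in `$_` on its own, so a substitution without /g touches the first brace of every block rather than only the first brace in the file.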
Re: pattern matching
by Zaxo (Archbishop) on Jan 10, 2002 at 03:01 UTC

    I think the immediate problem is that curlies are regex metacharacters. Escape them as:

    if ( /if.*?\{/s .. /.*?\}.*?else/s ) {
        $_ =~ s/\{//;
        $_ =~ s/\}//;
    }

    That's not the only problem, though. Consider the result of applying that to:

    if (c=e) {
        call pgme;
        call pgmd;
    }
    if (c=d) {
        call pgmd;
        call pgme;
    }
    else {
        call pgmd;
        call pgmc;
    }

    You may get some use out of Text::Balanced.
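Zaxo's counterexample can be checked directly. The sketch below assumes the whole sample is held in one string (the line breaks in the sample are a guess at the original layout) and applies the same condition and substitutions to it:

```perl
use strict;
use warnings;

# Zaxo's sample (line breaks assumed): the first "if" has no "else".
my $text = <<'END';
if (c=e) {
call pgme;
call pgmd;
}
if (c=d) {
call pgmd;
call pgme;
}
else {
call pgmd;
call pgmc;
}
END

my $result = $text;
for ($result) {    # alias $_ to $result so the bare s/// edits it in place
    # Same condition and substitutions as in the reply above.
    if ( /if.*?\{/s .. /.*?\}.*?else/s ) {
        s/\{//;
        s/\}//;
    }
}
print $result;
```

The substitutions remove the first `{` and the first `}` in the string, which belong to the `if` that has no `else`, while the `if` that does have an `else` keeps its braces.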

    Update: Added s switch to matching, ++dmmiller2k.

    After Compline,
    Zaxo

Re: pattern matching
by boo_radley (Parson) on Jan 10, 2002 at 04:02 UTC
    Complicated regexes are for the weak; Herr Doktor Conway's code is for the strong. I have to say that Text::Balanced worked a bit differently than I thought it would; I plan on fiddling with it a bit and reporting back on it.
    This code is not particularly beautiful, but it does show a use of Text::Balanced.

    use Text::Balanced "extract_delimited";
    my $out;
    my $r = '
    if (c=e) { // delete this curly brace
    call pgme;
    call pgmd;
    } // delete this curly brace
    else {call pgmd; // keep these curly braces
    call pgmc;}
    if (c=e) { // delete this curly brace
    call pgme;
    call pgmd;
    } // delete this curly brace
    else {call pgmd; // keep these curly braces
    call pgmc;}
    if (c=e) { // delete this curly brace
    call pgme;
    call pgmd;
    } // delete this curly brace
    else {call pgmd; // keep these curly braces
    call pgmc;}
    if (c=e) { // delete this curly brace
    call pgme;
    call pgmd;
    } // delete this curly brace
    else {call pgmd; // keep these curly braces
    call pgmc;}
    ';
    while (($e, $r, $s) = extract_delimited ($r, "{\"}", '(?s)[^{]*')){
        do {$out .= $r; last } unless $e ;
        if ($s =~/if /) {
            $e =~s/[{]//;
            # make sure we don't nuke something in a quoted block.
            $e = reverse $e;
            $e =~s/[}]//;
            $e = reverse $e;
        };
        $out .= $s . $e;
    }
    print $out;
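For comparison, Text::Balanced also exports `extract_bracketed`, which pulls one balanced bracketed block off the front of a string. The following is a small sketch of that call, not a rewrite of the code above; the sample string is hypothetical and is assumed to start at the opening brace of the block.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

# Hypothetical input that starts at the opening brace of an if-block.
my $text = '{ call pgme; { nested; } call pgmd; } else { call pgmc; }';

# In list context the first two return values are the balanced block
# (delimiters included) and the text that follows it.
my ($block, $remainder) = extract_bracketed($text, '{}');

print "block:     $block\n";
print "remainder: $remainder\n";
```

Because the nested pair is consumed as part of the block, the remainder begins right after the matching closing brace, which is where a check for a following "else" would go.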
Re: pattern matching
by Rich36 (Chaplain) on Jan 10, 2002 at 03:26 UTC
    I was able to come up with a working version with a much simpler regex. The regex matches just the if statement, then removes the next set of curly braces. One caveat though: it only removes braces for the first if statement. Subsequent ones would not be affected.
    One thing you might keep in mind is that it's often a good idea to backslash ('\') characters like braces if you're using them in a non-grouping way in your regex. I tend to backslash all non-alphanumeric chars in my regexes, just to be safe.

    #!/usr/bin/perl -w
    use strict;

    my $data;
    while(<DATA>) {
        last if ($_ eq "");
        $data .= $_;
    }
    print $data;
    if ( $data =~ /if.*?\{/s ) { # New RegEx
        $data =~ s/\{//;
        $data =~ s/\}//;
    }
    print $data;
    __DATA__
    if (c=e) {   //delete these curly braces
     call pgme;
    call pgmd;
    } //delete these curly braces
    else {call pgmd; // keep these curly braces
    call pgmc;}

    Output:

    if (c=e)    //delete these curly braces
     call pgme;
    call pgmd;
     //delete these curly braces
    else {call pgmd; // keep these curly braces
    call pgmc;}
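On the backslashing advice in this reply: Perl can do the escaping itself, either with the `quotemeta` function or with `\Q...\E` inside a pattern. A minimal sketch, with a sample line and search string chosen only for illustration:

```perl
use strict;
use warnings;

my $line = 'else {call pgmd; // keep these curly braces';

# quotemeta backslashes every non-word character in the string.
my $literal = '{call';
my $escaped = quotemeta $literal;

# \Q...\E applies the same escaping inline, so the brace is matched literally.
(my $changed = $line) =~ s/\Q$literal\E/call/;

print "$escaped\n";
print "$changed\n";
```

This is mostly useful when the text to match arrives in a variable, where hand-written backslashes are not an option.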

    Rich36
    There's more than one way to screw it up...

Re: pattern matching
by particle (Vicar) on Jan 10, 2002 at 03:52 UTC
    be really careful here in making sure your data sample is an accurate worst-case example. for instance, if there are nested if structures in your real data, how do you handle that?

    i haven't checked any code, but it seems possible these methods may fail in that case, and that you'd have to build a more robust algorithm. i can't check your data, but i've been bitten by something like this in the past. my own experience with this compels me to tell you: be careful.
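One way to cope with the nested case raised here is a recursive pattern, available from Perl 5.10 on. This is a sketch under stated assumptions and not a method proposed in the thread: braces inside comments or strings are not handled, the `if` condition contains no nested parentheses, nothing but whitespace sits between the closing brace and `else`, and the names `$if_with_else` and `strip_if_braces` are hypothetical.

```perl
use strict;
use warnings;

my $if_with_else = qr/
    ( \b if \s* \( [^)]* \) \s* )       # group 1: the "if (...)" header
    ( \{ (?: [^{}]++ | (?2) )* \} )     # group 2: balanced block, recursing into itself
    (?= \s* else \b )                   # only when "else" follows the block
/x;

sub strip_if_braces {
    my ($code) = @_;
    # Keep the header and drop the first and last character of the block (its braces).
    $code =~ s/$if_with_else/ $1 . substr($2, 1, length($2) - 2) /ge;
    return $code;
}

my $nested = "if (a) { if (b) { x; } y; } else { z; }";
print strip_if_braces($nested), "\n";
```

Only the outer block, the one followed by `else`, loses its braces; the inner `if (b) { x; }` is left alone, and an `if` with no `else` is not matched at all.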

    TEST, TEST, TEST!!!

    ~Particle