Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, my prob is this:

I have some data like this: {[[group1 #xxx#]] text [[group2]] text2 [[group3 #something# else]]}
My task is simple: I want to delete all groups "[...]" that contain the chars "#...#". Text between groups must be left alone! After that, the "winning" group must remain, meaning the first group from the left "{", or nothing.

In this example it should be: ->" text group2 text2"
I tried this before: $R  =~ s/(.*?)(\[.*?\#.*?\#.*?\].*?)?(.*?)/$1$3/gm;
but it isn't satisfying.

Perhaps there is a PerlRegGuru out there? :-) thx

Alex

Code tags and general spification - dvergin 2002-11-19

Replies are listed 'Best First'.
Re: Need a cool RegExpresseion. plz help
by sauoq (Abbot) on Nov 20, 2002 at 00:09 UTC
    my ($winner) = $R =~ /(\[[^#]*(?:\]|#[^#]*\]))/;

    As an explanation, that matches a literal left square bracket followed by anything that is not an octothorp, followed by either a right bracket, or an octothorp, then anything that is not an octothorp, then a right bracket. If you can have two "winning" groups on one line, you should use minimal rather than greedy matching.
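To see what that pattern does in practice, here is a minimal sketch. The sample line is written with single brackets, as the question's description of groups as "[...]" suggests; that is an assumption about the intended input, and the second test string is made up for illustration.

```perl
use strict;
use warnings;

# The pattern from the reply above, precompiled.
my $pattern = qr/(\[[^#]*(?:\]|#[^#]*\]))/;

# Sample line with single brackets (assumed form of the asker's data).
my $R = '{[group1 #xxx#] text [group2] text2 [group3 #something# else]}';
my ($winner) = $R =~ $pattern;
print "winner:  $winner\n";     # winner:  [group2]

# Two unmarked groups on one line (made-up input):
# the greedy [^#]* runs on to the last right bracket.
my $two = '[a] x [b]';
my ($greedy) = $two =~ $pattern;
print "greedy:  $greedy\n";     # greedy:  [a] x [b]

# The same pattern with minimal quantifiers stops at the first one.
my ($minimal) = $two =~ /(\[[^#]*?(?:\]|#[^#]*?\]))/;
print "minimal: $minimal\n";    # minimal: [a]
```

The list assignment `my ($winner) = ...` leaves `$winner` undefined when nothing matches, so it is worth checking with `defined` before using it.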

    Another way is:

    my ($winner) = $R =~ /(\[[^#]*(?:#[^#]*)?\])/;

    Which means a literal left bracket, followed by anything that is not an octothorp, followed by an optional octothorp and anything that is not an octothorp, followed by a right bracket. I think I prefer this method. Here is a nicely commented version:

    /(            # Start capturing.
       \[         # A literal left bracket.
       [^#]*      # Anything that is not an octothorp.
       (?:        # Group without capturing.
          \#      # A literal octothorp.
          [^#]*   # Anything that is not an octothorp.
       )?         # End group. Group is optional.
       \]         # Literal right bracket.
    )/x
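A quick way to convince yourself that the commented form behaves like the one-liner is to compile both with qr// and run them over a few inputs. This is only a sketch: the test strings are assumptions (single-bracket sample line plus two made-up cases), not data from the original post.

```perl
use strict;
use warnings;

my $compact = qr/(\[[^#]*(?:#[^#]*)?\])/;

# Same pattern under /x; whitespace and comments are ignored, and the
# "#" inside [^#] stays literal because it is in a character class.
my $commented = qr/
    (             # Start capturing.
       \[         # A literal left bracket.
       [^#]*      # Anything that is not an octothorp.
       (?:        # Group without capturing.
          \#      # A literal octothorp.
          [^#]*   # Anything that is not an octothorp.
       )?         # End group. Group is optional.
       \]         # Literal right bracket.
    )
/x;

my @inputs = (
    '{[group1 #xxx#] text [group2] text2 [group3 #something# else]}',
    '[only #one mark]',     # a single octothorp is still allowed
    'no groups here',
);

my @results;
for my $input (@inputs) {
    my ($a) = $input =~ $compact;
    my ($b) = $input =~ $commented;
    push @results, [ $a, $b ];
    printf "%-16s | %s\n", $a // '(no match)', $b // '(no match)';
}
```

Both columns should agree on every line; the second input shows that a group with only one octothorp still counts as a match for this pattern.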

    -sauoq
    "My two cents aren't worth a dime.";
    
Re: Need a cool RegExpresseion. plz help
by nothingmuch (Priest) on Nov 19, 2002 at 23:55 UTC
    I think that some work is being done without it needing to be done...
    $R =~ s/\[[^\]]*?#[^\]]*?#.*?\]//gm;
    This will simply remove any block of text enclosed in brackets that has at least two sharp/pound sign characters in it.
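As a rough check of that substitution, here is a small sketch. The sample line uses single brackets, which is an assumption based on the question describing groups as "[...]"; the second string is made up to show a group with only one "#".

```perl
use strict;
use warnings;

# Sample line with single brackets (assumed form of the asker's data).
my $R = '{[group1 #xxx#] text [group2] text2 [group3 #something# else]}';

# Work on a copy so the original stays available for comparison.
(my $cleaned = $R) =~ s/\[[^\]]*?#[^\]]*?#.*?\]//gm;
print "$cleaned\n";     # { text [group2] text2 }

# A group with a single "#" is kept; only groups with two or more go.
my $mixed = '[keep #one] [drop #a# b]';
(my $mixed_cleaned = $mixed) =~ s/\[[^\]]*?#[^\]]*?#.*?\]//gm;
print "$mixed_cleaned\n";   # [keep #one] (followed by a trailing space)
```

Note that the text between groups, including the surrounding spaces, is left untouched.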

    Your solution is a bit complicated. For example, the group-finding part is quantified with ?, which means match once or zero times... Also, capturing the first group and the third one and then writing them back in place is a bit expensive.
    The regexp seems correct other than the redundancy, as no part of it is forced to match.

    Update: Corrections based on sauoq's reply. The last one doesn't need to be negated tho, as nongreedy will match the first occurrence of ]. Someone slap me next time I answer stuff at 2 am...

    -nuffin
    zz zZ Z Z #!perl

      Two nits. You can write your character classes as [^]] rather than [^\]]. I think it's easier to read that way but someone else might not. More importantly, your /m modifier to the regex isn't doing a bit of good. That only affects the way the anchors (^ and $) work. Perhaps you meant to use /s so that your final .*? would match a newline.
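The /m versus /s point is easy to check with a group that wraps onto a second line. A minimal sketch, with a made-up input string:

```perl
use strict;
use warnings;

# Hypothetical input: a marked group whose text continues on a second line.
my $input = "[a #b# c\nd] rest";

# /m only changes how ^ and $ behave; the final .*? still cannot
# cross the newline, so nothing is removed.
(my $with_m = $input) =~ s/\[[^\]]*?#[^\]]*?#.*?\]//gm;

# /s lets the dot match a newline, so the whole group is removed.
(my $with_s = $input) =~ s/\[[^\]]*?#[^\]]*?#.*?\]//gs;

print "with /m: ", ($with_m eq $input ? "unchanged" : "changed"), "\n";
print "with /s: '$with_s'\n";
```

This should report the /m result as unchanged and the /s result as ' rest'. The negated character classes earlier in the pattern already match newlines on their own; only the dot needs /s.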

      Other than that, your reply is better than mine as it actually does what he asked for rather than matching the "winning" group, which is what I did...

      Update: Just to be clear, I'd write it like s/\[[^]]*?#[^]]*?#[^]]*?\]//g. I prefer to be explicit about matching anything other than a right bracket rather than using constructs like .*?\] which work but don't really say what they mean. Avoiding them also allows you to avoid the /s modifier.
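For illustration, here is that explicit version wrapped in a small helper. The name strip_marked_groups and both test strings are assumptions made for this sketch (the sample line is written with single brackets).

```perl
use strict;
use warnings;

# Hypothetical helper: remove every [...] group containing at least
# two "#" characters, leaving all other text as it is.
sub strip_marked_groups {
    my ($text) = @_;
    # [^]] matches anything except a right bracket, newlines included,
    # so no /s modifier is needed.
    $text =~ s/\[[^]]*?#[^]]*?#[^]]*?\]//g;
    return $text;
}

my $sample  = '{[group1 #xxx#] text [group2] text2 [group3 #something# else]}';
my $wrapped = "[a #b# c\nd] rest";

my $sample_out  = strip_marked_groups($sample);
my $wrapped_out = strip_marked_groups($wrapped);

print "$sample_out\n";      # { text [group2] text2 }
print "'$wrapped_out'\n";   # ' rest'
```

Because each part of the pattern says "anything but a right bracket", a match can never run past the end of the group it started in.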

      -sauoq
      "My two cents aren't worth a dime.";