in reply to How to find the outermost pair of square brackets with a regex?

I'm confused -- the text within the outermost brackets?

Surely this will do it:

$str = ' blah blah blah blah blah blah blah blah [blah [blah blah] [blah blah blah blah] blah] blah blah';
$str =~ m/\[(.*)\]/s;
print $1;

Because the regex engine finds the leftmost match, /s lets dot match line breaks, and dot-star is greedy, that's all you need. Am I missing something?



($_='kkvvttuu bbooppuuiiffss qqffssmm iibbddllffss')
=~y~b-v~a-z~s; print

Re^2: How to find the outermost pair of square brackets with a regex?
by lokiloki (Beadle) on Jan 17, 2007 at 07:15 UTC
    good idea, but this won't work because there may be multiple outermost (i.e., same-level) bracket pairs, so a greedy regular expression will grab everything from the first [ to the last ]. i should have clarified this. for example:

    $str = ' blah blah blah blah blah blah blah blah [blah [blah blah] [blah blah blah blah] blah] blah blah blah blah blah blah blah blah blah blah [blah [blah blah] [blah blah blah blah] blah] blah blah blah blah blah blah blah blah blah blah [blah [blah blah] [blah blah blah blah] blah] blah blah blah blah blah blah blah blah blah blah [blah [blah blah] [blah blah blah blah] blah] blah blah';

    in other words, there are multiple top-level bracket pairs, which may or may not contain additional pairs (etc).
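
A minimal sketch of the problem described above, using a shortened, hypothetical sample string rather than the original data. The helper name greedy_match is made up for illustration. With two top-level groups, the greedy pattern captures everything between the first [ and the last ]:

```perl
use strict;
use warnings;

# Hypothetical, shortened sample with two top-level bracket groups.
my $str = 'a [b [c] d] e [f [g] h] i';

# Greedy dot-star: the capture runs from the first '[' to the last ']'.
sub greedy_match {
    my ($text) = @_;
    return $text =~ m/\[(.*)\]/s ? $1 : undef;
}

print greedy_match($str), "\n";   # prints: b [c] d] e [f [g] h
```

The capture swallows the closing bracket of the first group and the opening bracket of the second, which is why a single greedy match is not enough here.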

    btw, here is the code snippet that i finally used (which has some nuances that i didn't include in the original question), based on the previous suggestion:

    my $re = qr{\[(?:(?>[^\[\]]+)|(??{$re}))*\]}s;
    for (;;) {
        last unless $tempstr =~ s/(\[\w+?\s*=\s*($re|\n|[^\[\]])+\])/&assign($1)/gies;
    }

      That code is broken.

      • You can't declare $re in the same statement in which you use it. It won't even compile (under strict vars).

      • There's a reason I used a package variable instead of a lexical. It'll bite you if you interpolate into $re. Best to always use a package variable to avoid the problem entirely.

      • While not a bug per se, the s modifiers on both regexps are useless because you don't use . (dot).

      our $re;
      local $re = qr{\[(?:(?>[^\[\]]+)|(??{$re}))*\]};
      for (;;) {
          last unless $tempstr =~ s/(\[\w+?\s*=\s*($re|\n|[^\[\]])+\])/assign($1)/gie;
      }

      It's unfortunate that you modify things without knowing why they were done in the first place.
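
As an illustration of the package-variable approach, here is a small self-contained sketch that pulls out every top-level bracket group. The sample string and the helper name top_level_groups are hypothetical, and the sketch uses a plain assignment instead of local because it runs at file scope:

```perl
use strict;
use warnings;

# Package variable: the (??{ $re }) block looks it up when the match runs.
our $re;
$re = qr{\[(?:(?>[^\[\]]+)|(??{ $re }))*\]};

# Return every top-level bracketed group, brackets included.
sub top_level_groups {
    my ($text) = @_;
    return $text =~ /($re)/g;
}

print "$_\n" for top_level_groups('a [b [c] d] e [f [g] h] i');
# prints:
# [b [c] d]
# [f [g] h]
```

Because each match consumes one balanced group, the /g loop resumes after its closing bracket and nested groups are never reported on their own.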

        thanks...

        i guess part of the problem, and perhaps the reason i was using the s modifier, was that i wasn't sure whether something like the following also matches a newline:

        [^\[\]]

        since this matches everything BUT [ or ], shouldn't it also match newline? if not, what modifier can force it to match one?

        i couldn't figure that out, which is why i opted for this alternation:

        ($re|\n|[^\[\]])+
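
For what it's worth, a negated character class matches any character not listed in it, newline included, and the s modifier only changes what dot matches. A quick check, with hypothetical helper names:

```perl
use strict;
use warnings;

# 1 if the negated class [^\[\]] matches a lone newline (no /s used).
sub class_matches_newline {
    return "\n" =~ /\A[^\[\]]\z/ ? 1 : 0;
}

# 1 if a bare dot matches a lone newline without /s.
sub dot_matches_newline {
    return "\n" =~ /\A.\z/ ? 1 : 0;
}

printf "class: %d, dot: %d\n", class_matches_newline(), dot_matches_newline();
# prints: class: 1, dot: 0
```

If that holds for the real data as well, the \n branch in the alternation above would be redundant, since the negated class already covers it.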

      I should point out that ikegami's suggestion by itself would surely have worked, but I couldn't get it to work for the nuances that I needed, and so I added some possibly unnecessary cruft to account for my needs. I.e., I don't really know why |\n|[^\[\]+])+\] was necessary, but after many hours of trial and error, it was that which made everything (appear to) work.