ultranerds has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I'm trying to get the following code to work. The [[tag]] matching works fine, but grabbing the content AFTER the tag doesn't seem to work.

my $current_article = qq|
[sommaire foo bar 3]
[[345fsdf sf]]
[[sdf sf fsd fsf]]
[[foo bar\|whatever]]
== foo bar === foo bar test
[/sommaire]
|;

while ($current_article =~ m/\[\[(.+?)\]\]/g) {
    my $tag_name = $1;
    print "Got $tag_name \n";
    print "Checking if we have any contents AFTER it... \n";
    print "RUNNING: \Q[[$tag_name]]\E(.*?)\[ \n";
    if ($current_article =~ /\Q[[$tag_name]]\E(.*?)\[/) {
        print "Got contents: $1 and $2 \n";
    }
    print "\n\n";
}

Basically, the output I'm after is:

Tag: 345fsdf sf
Tag: sdf sf fsd fsf
Tag: foo bar|whatever
Tag Contents: == foo bar === foo bar test

I've tried a ton of options, but can't seem to get it quite right.

Can anyone suggest what's going on? =)

TIA

Andy

Re: Can't quite get this regex working
by GrandFather (Saint) on Apr 08, 2011 at 11:18 UTC

    By default, a . in a regex doesn't match \n, so your "contents" match doesn't capture what you expect. Add the s flag to the regex, giving /\Q[[$tag_name]]\E(.*?)\[/s, and some of what you want will work better. If you add strictures (use strict; use warnings;), however, you'll find that there is more work to do: for example, the code prints $2 although the regex has only one capture group.

    True laziness is hard work
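
The effect of the s flag described above can be checked with a small sketch. The sample string and the variable names here are made up for illustration and are not taken from the original post.

```perl
use strict;
use warnings;

# Hypothetical sample: one tag followed by two lines of content.
my $text = "[[tag]]\nline one\nline two\n[end]";

# Without /s, '.' refuses to match "\n", so the lazy group cannot
# reach the '[' that sits on a later line and the match fails.
my $without_s = $text =~ /\Q[[tag]]\E(.*?)\[/ ? 1 : 0;

# With /s, '.' also matches "\n", so the lazy group can span lines.
my ($content) = $text =~ /\Q[[tag]]\E(.*?)\[/s;

print "Matched without /s: $without_s\n";
print "Captured with /s: $content";
```

Note that the captured text still carries its leading and trailing newlines; trimming is left to the caller.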
Re: Can't quite get this regex working
by bart (Canon) on Apr 08, 2011 at 12:07 UTC
    The main cause is the missing /s modifier, as already indicated. Here is my alternative, which doesn't match the tag a second time (which, BTW, would fail if the same tag appeared twice in the string) but instead uses \G to continue from where the previous match ended:
    while ($current_article =~ m/\[\[(.+?)\]\]/g) {
        my $tag_name = $1;
        print "Got $tag_name \n";
        print "Checking if we have any contents AFTER it... \n";
        print "RUNNING: \Q[[$tag_name]]\E(.*?)\[ \n";
        if ($current_article =~ /\G(.*?)(?=\[)/gcs) {
            print "Got contents: $1\n";
        }
        print "\n\n";
    }
    With your data, this produces:
    Got 345fsdf sf
    Checking if we have any contents AFTER it...
    RUNNING: \[\[345fsdf\ sf\]\](.*?)[
    Got contents:

    Got sdf sf fsd fsf
    Checking if we have any contents AFTER it...
    RUNNING: \[\[sdf\ sf\ fsd\ fsf\]\](.*?)[
    Got contents:

    Got foo bar|whatever
    Checking if we have any contents AFTER it...
    RUNNING: \[\[foo\ bar\|whatever\]\](.*?)[
    Got contents: == foo bar === foo bar test
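
The \G approach can also be wrapped up so that it returns data instead of printing. This is only a sketch: the helper name tags_with_contents, the whitespace trimming, and the sample string are assumptions added for illustration, not part of the reply above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a list of [tag, contents] pairs,
# resuming each match where the previous one ended.
sub tags_with_contents {
    my ($text) = @_;
    my @pairs;
    while ($text =~ /\[\[(.+?)\]\]/g) {
        my $tag      = $1;
        my $contents = '';
        # \G anchors at pos($text), i.e. just past the tag; /c keeps
        # pos() intact if this match fails; /s lets '.' cross newlines.
        if ($text =~ /\G(.*?)(?=\[)/gcs) {
            $contents = $1;
        }
        $contents =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
        push @pairs, [$tag, $contents];
    }
    return @pairs;
}

my $article = "[s]\n[[a b]]\n[[c]]\n== one\n=== two\n[/s]\n";
for my $pair (tags_with_contents($article)) {
    printf "Tag: %s | Contents: %s\n", $pair->[0], $pair->[1];
}
```

Because the inner match only consumes text up to the next '[', the outer loop picks up the following tag from exactly that position, so a tag that appears twice is handled once per occurrence.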
Re: Can't quite get this regex working
by jethro (Monsignor) on Apr 08, 2011 at 11:18 UTC

    You could either use /\Q[[$tag_name]]\E((.|\n)*?)\[/ or add a modifier to the regex so that '.' matches '\n' as well. That modifier is s; m only changes where ^ and $ match.

      Generally, where you are matching up to some expected character, a negated character set is better than .*?. In this case ([^\[]*) would be appropriate.

      True laziness is hard work
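
A minimal sketch of the negated character set suggestion, using a made-up sample string and a hard-coded tag name (both are assumptions for illustration):

```perl
use strict;
use warnings;

# Hypothetical sample data; the tag name is hard-coded for brevity.
my $text = "[[intro]]\n== heading\nbody text\n[/section]";

# [^\[]* consumes every character that is not '[', newlines included,
# so no /s flag is needed and the match stops right before the next '['.
my ($contents) = $text =~ /\Q[[intro]]\E([^\[]*)/;

print "Contents: $contents";
```

Unlike (.*?)\[, this pattern still matches when no '[' follows at all; it then captures everything up to the end of the string.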
        Careful: when the delimiters are doubled, the file format might allow a single ] or [ on the inside, in which case the negated char class doesn't work.

        In that case you can use

        (?s:(?!\]\]).)*
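
As a sketch of how that pattern might be used, the following extracts tag names that contain single brackets. The sample input is invented, and wrapping the pattern in a capture group between \[\[ and \]\] is an assumption about the intended use.

```perl
use strict;
use warnings;

# Hypothetical input where a tag legitimately contains single brackets.
my $text = "[[a[1] b]] some text [[plain]]";

# (?!\]\]). matches any one character that does not start a "]]";
# repeating it consumes up to, but not including, the closing "]]".
# The inline (?s:...) lets '.' match newlines inside that group only.
my @tags = $text =~ /\[\[((?s:(?!\]\]).)*)\]\]/g;

print "Tag: $_\n" for @tags;
```

The lookahead is checked before every character, so this is slower than a negated character class, but it is the form that copes with single ] or [ inside the delimiters.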
        Thanks, very good point - I'll have a play with that :)

        Cheers

        Andy
      You legend - works like a charm!

      Thanks!

      Andy