ultranerds has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I'm trying to match the content between two tags, [ref] and [/ref].

Here's what I have so far:

    $message =~ s{\[ref\](.*)\[/ref\]}{add_sup_note2($1)}ge;

Here is some example content of $message:

<html><body bgcolor="#DDE8F4"><h3>Getting There</h3> <br> <br>testing <sup>[<a href="#ref-1">1</a>]</sup>&nbsp;&nbsp; <b>hhh</b><br></body></html>

If I do this with the content split up into multiple lines, it works fine - for example:

    my $test = qq|==Getting There== testing [ref]bla bla bla [[http://www.google.com]] <b> test </b> and [[http://www.google.com\|testing]] [/ref] hhh testing [ref]bla bla bla an[/ref] hhh |;
    while ($test =~ m%\[ref\](.*)\[/ref\]%gix) {
        print "FOO, $1 \n";
    }


..but because ALL the content is on one line, my regex doesn't seem to work :/

Any suggestions?

TIA

Andy

Re: Regex to match between [ref] and [/ref]
by Anonymous Monk on Jul 17, 2009 at 10:27 UTC
      Hi,

      Thanks for the reply. I just tried a basic test with Parse::BBCode, using their example:

      #!/usr/bin/perl
      print qq|Content-Type: text/html \n\n|;
      use Parse::BBCode;
      # \[\[([^]]+)\|([^]]+)\]\]
      my $test = qq|==Getting There== testing [ref]bla bla bla [[http://www.google.com]] <b> test </b> and [[http://www.google.com\|testing]] [/ref] hhh testing [ref]bla bla bla an[/ref] hhh |;
      my $p = Parse::BBCode->new({
          tags => {
              # load the default tags
              Parse::BBCode::HTML->defaults,
              # add/override tags
              url => 'url:<a href="%{link}A">%{parse}s</a>',
              i => '<i>%{parse}s</i>',
              b => '<b>%{parse}s</b>',
              noparse => '<pre>%{html}s</pre>',
              code => sub {
                  my ($parser, $attr, $content, $attribute_fallback) = @_;
                  if ($attr eq 'perl') {
                      # use some syntax highlighter
                      $content = highlight_perl($content);
                  }
                  else {
                      $content = Parse::BBCode::escape_html($$content);
                  }
                  "<tt>$content</tt>"
              },
              test => 'this is klingon: %{klingon}s',
          },
          escapes => {
              klingon => sub {
                  my ($parser, $tag, $text) = @_;
                  return translate_into_klingon($text);
              },
          },
      });
      my $code = $test;
      my $parsed = $p->render($code);
      use Data::Dumper;
      print Dumper($parsed);


      ..but I get an error:

      Not a HASH reference at /usr/lib/perl5/vendor_perl/5.8.4/Parse/BBCode.pm line 81.

      Any suggestions? I've never had much luck with those BBCode Perl modules (that's why I tend to just do it with regex rules =))

      TIA

      Andy
        There is an error in the documentation (I thought I had already fixed that): any declaration that has a subref in it must be written like this:
        tagname => { code => sub { ... }, },
        I forgot this because the tagname is 'code', and my brain probably refused to duplicate a word. Please report any further bugs or documentation issues you find.

        edit: so that means it must be:

        my $p = Parse::BBCode->new({
            tags => {
                ...
                code => {
                    code => sub {
                        my ($parser, $attr, $content, $attribute_fallback) = @_;
                    },
                },
                ...
Re: Regex to match between [ref] and [/ref]
by Anonymous Monk on Jul 17, 2009 at 10:34 UTC
    .* is greedy, so it runs on to the last [/ref] it can find; use .*? instead. Also, use the /s modifier (s///s) so that . matches newlines.

    Death to Dot Star!
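
To make the difference concrete, here is a minimal sketch comparing the greedy and non-greedy forms, plus the effect of /s. The sample strings are made up for illustration and are not the poster's real data.

```perl
use strict;
use warnings;

# Hypothetical sample text: two [ref] blocks on a single line.
my $text = 'a [ref]one[/ref] b [ref]two[/ref] c';

# Greedy: .* runs to the LAST [/ref], so both blocks collapse into one match.
my @greedy = $text =~ m{\[ref\](.*)\[/ref\]}g;

# Non-greedy: .*? stops at the FIRST [/ref] after each [ref].
my @lazy = $text =~ m{\[ref\](.*?)\[/ref\]}g;

# With /s, . also matches "\n", so a block may span several lines.
my $multiline = "x [ref]line 1\nline 2[/ref] y";
my @spanning = $multiline =~ m{\[ref\](.*?)\[/ref\]}gs;

print scalar(@greedy), " greedy match(es): @greedy\n";
print scalar(@lazy), " non-greedy match(es): @lazy\n";
print scalar(@spanning), " match(es) across a newline\n";
```

The greedy pattern returns a single capture that swallows everything between the first [ref] and the last [/ref], which is why it only appears to work when each block sits on its own line.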

      Thanks - this seems to work perfectly :)

      $message =~ s{\[ref\](.*?)\[/ref\]}{add_sup_note2($1)}ge;

      Cheers

      Andy
      Hi,

      Another question :)

      I'm now trying:

          $message =~ s{\[(ref|note|réf)\](.*?)\[/(ref|note|réf)\]}{add_sup_note2($2)}ge;

      ...so it will work with:

      [ref]something[/ref] [note]something[/note] [réf]something[/réf]


      ..but this one refuses to work :/

      [réf]something[/réf]

      Any ideas?

      TIA

      Andy
        Duh, never mind - got it working. When I was testing, I had

        [réf]something[réf]

        ..instead of:

        [réf]something[/réf]

        Doh!
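
One thing worth noting about the alternation above: as written, the closing tag does not have to match the opening one, so [ref]...[/note] would also be replaced. A possible refinement, sketched here, is a backreference (\1) in the closing tag. The add_sup_note2 below is a hypothetical stand-in, since the real function is not shown in the thread, and the sample message is invented.

```perl
use strict;
use warnings;
use utf8;    # the source below contains a literal "é"

# Hypothetical stand-in for the poster's add_sup_note2(), which is not shown.
sub add_sup_note2 {
    my ($note) = @_;
    return "<sup>$note</sup>";
}

my $message = '[ref]a[/ref] [note]b[/note] [réf]c[/réf] [ref]d[/note]';

# \1 must repeat whatever the first group matched,
# so the mismatched pair [ref]d[/note] is left alone.
$message =~ s{\[(ref|note|réf)\](.*?)\[/\1\]}{add_sup_note2($2)}gse;

binmode STDOUT, ':encoding(UTF-8)';
print "$message\n";
```

This assumes the script and the input are both handled as UTF-8; if $message arrives as raw bytes, it would need decoding first for the "é" to match.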
Re: Regex to match between [ref] and [/ref]
by JavaFan (Canon) on Jul 17, 2009 at 10:28 UTC
    Untested, so may contain typos:
    m{\Q[ref]\E(?:(?!\Q[/ref]\E).)*\Q[/ref]\E}             # Simple
    m{\Q[ref]\E[^[]*(?:\[(?!\Q/ref]\E)[^[]*)*\Q[/ref]\E}   # Some unrolling
      Hi,
      Thanks - tried those 2, but both give no content (but they do seem to produce 2 matches - just without any values =))
      while ($test =~ m%\Q[ref]\E(?:(?!\Q[/ref]\E).)*\Q[/ref]\E%gix) { print "FOO, $1, $2 and $3 \n"; }

      ..and:
      while ($test =~ m%\Q[ref]\E[^[]*(?:\[(?!\Q/ref]\E)[^[]*)*\Q[/ref]\E%gix) { print "FOO, $1, $2 and $3 \n"; }
      TIA

      Andy
        Not very surprising, is it? Regexes that do not contain capturing parentheses don't set $1.
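
As a sketch of that point, the "simple" pattern can be given one capturing group around the repeated part so that $1 holds the text between the tags. The sample string is invented, and /s is added here on the assumption that blocks may contain newlines.

```perl
use strict;
use warnings;

# Hypothetical sample text with two blocks on one line.
my $test = 'x [ref]bla [[link]] bla[/ref] y [ref]more[/ref] z';

my @found;

# (?: ... ) only groups; the extra ( ... ) around the repetition fills $1.
while ($test =~ m{\Q[ref]\E((?:(?!\Q[/ref]\E).)*)\Q[/ref]\E}gs) {
    push @found, $1;
}

print "FOO, $_\n" for @found;
```

Only $1 is set, because the pattern has a single capturing group; $2 and $3 stay undefined.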