ultranerds has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I'm trying to write a bit of regex which will go through a list of tags like [[something]] and convert each one to its correct article_id (kinda like a wiki). The code I have is:
# for tags like [[something]]
while ($post_message =~ m%\[\[([^]]+)\]\]%gix) {
    if ($1 =~ /\|/) { next; }
    my $tmp = $1;
    my $val = $1;
    # print STDERR "\n\nFOO\n\n";
    if ($DB->table('Articles')->count( { article_title => $tmp } ) > 0) {
        print TESTOUT "\n\nFOO 2 - gggg - $tmp \n\n";
        my $article_id = $DB->table('Articles')->select( ['article_id'], { article_title => $tmp } )->fetchrow;
        $tmp =~ tr{&[]ÀÂÄàâäÇçÉÊÈËéêèëÏÌÎïìîÖÔÒöôòÜÛÙüûùA-Z?!;«»()" }{ aaaaaacceeeeeeeeiiiiiioooooouuuuuua-z _};
        $tmp =~ s/ /_/g;
        $post_message =~ s/\Q[[$1]]/[[$article_id]]/sig;
    } else {
        # do something here latest
    }
}
...but it keeps looping over and over the same stuff (until the script dies, or I kill the process in SSH). An example post is:
==Introduction==
A Montréal, il y a [[vieux Montréal]]. [[Montréal]] [[test]]
==Brief History==
asdfasfd as df asf d asdf as fd asf dasfd
==Geography==
==Regions==
==Cities==
==Sights and Activites==
==Weather==
==Getting There==
==END Getting There==
Can anyone suggest why it keeps doing this kinda stuff:

FOO 2 - gggg - vieux Montréal
FOO 2 - gggg - Montréal
FOO 2 - gggg - vieux Montréal
FOO 2 - gggg - 45
FOO 2 - gggg - vieux Montréal
FOO 2 - gggg - 47
FOO 2 - gggg - vieux Montréal
FOO 2 - gggg - 48
FOO 2 - gggg - vieux Montréal

TIA!

Andy

Re: Regex keeps looping on itself :/
by moritz (Cardinal) on Jun 17, 2009 at 15:44 UTC
    $str =~ m/.../g in scalar context keeps track of its current position via pos($str), and $str =~ s/.../.../g resets pos($str).
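
    To see that reset in isolation, here is a minimal sketch. The string and the replacement value are made up for illustration; only the pos() behaviour is the point.

```perl
use strict;
use warnings;

my $text = '[[a]] [[b]]';

# First scalar-context match: finds [[a]] and leaves pos($text) just after it.
$text =~ m/\[\[([^]]+)\]\]/g;
my $first           = $1;
my $pos_after_first = pos($text);

# Modifying the string (here via s///) resets the search position.
$text =~ s/\Q[[a]]\E/[[1]]/;

# So the next m//g starts from the beginning again and sees the
# freshly substituted tag instead of moving on to [[b]].
$text =~ m/\[\[([^]]+)\]\]/g;
my $second = $1;

print "first=$first pos=$pos_after_first second=$second\n";
```

    In the original loop the same thing happens on every pass, which is why the output keeps returning to the first tags.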

    You could do something along these lines instead:

    $post_message =~ s{\[\[([^]]+)\]\]}{process_match($1)}ge;

    sub process_match {
        my $tmp = shift;
        if ($DB->table('Articles')->count( { article_title => $tmp } ) > 0) {
            # return modified text here
        } else {
            return $tmp; # unmodified text
        }
    }
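
    A self-contained sketch of the same idea, for anyone who wants to run it without the database. The %article_id_for hash is a hypothetical stand-in for the Articles table, and the title-to-id mapping is invented. Returning the tag with its brackets for unknown or piped titles is also an assumption about what "unmodified" should mean here.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the Articles table: article_title => article_id.
my %article_id_for = (
    'vieux Montréal' => 45,
    'Montréal'       => 47,
);

sub process_match {
    my ($title) = @_;

    # The original loop skipped tags containing a pipe, so leave them alone.
    return "[[$title]]" if $title =~ /\|/;

    # Known title: swap in the id. Unknown title: put the tag back unchanged.
    return exists $article_id_for{$title}
        ? "[[$article_id_for{$title}]]"
        : "[[$title]]";
}

my $post_message = 'A Montréal, il y a [[vieux Montréal]]. [[Montréal]] [[test]]';

# A single s///ge pass: each tag is visited exactly once, so nothing loops.
$post_message =~ s{\[\[([^]]+)\]\]}{process_match($1)}ge;

print "$post_message\n";
```

    Because the whole replacement happens inside one s///g, there is no separate m//g position to be reset.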
      Hi,

      The solution works perfectly - thank you very much :) Will have to remember that trick for another time.

      Cheers

      Andy
Re: Regex keeps looping on itself :/
by ikegami (Patriarch) on Jun 17, 2009 at 15:35 UTC

    The following expression resets the position at which the next match will start:

    $post_message =~ s/\Q[[$1]]/[[$article_id]]/sig;

    You'd be better off reading from one var and storing the output in another.

    while ($input =~ /\G(.*?)\[\[([^]]+)\]\]/sg) {
        my $link = $2;
        $output .= $1;
        $output .= process($link);
    }

    Update: Oops, there's a bug in the above. It drops whatever's after the last link. It's actually easier to solve using a different technique.

    $post_message =~ s{\[\[([^]]+)\]\]}{ process($1) }esg;
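
    For completeness, the two-variable loop can also be kept if the text after the last link is appended once the loop ends. This is only a sketch: process() here is a hypothetical placeholder that wraps the link in angle brackets, and $last_end is a name introduced for illustration.

```perl
use strict;
use warnings;

# Hypothetical placeholder for the real lookup.
sub process {
    my ($link) = @_;
    return "<$link>";
}

my $input    = 'a [[x]] b [[y]] c';
my $output   = '';
my $last_end = 0;

while ($input =~ /\G(.*?)\[\[([^]]+)\]\]/sg) {
    # Copy the captures before calling anything that might run a regex.
    my ($before, $link) = ($1, $2);
    $output .= $before . process($link);

    # Remember where this match ended; pos() is reset when the loop's
    # final match attempt fails.
    $last_end = pos($input);
}

# Append whatever follows the last link (the part the first version dropped).
$output .= substr($input, $last_end);

print "$output\n";
```

    $input is never modified, so its match position is never reset mid-loop.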
      Thanks - that never gets any values though :/
      my $post_message = $DB->table('Post')->select( ['post_message'], { post_id => $_[0] } )->fetchrow;
      print TESTOUT qq|testing the output of post_message - $post_message\n\n |;
      $post_message =~ s{\[\[([^]]+)\]\]}{
          # process($1)
          print TESTOUT qq|FOO: "$1"|;
      }esg
      TIA Andy

        It gets replaced with the result of the expression. In this case, it gets replaced with 1, the return value of print. Remove print TESTOUT so that the block evaluates to the string instead.
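
        A small sketch of both behaviours, assuming you still want the debug output. It writes to an in-memory filehandle ($testout, a stand-in for TESTOUT) so it runs on its own; the sample string is made up.

```perl
use strict;
use warnings;

# In-memory filehandle standing in for TESTOUT.
my $log = '';
open(my $testout, '>', \$log) or die "cannot open in-memory handle: $!";

# With print as the last statement, the replacement is print's return value.
my $with_print = 'see [[test]]';
$with_print =~ s{\[\[([^]]+)\]\]}{ print {$testout} qq|FOO: "$1"\n|; }esg;

# Log first, then make the replacement text the last expression in the block.
my $fixed = 'see [[test]]';
$fixed =~ s{\[\[([^]]+)\]\]}{
    print {$testout} qq|FOO: "$1"\n|;
    "[[$1]]";
}esg;

close($testout);

print "with print as last statement: $with_print\n";
print "with a string as last expression: $fixed\n";
```

        The value of the last statement in the /e block is what ends up in the string, so logging is fine as long as it is not last.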