dsb has asked for the wisdom of the Perl Monks concerning the following question:

This is the situation. I am writing a text-based web browser (don't ask why; call it practice: I want to get to know the Perl extension modules better. It uses the LWP, HTTP, and HTML modules), and the way I deal with links calls for a substitution regex. I'm using O'Reilly's homepage for testing. In dealing with the links, I do a substitution on the resource:
while ( $src =~ m%<a href="([^"]+)"(?:.|\n)*?>([^>]+)</a>%g ) {
    $inf->{$i} = { 'url' => $1, 'res' => $2 };
    $s_text = $2;
    $n_src =~ s%$s_text%[$i $s_text]%;
    $i++;
}
This is just one possible method of keeping track of the links. The problem is that one of the links in the page source has a '+' in it, and because of it I get a "nested *?+" error when I run the script. I would just escape it in the regex, but the value is held in a scalar. I tried substituting every '+' with an escaped plus, but that didn't work: it just substitutes another '+' (the backslash apparently doesn't show up in the string).
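For reference, here is a minimal, self-contained sketch of the loop above with the link text wrapped in \Q...\E before it is interpolated into the substitution pattern, which is the approach the reply in this thread recommends. The match pattern is unchanged; only the substitution differs. The sample HTML string is hypothetical (in the real script the page would come from LWP), and a plain hash %inf stands in for the $inf reference.

```perl
use strict;
use warnings;

# Hypothetical sample page; in the original script $src comes from LWP.
my $src = '<p>Books on <a href="/catalog/cplus">C++ Primer</a> and '
        . '<a href="/catalog/perl">Perl (3rd ed.)</a></p>';

my $n_src = $src;   # copy that receives the [N text] markers
my %inf;            # link number => { url => ..., res => ... }
my $i = 1;

# Same link-matching pattern as in the question.
while ( $src =~ m%<a href="([^"]+)"(?:.|\n)*?>([^>]+)</a>%g ) {
    my ( $url, $text ) = ( $1, $2 );
    $inf{$i} = { url => $url, res => $text };

    # \Q...\E escapes metacharacters such as + ( ) . so the link text
    # is matched literally instead of being compiled as regex syntax.
    $n_src =~ s/\Q$text\E/[$i $text]/;
    $i++;
}

print "$n_src\n";
```

Note that the substitution still replaces the first occurrence of the text anywhere in the page, so link text that also appears earlier in the source would be marked in the wrong place.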

I'm out of ideas with this one. Any help would be great.

Amel - f.k.a. - kel

2001-03-04 Edit by Corion : Changed title

Title: Regex to match 'A HREF=', quoting RE replacements (was: Regex Question)

Re: Regex Question
by japhy (Canon) on Mar 03, 2001 at 00:57 UTC
Re: Regex Question
by Anonymous Monk on Mar 03, 2001 at 01:09 UTC
    I tried to substitute all +'s with an escaped plus but that didn't work. It just substitutes with another '+'(the backslash apparently doesn't show in the string).

    If this is the case, your substitution code probably looked something like s/\+/\+/g, which is incorrect: you're escaping the + on the right-hand side, not adding a backslash. You would need s/\+/\\+/g. But don't do that either; if you want to escape metacharacters, use quotemeta() or \Q.
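A small sketch contrasting the approaches mentioned above. The variable names and sample strings are made up for illustration; the behaviour shown is that of Perl's double-quote interpolation in the replacement part, quotemeta(), and \Q...\E.

```perl
use strict;
use warnings;

my $s = 'C++';

# A single backslash on the replacement side only escapes the '+',
# which yields a plain '+': the string is unchanged.
( my $same = $s ) =~ s/\+/\+/g;        # 'C++'

# Doubling the backslash inserts a literal backslash before each '+'.
( my $manual = $s ) =~ s/\+/\\+/g;     # 'C\+\+'

# quotemeta() backslashes every non-word character for you...
my $quoted = quotemeta $s;             # 'C\+\+'

# ...and \Q...\E does the same inline while the pattern is interpolated.
my $text  = 'C++ (2nd ed.)';
my $page  = 'New: C++ (2nd ed.) is out';
my $found = $page =~ /\Q$text\E/ ? 1 : 0;   # 1: matched literally

print "$same | $manual | $quoted | $found\n";
```

In practice \Q...\E (or quotemeta) is preferable to hand-escaping, because it also covers ( ) . * ? [ and every other metacharacter, not just '+'.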