
Actually, this is a good example of when .*? is not the best choice. [^"]* is a much better idea. You don't want to run into this problem:

$page = '<a href="foo">...'
      . '<a href="bar" title="baz"><b>Click Here';
$page =~ /<a href="(.*?)" title="(.*?)"><b>Click Here/i;
where $1 will contain 'foo">...<a href="bar'.
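
A small runnable sketch of the problem above, matching the same $page once with .*? and once with [^"]*. The variable names ($lazy_href, $safe_href and so on) are illustrative only, not from the original post.

```perl
use strict;
use warnings;

# Same input as above: two links run together, only the second has a title.
my $page = '<a href="foo">...'
         . '<a href="bar" title="baz"><b>Click Here';

# Non-greedy .*? still starts at the FIRST '<a href="', so it stretches
# across the first link until it can reach '" title="'.
my ($lazy_href, $lazy_title) =
    $page =~ /<a href="(.*?)" title="(.*?)"><b>Click Here/i;

# [^"]* can never cross a double quote, so the attempt at the first link
# fails and the engine moves on to the second '<a href="'.
my ($safe_href, $safe_title) =
    $page =~ /<a href="([^"]*)" title="([^"]*)"><b>Click Here/i;

print "lazy: $lazy_href\n";   # lazy: foo">...<a href="bar
print "safe: $safe_href\n";   # safe: bar
```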

                - tye

Re: Re^3: Parsing out URLs with regex (diedotstar)
by halley (Prior) on May 14, 2003 at 19:51 UTC

    Whups, didn't see the mandatory title="" in the match. Jumped the gun.

    --
    [ e d @ h a l l e y . c c ]

      Note that the mandatory title="" isn't really the problem. Putting nearly anything after or before the .*? in a regex can cause you problems. Even just

      /<a href="(.*?)">/i
      will match way too much by matching way too early against
      '<a href="oops" lots of stuff <a href="ok">'
      '<a href="oops" > d'oh! whitespace! <a href="ok">'
      '<a href="oops, break a browser? <a href="ok">'
      (:
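
A minimal sketch running the three inputs above through the short pattern, once with .*? and once with the [^"]* suggested earlier in the thread. The array names are illustrative; the middle input contains an apostrophe, so the strings are written with q{} rather than single quotes.

```perl
use strict;
use warnings;

# The three problem inputs from the post.
my @inputs = (
    q{<a href="oops" lots of stuff <a href="ok">},
    q{<a href="oops" > d'oh! whitespace! <a href="ok">},
    q{<a href="oops, break a browser? <a href="ok">},
);

my (@lazy, @safe);
for my $html (@inputs) {
    # .*? anchors at the first '<a href="' and stretches to the first '">'.
    my ($lazy) = $html =~ /<a href="(.*?)">/i;
    # [^"]* cannot pass a '"', so the malformed first link never matches
    # and the engine retries at the second '<a href="'.
    my ($safe) = $html =~ /<a href="([^"]*)">/i;
    push @lazy, $lazy;
    push @safe, $safe;
    print "lazy: $lazy\nsafe: $safe\n";
}
```

All three [^"]* matches capture 'ok'. Note that the whitespace case ('"oops" >') is skipped rather than captured; accepting it would need the pattern to allow optional whitespace before the '>', e.g. "\s*>.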

                      - tye