in reply to Parsing out URLs with regex

Pick up the wheel:
Regexp::Common::URI
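
A minimal sketch of what using that module might look like, assuming Regexp::Common is installed from CPAN (it is not part of core Perl). The sample text and variable names are made up for illustration.

```perl
use strict;
use warnings;
use Regexp::Common qw(URI);   # CPAN module, not core Perl

# Hypothetical sample input, for illustration only.
my $text = 'See http://www.example.com/index.html and http://example.org/a?b=1 for details.';

# $RE{URI}{HTTP} is the module's prebuilt pattern for http URIs.
# Without -keep it contains no capturing groups, so the outer
# parentheses are the only capture.
my @urls = $text =~ /($RE{URI}{HTTP})/g;

print "$_\n" for @urls;
```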

aritst

Re: Re: Parsing out URLs with regex
by halley (Prior) on May 14, 2003 at 19:41 UTC

    Agreed. When working on code for heavy use, don't reinvent the wheel.

    For learning purposes, though: you don't want (\w*), which matches the longest possible run of consecutive word characters. You want (.*?), which matches the fewest characters needed to reach the closing quote.
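
    To make the difference concrete, here is a small sketch. The sample tag and variable names are hypothetical, chosen only so that the URL contains non-word characters.

    ```perl
    use strict;
    use warnings;

    # Hypothetical sample tag; the URL contains non-word characters.
    my $tag = '<a href="http://example.com/a-b.html">link</a>';

    # \w* stops at the first non-word character (here the colon), so the
    # closing quote is never reached and the match fails.
    my ($greedy_word) = $tag =~ /href="(\w*)"/;

    # .*? expands only as far as needed to reach the next closing quote.
    my ($lazy) = $tag =~ /href="(.*?)"/;

    print defined $greedy_word ? "\\w*: $greedy_word\n" : "\\w*: no match\n";
    print ".*?: $lazy\n";
    ```

    With this input the \w* pattern does not match at all, while .*? captures the whole URL.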

    --
    [ e d @ h a l l e y . c c ]

      Actually, this is a good example of when .*? is not the best choice. [^"]* is a much better idea. You don't want to run into this problem:

      $page = '<a href="foo">...' . '<a href="bar" title="baz"><b>Click Here';
      $page =~ /<a href="(.*?)" title="(.*?)"><b>Click Here/i;
      where $1 will contain 'foo">...<a href="bar'.
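
      A sketch of the same match with the negated character class swapped in, next to the lazy version for comparison. The variable names are made up for illustration; the input string is the one from the example above.

      ```perl
      use strict;
      use warnings;

      my $page = '<a href="foo">...' . '<a href="bar" title="baz"><b>Click Here';

      # Lazy version: the match starts at the first href and $1 runs
      # across tag boundaries until '" title="' finally appears.
      my ($lazy_href) = $page =~ /<a href="(.*?)" title="(.*?)"><b>Click Here/i;

      # Negated class: a capture can never cross a double quote, so the
      # first <a> tag fails and the regex moves on to the second one.
      my ($href, $title) = $page =~ /<a href="([^"]*)" title="([^"]*)"><b>Click Here/i;

      print "lazy:    $lazy_href\n";
      print "negated: $href / $title\n";
      ```

      Because [^"]* cannot match a quote, each capture stays inside its own attribute value, and the regex engine has less backtracking to do.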

                      - tye

        Whups, didn't see the mandatory title="" in the match. Jumped the gun.

        --
        [ e d @ h a l l e y . c c ]