in reply to Re: Changing from relative to absolute path
in thread Changing from relative to absolute path

What happens if the "href" attribute isn't the first attribute of the "a" tag?

Parsing HTML with regexes is a bad idea; use the right tools for the job (in this case, probably HTML::LinkExtor and URI).
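
For anyone wondering what that might look like, here is a minimal sketch using those two modules. The base URL and the sample markup are made up for illustration, and this only collects the absolute forms of the links rather than rewriting the document, so treat it as a starting point and not a finished solution.

```perl
use strict;
use warnings;
use HTML::LinkExtor;
use URI;

# Hypothetical base URL and sample markup, for illustration only.
my $base = 'http://www.example.com/docs/';
my $html = <<'HTML';
<a class="nav" href="intro.html">Intro</a>
<A
  HREF = '../images/logo.gif'>Logo</A>
HTML

my @links;

# The callback receives the lower-cased tag name followed by the
# link attributes of that tag as key/value pairs.
my $parser = HTML::LinkExtor->new(sub {
    my ($tag, %attr) = @_;
    return unless $tag eq 'a' && defined $attr{href};
    # URI->new_abs resolves a relative reference against the base.
    push @links, URI->new_abs($attr{href}, $base)->as_string;
});
$parser->parse($html);
$parser->eof;

print "$_\n" for @links;
```

Note that the sample input has "href" as the second attribute in one tag, and a newline, upper case, single quotes and whitespace around the = in the other; the parser deals with all of that so you don't have to.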

--
<http://www.dave.org.uk>

"The first rule of Perl club is you do not talk about Perl club."
-- Chip Salzenberg

Re^3: Changing from relative to absolute path
by Nemp (Pilgrim) on Sep 05, 2002 at 13:06 UTC
    I was trying a simple solution for the problem and didn't think of that... you are right of course and I apologize to the questioner if he tried my suggestion and it failed.

    As a question for my personal interest, and not a suggestion for the poster to implement, would something like this work?

    s#(<A*.?HREF=")#$1absolutedirectory/#;
    thanks,
    Neil

    Update: I should probably add that I am in no way advocating *not* using the right tools for the job - just interested if I have interpreted the problem correctly now.
      No. A few cases where it will not work (assuming *.? was just a typo for .*?):
      • Lower case letters for name of tag or attribute.
      • Newline between the tag name and the attribute.
      • Whitespace around the =.
      • Attribute value not quoted, or single quote quoted.
      • <A NAME = "HREF=">.
      • <A NAME = "FOO">Blah</A> <A HREF = "fnord.html"> ....
      • <IMG SRC = "math.gif" ALT = "B<A"> ... HREF="foo".
      Since perl 5.6.0, you can parse HTML with a regex. But it's not going to be a simple one, and you are better off using a real parser.
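
A quick way to see several of these cases for yourself is to run the substitution over a handful of sample tags. This is only a sketch: the inputs are invented to mirror the list above, and the pattern is the one from the post with *.? read as .*?.

```perl
use strict;
use warnings;

# The pattern under discussion, with *.? read as .*? as assumed above.
my $re = qr/(<A.*?HREF=")/;

# Sample inputs modelled on the cases listed above (illustrative only).
my %cases = (
    'lower case'        => '<a href="foo.html">',
    'newline'           => qq{<A\nHREF="foo.html">},
    'space around ='    => '<A HREF = "foo.html">',
    'single quotes'     => q{<A HREF='foo.html'>},
    'HREF= inside NAME' => '<A NAME = "HREF=">',
    '<A inside ALT'     => '<IMG SRC = "math.gif" ALT = "B<A"> ... HREF="foo"',
);

my %result;
for my $name (sort keys %cases) {
    # Work on a copy so the original can be compared afterwards.
    (my $copy = $cases{$name}) =~ s{$re}{${1}absolutedirectory/};
    $result{$name} = $copy eq $cases{$name} ? 'unchanged' : "rewritten: $copy";
    print "$name: $result{$name}\n";
}
```

The first four inputs come back unchanged, so real links are silently missed. The last two are rewritten even though neither contains an A tag with an HREF attribute, which is arguably worse.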

      Abigail

      You still have problems if the "a" and the "href" aren't on the same line of the file.

      This is why regexes are such a bad idea for this job - you need to think of so many corner conditions. Just when you fix one, the next one appears.

      --
      <http://www.dave.org.uk>

        Thank you for your patience and continued input to this thread, davorg :) You must be sighing in exasperation and rolling your eyes by now, but it really is a lot of help to us newbies. I guess I'll try not to start answering unless I'm closer to 100% sure I'm right!

        Thanks,
        Neil
      s#(<A*.?HREF=")#$1absolutedirectory/#;

      You probably want .*? there, as well as the i modifier (HTML tag and attribute names are not case sensitive) and the g modifier (since people seem to be attached to writing really long lines with several links on them). perlre(1) is your friend.
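
To show what the two modifiers change, here is a small sketch on a made-up line containing two links. It still has every other problem mentioned elsewhere in this thread, so it illustrates the modifiers and is not a recommendation.

```perl
use strict;
use warnings;

# Made-up input: two links on one line, one in lower case.
my $line = '<a href="one.html">1</a> <A HREF="two.html">2</A>';

# With i only: case is ignored, but just the first link is rewritten.
(my $once = $line) =~ s#(<A.*?HREF=")#${1}absolutedirectory/#i;

# With i and g: every match on the line is rewritten.
(my $all = $line) =~ s#(<A.*?HREF=")#${1}absolutedirectory/#ig;

print "$once\n$all\n";
```

Writing ${1} instead of $1 is not required here, but it makes it obvious where the variable name ends and the literal text begins.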