in reply to Changing from relative to absolute path

Update: I haven't edited the post below at all; this is just to say that I appreciate it is a poor solution, and I suggest the poster doesn't use it. My suggestion is far too simplistic for 99% of cases, and I'd suggest you read the more learned reply from davorg above mine :)

If I were you I'd just change your substitution to the generic substitution I've shown here rather than a distinct one for each link...

s#<A HREF="#<A HREF="http://www.mysite.com/#g;

Depending on the size of the HTML file you are producing, you can either go over it line by line, as you do now, or read the whole file into one variable and do a single global substitution.
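
A minimal sketch of the "whole file in one variable" variant, for illustration only. The sample markup, the in-memory filehandle, and the `slurp` helper are assumptions made so the snippet runs on its own; the base URL is the one from the substitution above.

```perl
use strict;
use warnings;

# Stand-in for the HTML file (hypothetical content).
my $file_content = qq{<A HREF="a.html">A</A>\n<A HREF="b/c.html">C</A>\n};
my $base = 'http://www.mysite.com/';   # base URL from the post above

# Hypothetical helper: read everything from a filehandle into one scalar.
sub slurp {
    my ($fh) = @_;
    local $/;              # undefined record separator: read to end of file
    return scalar <$fh>;
}

# An in-memory filehandle keeps the sketch self-contained.
open my $fh, '<', \$file_content or die "Cannot open in-memory file: $!";
my $html = slurp($fh);
close $fh;

# One global substitution over the whole document.
(my $rewritten = $html) =~ s#<A HREF="#<A HREF="$base#g;
print $rewritten;
```

This only touches links written exactly as `<A HREF="`, and it would also prefix links that are already absolute.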

HTH,
Neil

Re: Re: Changing from relative to absolute path
by davorg (Chancellor) on Sep 05, 2002 at 12:52 UTC

    What happens if the "href" attribute isn't the first attribute of the "a" tag?

    Parsing HTML with regexes is a bad idea. Use the right tools for the job (in this case probably HTML::LinkExtor and URI).
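
A rough sketch of what that might look like, assuming both CPAN modules are installed. The `absolute_links` helper, the sample markup, and the base URL are made up for illustration. Note that this only extracts the links as absolute URLs; writing them back into the document is a separate step not shown here.

```perl
use strict;
use warnings;
use HTML::LinkExtor;   # part of the HTML-Parser distribution on CPAN
use URI;               # CPAN

# Hypothetical helper: return the href of every <a> tag as an absolute URL.
sub absolute_links {
    my ($html, $base) = @_;
    # With a base URL supplied, HTML::LinkExtor absolutizes each link it finds.
    my $parser = HTML::LinkExtor->new(undef, $base);
    $parser->parse($html);
    $parser->eof;
    my @urls;
    for my $link ($parser->links) {
        my ($tag, %attrs) = @$link;       # tag and attribute names arrive lower-cased
        next unless $tag eq 'a' && defined $attrs{href};
        push @urls, "$attrs{href}";       # stringify the URI object
    }
    return @urls;
}

# Sample input: href is not the first attribute, and one tag has no href at all.
my $html = '<A NAME="FOO">x</A> <a class="c" href="fnord.html">y</a>';
print "$_\n" for absolute_links($html, 'http://www.mysite.com/');

# URI on its own resolves a relative reference against a base.
print URI->new_abs('../x.html', 'http://www.mysite.com/dir/page.html')->as_string, "\n";
```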

    --
    <http://www.dave.org.uk>

    "The first rule of Perl club is you do not talk about Perl club."
    -- Chip Salzenberg

      I was trying a simple solution for the problem and didn't think of that... you are right of course and I apologize to the questioner if he tried my suggestion and it failed.

      As a question for my personal interest, and not a suggestion for the poster to implement, would something like this work...

      s#(<A*.?HREF=")#$1absolutedirectory/#;
      thanks,
      Neil

      Update: I should probably add that I am in no way advocating *not* using the right tools for the job - just interested if I have interpreted the problem correctly now.
        No. A few cases where it will not work (assuming *.? was just a typo for .*?):
        • Lower case letters for name of tag or attribute.
        • Newline between the tag name and the attribute.
        • Whitespace around the =.
        • Attribute value not quoted, or single quote quoted.
        • <A NAME = "HREF=">.
        • <A NAME = "FOO">Blah</A> <A HREF = "fnord.html"> ....
        • <IMG SRC = "math.gif" ALT = "B<A"> ... HREF="foo".
        Since perl 5.6.0, you can parse HTML with a regex. But it's not going to be a simple one, and you are better off using a real parser.
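
Several of the cases listed above can be checked directly. The sketch below assumes the `.*?` reading of the pattern and keeps it without modifiers, as posted; the `rewrite` helper and the sample strings are illustrative only. Each input is either left unchanged (a link the substitution misses) or changed in a place that is not an href value at all.

```perl
use strict;
use warnings;

# The substitution from the question, with *.? read as .*? (an assumption).
sub rewrite {
    my ($html) = @_;
    $html =~ s#(<A.*?HREF=")#$1absolutedirectory/#;
    return $html;
}

my @cases = (
    [ 'lower case',        qq{<a href="fnord.html">} ],
    [ 'newline in tag',    qq{<A\nHREF="fnord.html">} ],
    [ 'space around =',    qq{<A HREF = "fnord.html">} ],
    [ 'single quotes',     qq{<A HREF='fnord.html'>} ],
    [ 'HREF= inside NAME', qq{<A NAME = "HREF=">} ],
    [ '<A inside ALT',     qq{<IMG SRC = "math.gif" ALT = "B<A"> ... HREF="foo"} ],
);

for my $case (@cases) {
    my ($label, $in) = @$case;
    my $out = rewrite($in);
    # Unchanged means the link was missed; changed here means a false match.
    printf "%-18s %s\n", $label,
        $out eq $in ? 'missed (unchanged)' : "wrongly rewritten: $out";
}
```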

        Abigail

        You still have problems if the "a" and the "href" aren't on the same line of the file.

        This is why regexes are such a bad idea for this job - you need to think of so many corner conditions. Just when you fix one, the next one appears.

        --
        <http://www.dave.org.uk>

        "The first rule of Perl club is you do not talk about Perl club."
        -- Chip Salzenberg

        s#(<A*.?HREF=")#$1absolutedirectory/#;

        You probably want .*? there, as well as the i modifier (HTML tag and attribute names are not case sensitive) and the g modifier (since people seem to be attached to writing really long lines). perlre(1) is your friend.
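
For the curious, a small sketch of the substitution with those three changes applied. The `rewrite_gi` name and the sample line are made up for illustration, and the result is still subject to the corner cases raised elsewhere in the thread.

```perl
use strict;
use warnings;

# .*? instead of *.?, plus /i (case-insensitive) and /g (every match on the line).
sub rewrite_gi {
    my ($html) = @_;
    $html =~ s#(<A.*?HREF=")#$1absolutedirectory/#gi;
    return $html;
}

# Two links on one long line, in mixed case.
my $line = qq{<a href="one.html">1</a> <A HREF="two.html">2</A>};
print rewrite_gi($line), "\n";
# prints: <a href="absolutedirectory/one.html">1</a> <A HREF="absolutedirectory/two.html">2</A>
```

With /i the leading `<A` also matches the start of other tag names beginning with "a", which is one more reason to prefer a real parser.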