Re: Changing from relative to absolute path

Update: I haven't edited the post below at all. This is just to say that I now realise it is a poor solution and suggest the poster doesn't use it. My suggestion is far too simplistic for 99% of cases, and I'd suggest you read the more learned reply from Davorg above mine :)

If I were you, I'd just change your substitution to the generic one shown here rather than using a distinct one for each link:

s#<A HREF="#<A HREF="http://www.mysite.com/#g;<br><br>

Depending on the size of the HTML file you are producing, you can either process it line by line, as you do now, or read the whole file into one variable and do a single global substitution.
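A minimal sketch of the slurp-and-substitute approach, assuming the page fits comfortably in memory. To stay self-contained it reads from an in-memory filehandle holding hypothetical page content; with a real file you would `open` it by name instead. It keeps the same simplistic pattern as above, so it inherits all of that pattern's weaknesses.

```perl
use strict;
use warnings;

# Read everything left on a filehandle as one string ("slurp") by
# localising $/, the input record separator, to undef during the read.
sub slurp {
    my ($fh) = @_;
    local $/;
    return scalar <$fh>;
}

# Hypothetical page content, opened as an in-memory file (Perl 5.8+)
# so the example runs without touching the disk.
my $page = qq{<A HREF="one.html">One</A>\n<A HREF="two.html">Two</A>\n};
open my $in, '<', \$page or die "Can't open in-memory file: $!";
my $html = slurp($in);
close $in;

# One global substitution over the whole document instead of one per line.
$html =~ s#<A HREF="#<A HREF="http://www.mysite.com/#g;

print $html;
```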

HTH,
Neil


Re: Re: Changing from relative to absolute path by davorg (Chancellor) on Sep 05, 2002 at 12:52 UTC
What happens if the "href" attribute isn't the first attribute of the "a" tag? Parsing HTML with regexes is a bad idea; use the right tools for the job (in this case, probably HTML::LinkExtor and URI). -- <http://www.dave.org.uk> "The first rule of Perl club is you do not talk about Perl club." -- Chip Salzenberg
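A sketch of the parser-based approach, assuming the CPAN modules HTML::LinkExtor (from the HTML-Parser distribution) and URI are installed, and using a hypothetical base URL. It only extracts the `href` of each `a` tag and resolves it with `URI->new_abs`; actually rewriting the file would need more work (for example, an HTML::Parser subclass that re-emits each tag), which this does not attempt. The sample input deliberately includes cases that trip up the regexes in this thread.

```perl
use strict;
use warnings;
use HTML::LinkExtor;   # part of the HTML-Parser distribution on CPAN
use URI;

# Hypothetical base URL that relative links should be resolved against.
my $base = 'http://www.mysite.com/';

# Sample input with lower-case tags, a newline inside a tag, whitespace
# around "=", single quotes, and a "<A" inside another attribute's value.
my $html = <<'HTML';
<a class="nav" href="about.html">About</a>
<A
  HREF = 'docs/index.html'>Docs</A>
<img src="logo.gif" alt="B<A">
HTML

# The callback gets the (lower-cased) tag name followed by
# attribute => value pairs for that tag's link attributes.
my @links;
my $extor = HTML::LinkExtor->new(sub {
    my ($tag, %attr) = @_;
    push @links, $attr{href} if $tag eq 'a' && defined $attr{href};
});
$extor->parse($html);
$extor->eof;

# Resolve each (possibly relative) href against the base URL.
my @absolute = map { URI->new_abs($_, $base)->as_string } @links;
print "$_\n" for @absolute;
```

The `img` tag is reported to the callback too (for its `src`), but the `$tag eq 'a'` check skips it, and the `<A` inside its `alt` value is never mistaken for a tag.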
Re^3: Changing from relative to absolute path by Nemp (Pilgrim) on Sep 05, 2002 at 13:06 UTC
I was aiming for a simple solution and didn't think of that... You are right, of course, and I apologize to the questioner if he tried my suggestion and it failed. Purely out of personal interest, and not as a suggestion for the poster to implement, would something like this work? `s#(<A.?HREF=")#$1absolutedirectory/#;` Thanks, Neil. Update: I should add that I am in no way advocating against using the right tools for the job; I'm just interested in whether I have now understood the problem correctly.
Re: Changing from relative to absolute path by Abigail-II (Bishop) on Sep 05, 2002 at 13:33 UTC
No. A few cases where it will not work (assuming `.?` was just a typo for `.*?`):
- Lower-case letters in the tag or attribute name.
- A newline between the tag name and the attribute.
- Whitespace around the `=`.
- An unquoted attribute value, or one quoted with single quotes.
- `<A NAME = "HREF=">`
- `<A NAME = "FOO">Blah</A> <A HREF = "fnord.html"> ...`
- `<IMG SRC = "math.gif" ALT = "B<A"> ... HREF="foo"`

Since perl 5.6.0, you can parse HTML with a regex. But it's not going to be a simple one, and you are better off using a real parser. Abigail
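To see some of these failures concretely, here is a small test harness, assuming `.?` is read as `.*?` as above. It runs the pattern over a few of the listed inputs (with the spaces around `=` removed from the last two, so the pattern can reach them at all) and reports whether each input was missed or changed.

```perl
use strict;
use warnings;

# Nemp's pattern, with .? read as the presumably intended .*? (non-greedy).
sub rewrite {
    my ($html) = @_;
    $html =~ s#(<A.*?HREF=")#$1absolutedirectory/#;
    return $html;
}

# A few of the problem inputs listed above.
my @cases = (
    [ 'lower case',          qq{<a href="x.html">} ],
    [ 'newline in tag',      qq{<A\nHREF="x.html">} ],
    [ 'spaces around =',     qq{<A HREF = "x.html">} ],
    [ 'single quotes',       qq{<A HREF='x.html'>} ],
    [ 'HREF= inside NAME',   qq{<A NAME="HREF=">} ],
    [ '<A inside ALT value', qq{<IMG SRC="math.gif" ALT="B<A"> HREF="foo"} ],
);

for my $case (@cases) {
    my ($label, $in) = @$case;
    my $out = rewrite($in);
    printf "%-20s %s\n", $label, $out eq $in ? 'missed' : "changed: $out";
}
```

On these inputs the first four are missed entirely, and the last two are rewritten in the wrong place: inside the `NAME` value, and on an `HREF="foo"` that follows an `<A` sitting inside the `ALT` text.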
Re: Re^3: Changing from relative to absolute path by davorg (Chancellor) on Sep 05, 2002 at 13:13 UTC
You still have problems if the "a" and the "href" aren't on the same line of the file. This is why regexes are such a bad idea for this job: you need to think of so many corner cases, and just when you fix one, the next one appears. -- <http://www.dave.org.uk> "The first rule of Perl club is you do not talk about Perl club." -- Chip Salzenberg
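A small illustration of the line-by-line problem, using the same pattern (with `.?` read as `.*?`) on a hypothetical tag split across two lines. Processed a line at a time, nothing matches. On the slurped string, the `/s` modifier lets `.` match the newline, so this particular case works, but the other failures listed above remain.

```perl
use strict;
use warnings;

# Hypothetical input: the A tag and its HREF attribute are on different lines.
my $html = qq{<A\nHREF="x.html">X</A>\n};

# Line by line: no single line contains both "<A" and HREF=",
# so the substitution never fires.
my $per_line = '';
for my $line (split /^/, $html) {
    my $copy = $line;
    $copy =~ s#(<A.*?HREF=")#$1absolutedirectory/#;
    $per_line .= $copy;
}

# Whole string with /s, so that . also matches "\n".
(my $whole = $html) =~ s#(<A.*?HREF=")#$1absolutedirectory/#s;

print "line by line: ", ($per_line eq $html ? "unchanged\n" : "changed\n");
print "whole string: $whole";
```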
Re: Re: Re^3: Changing from relative to absolute path by Nemp (Pilgrim) on Sep 05, 2002 at 13:16 UTC
Re: Changing from relative to absolute path by Anonymous Monk on Sep 05, 2002 at 13:20 UTC
s#(<A.?HREF=")#$1absolutedirectory/#; You probably want `.?` there, as well as `i` (HTML is not case sensitive) and a `g` (since people seem to be attached to writing really long lines). perlre(1) is your friend.	[reply]