Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Currently I have a script that opens a web page on my local NT workstation and replaces a relative path with an absolute path on one link. If I want to do the same for other links, I have to write a separate substitution for each one. Is there any way I could make this script replace every relative path with the absolute path? I basically need each <A HREF=" to become <A HREF="http://www.mysite.com/ for every relative path. Here is one example:
Relative link before my script: directory/file.doc
After my script: http://www.mysite.com/directory/file.doc
Here is how I did it, but I want to know whether I could do the same for all the relative links on this one web page. I assume I need a regular expression that is more complicated than what I have done?
my $db = 'webpage.html';
open(DATA, "$db") or die "File does not open: $!";
my @data = (<DATA>);
close(DATA);
open(DATA, ">$db") or die "File not open: $!";
foreach my $line (@data) {
    $line =~ s?<A HREF="directory/file.doc">?<A HREF="http://www.mysite.com/directory/file.doc">?g;
    print DATA $line;
}
close(DATA);

Replies are listed 'Best First'.
Re: Changing from relative to absolute path
by davorg (Chancellor) on Sep 05, 2002 at 12:33 UTC

    CPAN (as always) is your friend. Look at the URI module, particularly the abs method.

    $uri->abs( $base_uri )
    This method returns an absolute URI reference. If $uri already is absolute, then a reference to it is simply returned. If the $uri is relative, then a new absolute URI is constructed by combining the $uri and the $base_uri, and returned.
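
A minimal sketch of the abs method described above, assuming the URI distribution from CPAN is installed. The base URL and the first link come from the question; the helper name absolutize and the example.org link are made up for illustration.

```perl
use strict;
use warnings;
use URI;

# Hypothetical helper: resolve $link against $base and return a plain string.
# A link that is already absolute comes back unchanged.
sub absolutize {
    my ($link, $base) = @_;
    return URI->new($link)->abs($base)->as_string;
}

# Base URL taken from the question's example.
my $base = 'http://www.mysite.com/';

for my $link ('directory/file.doc', 'http://www.example.org/other.html') {
    print "$link => ", absolutize($link, $base), "\n";
}
```

Because abs leaves absolute URIs alone, the same call can be applied to every link on the page without first checking which ones are relative.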
    --
    <http://www.dave.org.uk>

    "The first rule of Perl club is you do not talk about Perl club."
    -- Chip Salzenberg

Re: Changing from relative to absolute path
by Nemp (Pilgrim) on Sep 05, 2002 at 12:37 UTC
    Update: I haven't edited the post below at all; this update is just to say that I appreciate this is a poor solution and suggest the poster not use it. My suggestion is far too simplistic for 99% of cases, and I'd suggest you read the more learned reply from davorg above mine :)

    If I were you I'd just change your substitution to the generic substitution I've shown here rather than a distinct one for each link...

    s#<A HREF="#<A HREF="http://www.mysite.com/#g;

    Depending on the size of the HTML file you are producing, you can either go over it line by line as you are, or just read the whole file into one variable and do one global substitution.

    HTH,
    Neil

      What happens if the "href" attribute isn't the first attribute of the "a" tag?

      Parsing HTML with regexes is a bad idea. Use the right tools for the job (in this case probably HTML::LinkExtor and URI).
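
A rough sketch of that approach, assuming HTML::LinkExtor (from the HTML-Parser distribution) and URI are installed from CPAN. The sample markup, the example.org link, and the @found array are invented for illustration; the base URL is the one from the question. It only lists the links in absolute form and does not rewrite the file.

```perl
use strict;
use warnings;
use HTML::LinkExtor;

# Made-up sample page: note that HREF is not the first attribute of the first tag.
my $html = <<'HTML';
<p><A CLASS="doc" HREF="directory/file.doc">File</A>
<a href="http://www.example.org/other.html">Other</a></p>
HTML

# Base URL taken from the question's example.
my $base = 'http://www.mysite.com/';

# With no callback, the parser collects links for retrieval via links().
# Passing a base makes it return each link as an absolute URI.
my $parser = HTML::LinkExtor->new(undef, $base);
$parser->parse($html);
$parser->eof;

my @found;
for my $link ($parser->links) {
    my ($tag, %attr) = @$link;      # tag and attribute names arrive lowercased
    next unless $tag eq 'a' && defined $attr{href};
    push @found, "$attr{href}";     # stringify the URI object
}

print "$_\n" for @found;
```

Since the parser works on tags and attributes rather than on raw text, attribute order and letter case in the markup do not matter here.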

      --
      <http://www.dave.org.uk>

      "The first rule of Perl club is you do not talk about Perl club."
      -- Chip Salzenberg

        I was trying a simple solution for the problem and didn't think of that... you are right of course and I apologize to the questioner if he tried my suggestion and it failed.

        As a question for my personal interest, and not a suggestion for the poster to implement, would something like this work...

        s#(<A*.?HREF=")#$1absolutedirectory/#;
        thanks,
        Neil

        Update: I should probably add that I am in no way advocating *not* using the right tools for the job - just interested if I have interpreted the problem correctly now.