LostS has asked for the wisdom of the Perl Monks concerning the following question:

OK, I am grabbing a web page via LWP::Simple and have it in $webpage. I need to replace every href="whatever" and src="whatever" with href="http://domain/whatever" (and the same for src). I have $webpage =~ s#href="$_#href="thdomain/$_#g; but it doesn't work... any suggestions??

Re: Grabbing a Web Page and Replace Variables
by no_slogan (Deacon) on May 04, 2001 at 23:52 UTC
    A lot of people have been trying to munge HTML with regexes lately. HTML::Filter gives you a lot more power. I like using something like this; sweeten to taste:
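    package MyFilter;
    use strict;
    use warnings;
    use HTML::Filter;
    use HTML::Entities qw( encode_entities );
    use vars qw( @ISA );
    @ISA = qw( HTML::Filter );

    my $base = "http://foo.bar.com/";

    sub start {
        my ($self, $tag, $attrs, $attrseq, $origtext) = @_;
        my $rewrite;
        foreach my $name (qw( src href )) {
            next unless exists $attrs->{$name};
            # leave absolute URLs (http:, mailto:, //host/...) alone
            next if $attrs->{$name} =~ m{^(?:[a-z][a-z0-9+.-]*:|//)}i;
            $attrs->{$name} = "$base$attrs->{$name}";
            $rewrite = 1;
        }
        if ($rewrite) {
            print "<$tag";
            foreach my $attr (@$attrseq) {
                # attribute values arrive entity-decoded, so re-encode them
                print qq[ $attr="], encode_entities($attrs->{$attr}, '<>&"'), q["];
            }
            print ">";
        } else {
            print $origtext;
        }
    }

    package main;
    use LWP::Simple qw( get );
    # the page text you fetched, e.g. your $webpage
    my $webpage = get("http://foo.bar.com/") or die "Couldn't fetch page\n";
    my $filter = MyFilter->new();
    $filter->parse($webpage);
    $filter->eof();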
      That is great... however, my script goes out to a totally separate server, grabs a web page and then parses it... the page is in a $webpage variable.
Re: Grabbing a Web Page and Replace Variables
by cLive ;-) (Prior) on May 04, 2001 at 23:49 UTC
    OK, you probably want to add the s modifier (it lets . match newlines, so a pattern can span the line breaks in the page) and the i modifier for case insensitivity.

    Like this (changed delimiter; I find | easier on the eye):

    $webpage =~ s|href="$_|href="thedomain/$_|gis

    Although, I'd probably use this (assuming all links are relative URLs):

    $webpage =~ s|<HEAD>|<HEAD><BASE href="thedomain/">|is;

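    to add a BASE HREF to the page, so the browser resolves every relative src and href against it (the BASE href should be an absolute URL, e.g. http://www.domain.com/).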
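    The substitution above only matches a bare <HEAD>. A slightly more tolerant sketch, assuming the page has a <head> element (possibly with attributes such as lang="en") and that the hypothetical add_base() is given an absolute URL, puts the BASE tag right after the opening <head>, before anything that uses a URL:

```perl
use strict;
use warnings;

# Hypothetical helper: insert <base href="$base"> right after the opening <head>.
# Assumes $base is an absolute URL and contains no double quotes.
sub add_base {
    my ($page, $base) = @_;
    # \b keeps <header> from matching; [^>]* allows <head lang="en">; no /g,
    # so only the first <head> is touched
    $page =~ s{(<head\b[^>]*>)}{$1<base href="$base">}i;
    return $page;
}

# Sample page, for illustration only
my $page = qq{<HTML><HEAD><TITLE>t</TITLE></HEAD>\n<BODY><IMG SRC="logo.gif"></BODY></HTML>};
my $out  = add_base($page, "http://www.domain.com/");
print "$out\n";
```

    The page body is left untouched; the browser does the URL resolution. If the page already has its own BASE tag, this adds a second one, so check for that first.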

    $.02

    cLive ;-)

      What I am doing is grabbing a page, then looking for every <img src="whatever"> tag and rewriting it so it says <img src="http://www.domain.com/whatever.gif">. I don't always know the picture name or the page a link points to, so I need to capture the rest of the attribute value into $_ or some other variable and append it to the domain.
Re: Grabbing a Web Page and Replace Variables
by runrig (Abbot) on May 04, 2001 at 23:58 UTC
    Are you expecting the $_ variable to magically know that it should be "whatever", or is it actually set to something? Or do you mean something like this:
    s#href="([^":/]*?)"#href="http://domain/$1"#g
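    To see what that substitution does, here it is run against a small made-up page ("domain" is the placeholder from the question). The capture $1 holds the original value, which is what $_ was meant to be. Because the class is [^":/], values containing a colon or a slash are skipped: that protects absolute URLs, but it also skips relative paths like dir/page.html and never touches src. The second substitution is a hypothetical variant, not runrig's code, that handles src too and skips only values starting with a scheme (http:, mailto:) or a slash:

```perl
use strict;
use warnings;

# Sample page, for illustration only
my $webpage = join "\n",
    '<a href="page.html">relative</a>',
    '<a href="http://other.com/">absolute</a>',
    '<a href="dir/page.html">relative with slash</a>',
    '<img src="whatever.gif">';

# runrig's substitution, applied to a copy of the page
(my $hrefs_only = $webpage) =~ s#href="([^":/]*?)"#href="http://domain/$1"#g;

# Hypothetical variant: $1 is the attribute name, $2 the relative value;
# the lookahead rejects "scheme:..." and anything starting with "/"
(my $both = $webpage) =~
    s#\b(href|src)="(?![a-z][a-z0-9+.-]*:|/)([^"]*)"#$1="http://domain/$2"#gi;

print "$hrefs_only\n\n$both\n";
```

    Both are still regexes, so attributes in single quotes or without quotes are missed; that is where a real parser earns its keep.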
    Or, as others have suggested (in the Chatterbox), use HTML::TokeParser.
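    A rough sketch of the HTML::TokeParser route, assuming HTML::Parser (which provides HTML::TokeParser and HTML::Entities) and URI are installed; both are normally pulled in as prerequisites of LWP. Rather than gluing the domain on as a string, it uses URI->new_abs to resolve each src/href against the page's own URL, so page.html, /img/logo.gif and already-absolute links all come out right. The absolutize() helper, base URL and sample page are hypothetical:

```perl
use strict;
use warnings;
use HTML::TokeParser;
use HTML::Entities qw(encode_entities);
use URI;

# Copy $html token by token, making every src/href absolute relative to $base.
sub absolutize {
    my ($html, $base) = @_;
    my $p = HTML::TokeParser->new(\$html) or die "Can't create parser\n";
    my $out = '';
    while (my $tok = $p->get_token) {
        my $type = $tok->[0];
        if ($type eq 'S') {
            my (undef, $tag, $attr, $attrseq, $text) = @$tok;
            my $changed = 0;
            for my $name (qw(src href)) {
                next unless exists $attr->{$name};
                # new_abs resolves relative URLs and leaves absolute ones as they are
                $attr->{$name} = URI->new_abs($attr->{$name}, $base)->as_string;
                $changed = 1;
            }
            if (!$changed) { $out .= $text; next }    # untouched tag: copy verbatim
            my $close = '>';
            $out .= "<$tag";
            for my $name (@$attrseq) {
                # HTML::Parser may report the "/" of <img ... /> as an attribute
                if ($name eq '/') { $close = ' />'; next }
                # attribute values come back entity-decoded, so re-encode them
                $out .= sprintf ' %s="%s"', $name, encode_entities($attr->{$name}, '<>&"');
            }
            $out .= $close;
        }
        elsif ($type eq 'E' or $type eq 'PI') { $out .= $tok->[2] }  # end tag, processing instruction
        else                                   { $out .= $tok->[1] }  # text, comment, declaration
    }
    return $out;
}

# Hypothetical page URL and page text, for illustration only
my $base = 'http://www.domain.com/dir/index.html';
my $html = '<p><a href="page.html">a</a> <img src="/img/logo.gif" alt="x">'
         . ' <a href="http://other.com/">b</a></p>';
my $out  = absolutize($html, $base);
print "$out\n";
```

    Everything except the rewritten tags is copied through as the original text, so the rest of the page keeps its formatting. In your script, $base would be the URL you passed to get() and $html your $webpage.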
      I think you just posted the code I was looking for.
      That got it :) Thanks :)