coder57 has asked for the wisdom of the Perl Monks concerning the following question:

I have a local HTML file, file1.html, that contains a series of links to a particular domain (matching the regular expression /on\.fe/) along with the rest of the HTML. I am trying to follow each of these links. The page behind each link contains a link to another page, which I would like to capture with the regular expression /www\.arax/. I then want to replace each /on\.fe/ link in file1.html with its corresponding /www\.arax/ link. So far this is what I have come up with, and I am a little stuck:
#! perl\bin\perl
use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new();
open(FILE, "< file1.html") || print "Unable to open the file file1 \n";
while (<FILE>) {
    if ($_ =~ /on\.fe/) {
        my $url = $_;
        print $mech->uri."\n";
        $mech->get($_);
        $mech->content();
        if ($mech->content() =~ /www\.arax/) {
            my $url2 = $mech->content() =~ /www\.arax/;
            print $mech->uri."\n";
            s/$url/$url2/;
            print;
        }
    }
}
close(FILE);

Replies are listed 'Best First'.
Re: Substituting for each regular exp in a local file
by jettero (Monsignor) on Dec 23, 2006 at 15:42 UTC

    Almost. This should clear things up a little:

    for my $l ($mech->links) {
        my $url  = $l->url;
        my $desc = $l->text;
        my $new  = $url;
        $new =~ s/lol/Laugh Outloud/;
        print "concerning $desc, $url should be $new\n";
        # $mech->get( $new );
    }

    Be sure to read the WWW::Mechanize::Link page for further info about the links returned by WWW::Mechanize->links().

    -Paul

      I should have mentioned that I am on ActivePerl 5.86. (I assume WWW::Mechanize::Link comes bundled with WWW::Mechanize? I haven't been able to find it in the ActiveState repository.) So far this is what I have. It compiles and returns to the prompt, but nothing is printed. I double-checked the regular expression, and the URLs in file1.html should match it, so maybe I have missed something basic:
      #! perl\bin\perl
      use strict;
      use warnings;
      use WWW::Mechanize;

      my $mech = WWW::Mechanize->new();
      open(FILE, "< file1.html") || print "Unable to open the file file1 \n";
      while (<FILE>) {
          for my $l ($mech->links) {
              my $url  = $l->url;
              my $desc = $l->text;
              my $new  = $url;
              $new =~ s/on\.fe/arax.com/;
              print "concerning $desc, $url should be $new\n";
              # $mech->get( $new );
          }
      }
      close(FILE);
        If it helps, here is a sample of the HTML from file1.html:
        <li class="MsoNormal" style="line-height: 18.0pt; text-autospace: ideograph-numeric ideograph-other; background: white">
        <span style="font-size: 11.0pt; font-family: Tahoma">
        <a href="http://...online.feeds.com/link1/" target="_blank" style="color: blue; text-decoration: underline; text-underline: single">
        <span style="color: #336699; text-decoration: none">Links Part 2</span></a>
        </span></li>
        <li class="MsoNormal" style="line-height: 18.0pt; text-autospace: ideograph-numeric ideograph-other; background: white">
        <span style="font-size: 11.0pt; font-family: Tahoma">
        <a href="http://...online.feeds.com/link2/" target="_blank" style="color: blue; text-decoration: underline; text-underline: single">
        <span style="color: #336699; text-decoration: none">Links Part 3</span></a>
        </span></li>
        <li class="MsoNormal" style="line-height: 18.0pt; text-autospace: ideograph-numeric ideograph-other; background: white">
        <span style="font-size: 11.0pt; font-family: Tahoma">
        <a href="http://...online.feeds.com/link3/" target="_blank" style="color: blue; text-decoration: underline; text-underline: single">
        <span style="color: #336699; text-decoration: none">Links Part 4</span></a>
        </span></li>
        <li class="MsoNormal" style="line-height: 18.0pt; text-autospace: ideograph-numeric ideograph-other; background: white">
        <span style="font-size: 11.0pt; font-family: Tahoma">
        <a href="http://...online.feeds.com/link4/" target="_blank" style="color: blue; text-decoration: underline; text-underline: single">
        <span style="color: #336699; text-decoration: none">Links Part 5</span></a>
        </span></li>
        For example, the content of the page at http://...online.feeds.com/link1/ is:
        <body>
        ...
        </td></tr><tr><td style="height:81%;width:100%;padding:0;text-align:left;"><embed src="http://...arax.../v/gomlckZfGYU..." </embed>
        </td>
        </tr>
        <tr>
        <td style="height:13%;width:100%;padding:0;text-align:left;">
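
The overall substitution described in the question (find each href matching /on\.fe/, look inside the page it points to for a URL matching /www\.arax/, and swap the two) might be sketched roughly as follows. This is only a sketch under assumptions, not a tested solution for the real pages: rewrite_links, the $fetch callback, the %pages table, and the example.* URLs are all hypothetical stand-ins, and the attribute matching is a plain regex rather than a real HTML parser. Fetching is passed in as a code ref so the logic can be tried without any network access.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Replace every href that matches $from_re with the first src/href value
# matching $to_re found in the page that href points to.
# $fetch is a code ref: URL in, page content (or undef on failure) out.
# (All names here are hypothetical; none come from the original thread.)
sub rewrite_links {
    my ($html, $from_re, $to_re, $fetch) = @_;

    # Map one URL to its replacement, or return it unchanged.
    my $resolve = sub {
        my ($url) = @_;
        return $url unless $url =~ $from_re;
        my $page = $fetch->($url);
        return $url unless defined $page;
        return $page =~ m{(?:src|href)="([^"]*$to_re[^"]*)"} ? $1 : $url;
    };

    $html =~ s{(href=")([^"]*)(")}{
        my ($pre, $url, $post) = ($1, $2, $3);   # copy before nested matches
        $pre . $resolve->($url) . $post;
    }ge;

    return $html;
}

# Hypothetical stand-in for the network: maps a URL to its page content.
my %pages = (
    'http://on.feeds.example/link1/' =>
        '<td><embed src="http://www.arax.example/v/one"></td>',
);
my $fetch = sub { return $pages{ $_[0] } };

my $in  = '<a href="http://on.feeds.example/link1/">Links Part 2</a>'
        . '<a href="http://other.example/">Other</a>';
my $out = rewrite_links($in, qr/on\.fe/, qr/www\.arax/, $fetch);
print "$out\n";
```

With WWW::Mechanize, the callback could plausibly be something like sub { $mech->get($_[0]); $mech->content }, with the whole of file1.html read into one string before calling rewrite_links. Note also that $mech->links only reports links from a page that $mech has already loaded with get, so looping over it while reading the file by hand finds nothing; find_all_links( url_regex => qr/on\.fe/ ) is another option once the page is loaded.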