Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Can anyone explain why I keep getting these errors when I run this web link check script?
use strict;
use LWP::Simple;
use HTML::TokeParser;
use HTML::Entities;

my @newspages = qw(
    http://osis.nima.mil
    http://osis.nima.mil/myhot.html
    http://osis.nima.mil/myoffices.html
    http://osis.nima.mil/mytraining.html
    http://osis.nima.mil/mygeospatial.html
);

for (@newspages) {
    my $html = $_;
    my ($junk,$short) = split(/\./,$html);    # get domain name
    my $body .= "<td valign=top>$short<br>";
    my $get = get("$html");
    my $p = HTML::TokeParser->new(\$get);
    while (my $token = $p->get_tag("a")) {
        my $url = $token->[1]{href} || "-";
        my $text = $p->get_trimmed_text("/a");
        unless ($url =~ /^mailto|^javascript/) {    # don't grab javascript or mailto's
            $body .= "<a href=\"$url\" target=\"new\">$text</a><br>\n";
        }
    }
    $body .= "</td>"
}

my $body .= "</tr></table>";
open(OUT,">news.txt");    # send to an html file
print OUT "$body";
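Independently of any error message, the script as written cannot produce a useful file. `my $body .= ...` inside the loop declares a fresh, empty `$body` on every pass, and the final `my $body .= "</tr></table>"` declares yet another one, so news.txt only ever receives `</tr></table>` (the opening `<table><tr>` is never written at all). Below is a minimal sketch of the accumulation done with a single variable. The `build_table` helper, its input layout, the sample data and the output name news.html are illustrative assumptions, not part of the original script. It also calls `encode_entities` from HTML::Entities, which the original loads but never uses.

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities);

# Hypothetical helper: turn a list of columns into one HTML table.
# Each column is [ $label, [ [$url, $text], ... ] ].
sub build_table {
    my @columns = @_;
    my $body = "<table><tr>\n";    # declared ONCE, before any loop
    for my $column (@columns) {
        my ($label, $links) = @$column;
        $body .= "<td valign=top>" . encode_entities($label) . "<br>\n";
        for my $link (@$links) {
            my ($url, $text) = @$link;
            # append to the same $body; no "my" here
            $body .= sprintf qq{<a href="%s" target="new">%s</a><br>\n},
                encode_entities($url), encode_entities($text);
        }
        $body .= "</td>\n";
    }
    $body .= "</tr></table>\n";
    return $body;
}

# Illustrative data only
my $html = build_table(
    [ 'osis.nima.mil' => [ [ 'myhot.html', 'Hot topics' ] ] ],
);

open(my $out, '>', 'news.html') or die "Cannot write news.html: $!";
print $out $html;
close($out) or die "Cannot close news.html: $!";
```

Opening the file with a lexical handle and checking `open` and `close` also means a write failure is reported instead of silently ignored.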

The error messages on my NT workstation:
Use of uninitialized value in substr at C:/Perl/site/lib/HTML/PullParser.pm line 82.
Use of uninitialized value in length at C:/Perl/site/lib/HTML/PullParser.pm line 85.
Use of uninitialized value in substr at C:/Perl/site/lib/HTML/PullParser.pm line 82.
Use of uninitialized value in length at C:/Perl/site/lib/HTML/PullParser.pm line 85.
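These warnings most likely mean that HTML::TokeParser (which is built on HTML::PullParser) was handed an undefined document. `LWP::Simple::get` returns `undef` when a page cannot be fetched, and the script passes `\$get` to the parser without checking it. A minimal guard, assuming you would rather skip unreachable pages with a warning than stop; `parser_for` is a hypothetical helper name, and this sketch takes the pages to scan from the command line:

```perl
use strict;
use warnings;
use LWP::Simple qw(get);
use HTML::TokeParser;

# Hypothetical helper: return a parser only when there is a document.
# get() returns undef when a fetch fails, and a reference to that undef
# is what makes HTML::PullParser warn about uninitialized values.
sub parser_for {
    my ($html) = @_;
    return undef unless defined $html;
    return HTML::TokeParser->new(\$html);
}

# Pages to scan are taken from the command line in this sketch.
for my $url (@ARGV) {
    my $html = get($url);
    my $p = parser_for($html);
    unless ($p) {
        warn "Could not fetch $url; skipping it\n";
        next;
    }
    while (my $token = $p->get_tag('a')) {
        my $href = $token->[1]{href};
        print "$url -> $href\n" if defined $href;
    }
}
```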

Re: web link errors
by Stegalex (Chaplain) on Mar 12, 2002 at 14:05 UTC
    The W3C has a public domain Perl script that will check your site for dead links. Why not use it instead? It's here. I like chicken.
Re: web link errors
by silent11 (Vicar) on Mar 12, 2002 at 13:59 UTC
    This looks like a mod of some code I posted here. On the surface, I don't see anything wrong with your code, except that those URLs don't exist (at least not at the moment for me). Do you have all the modules installed?
    LWP::Simple; HTML::TokeParser; HTML::Entities;
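A quick way to answer that is to try loading each module. A sketch; `module_status` is a hypothetical helper name. For reference, LWP::Simple ships with the libwww-perl distribution, while HTML::TokeParser and HTML::Entities both ship with HTML-Parser, so PPM package names do not map one for one onto module names.

```perl
use strict;
use warnings;

# Hypothetical helper: return the module's version if it loads,
# 'unknown' if it loads without a version, or undef if it cannot be loaded.
sub module_status {
    my ($module) = @_;
    return undef unless eval "require $module; 1";
    my $version = $module->VERSION;
    return defined $version ? $version : 'unknown';
}

for my $module (qw(LWP::Simple HTML::TokeParser HTML::Entities)) {
    my $status = module_status($module);
    if (defined $status) {
        print "ok      $module $status\n";
    } else {
        print "MISSING $module\n";
    }
}
```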

    Also, when I posted this code I was running it against domains without subdomains, unlike yours. I split on /\./ to get the domain name, but with your URLs that split returns "nima" for every page, so every column gets the same label.
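To see why: `split /\./` on `http://osis.nima.mil/myhot.html` yields `http://osis`, `nima`, `mil/myhot` and `html`, so `$short` is `nima` for every page. One alternative, assuming the URI module (it appears in the PPM listing below) is available, is to build the label from the parsed URL instead; `page_label` is a hypothetical helper:

```perl
use strict;
use warnings;
use URI;

# Hypothetical helper: label a page by host plus path, so several
# pages on the same host still get distinct labels.
sub page_label {
    my ($url) = @_;
    my $uri  = URI->new($url);
    my $path = $uri->path;
    $path = '' if $path eq '/';
    return $uri->host . $path;
}

for my $url (qw(http://osis.nima.mil http://osis.nima.mil/myhot.html)) {
    my $short = (split /\./, $url)[1];   # what the original split gives
    printf "%-35s split: %-6s label: %s\n", $url, $short, page_label($url);
}
```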
    -Silent11
      Thanks for your reply! Here is my PPM listing:
      Archive-Tar Compress-Zlib Digest-MD5 File-CounterFil Font-AFM HTML-Parser HTML-Tagset HTML-Tree MIME-Base64 PPM SOAP-Lite Storable Tk URI XML-Parser XML-Simple libnet libwin32 libwww-perl
      I thought some of these were the same as, or close to, the modules you listed?
      Also, should I list ALL the links on my HTML page in the @newspages array?
      I like your script because it is short and simple enough for a beginner like me to learn from.
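On the @newspages question: the array only needs the pages you want scanned; the parser pulls every `<a href>` out of each page, so the links themselves do not need listing. Note that the original script only copies those links into a page and never tests whether they work. Below is a sketch of an actual check, assuming LWP::Simple's `head` (true in scalar context when the request succeeds) is an acceptable test, and that relative links such as `myhot.html` should be resolved against the page they appear on with `URI->new_abs`. Pages come from the command line, `links_in_html` is a hypothetical helper, and since some servers refuse HEAD requests, a DEAD result is worth re-checking by hand.

```perl
use strict;
use warnings;
use LWP::Simple qw(get head);
use HTML::TokeParser;
use URI;

# Hypothetical helper: absolute URLs of every <a href> in $html,
# resolving relative links against $base and skipping mailto/javascript.
sub links_in_html {
    my ($html, $base) = @_;
    my $p = HTML::TokeParser->new(\$html);
    my @links;
    while (my $token = $p->get_tag('a')) {
        my $href = $token->[1]{href};
        next unless defined $href;          # e.g. <a name="top">
        next if $href =~ /^(?:mailto|javascript):/i;
        push @links, URI->new_abs($href, $base)->as_string;
    }
    return @links;
}

# Usage: perl checklinks.pl http://osis.nima.mil http://osis.nima.mil/myhot.html
for my $page (@ARGV) {
    my $html = get($page);
    unless (defined $html) {
        print "UNREACHABLE $page\n";
        next;
    }
    for my $link (links_in_html($html, $page)) {
        my $ok = head($link);               # scalar context: true on success
        printf "%-11s %s (on %s)\n", $ok ? 'ok' : 'DEAD', $link, $page;
    }
}
```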