Here's what I've come up with. It looks even uglier, I think, but it seems to work. It still needs to skip over "mailto:" links, which should be easy.

#!/usr/bin/perl -w
use strict;

## initialize the objects that we need
use WWW::Mechanize;            ## used to fetch the page we want

## our ::Mechanize object; autocheck is off so a failed get() makes
## $mech->success false instead of dying
my $mech       = WWW::Mechanize->new( autocheck => 0 );

## initialize an array of "bad" links 
## we'll write this to a file when we're done
my @bad_links;

## site root
my $site_root = "http://www.mscd.edu/~women/scholarships/";

## array of URLs to check
## probably wanna stick these in a file in the future
my @urls_to_check = ('schola-f.shtml', 'scholg-l.shtml', 'scholm-r.shtml', 'schols-z.shtml');


my $bad_links_file = "badlinks.txt";
my %checked_urls;
## Start!
## loop through our urls we need to check

## Thanks to Joost from perlmonks
for ( @urls_to_check ) {
    
    print "Trying to get " . $site_root . $_ . "\n";
    if ( exists $checked_urls{$_} ) {    ## already fetched this page?

        print "Link checked, skipping\n";
        next;

    } else {

        $mech->get( $site_root . $_ );
        print "Got ". $site_root . $_ ."\n" if $mech->success;
        $checked_urls{$_} = $site_root . $_;    ## same key the check above tests

        for my $link ($mech->find_all_links) { # on this page
            
            if ( exists $checked_urls{$link->url} ) {    ## seen before; avoids an undef warning under -w
                
                print "Link checked, skipping\n";
                next;
 
            } else {
                print "Getting ". $link->url ."\n";
                $mech->get($link->url);
                $checked_urls{$link->url} = $link->url;

                unless ($mech->success) {
 
                    print "can't get ".$link->url.", status: ".$mech->status."\n";
                    push @bad_links, $link->url;
   
               }
  
               $mech->back;

         }
        }
    }
}

print "Finished checking links.  Writing results.\n";
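## append to the log; die with the OS error if the file can't be opened
open (my $badlinks_fh, '>>', $bad_links_file)
    or die "Can't open $bad_links_file: $!";
for ( @bad_links ) {
    print {$badlinks_fh} $_ . "\n";
}
close ($badlinks_fh);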



## Finished!
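
For the "mailto:" part, something like the following might do. It's only a rough sketch: should_skip is a made-up helper name, not anything from WWW::Mechanize, and the sample URLs are invented just so it runs on its own without fetching anything.

```perl
#!/usr/bin/perl
use strict;
use warnings;

## Hypothetical helper (not part of the script above): returns true
## for links we should not try to fetch, such as mailto: addresses.
sub should_skip {
    my ($url) = @_;
    return 1 unless defined $url && length $url;    ## nothing to fetch
    return 1 if $url =~ /^\s*mailto:/i;             ## e-mail link, not a page
    return 0;
}

## made-up sample URLs, for illustration only
my @sample_urls = (
    'schola-f.shtml',
    'mailto:someone@example.com',
    'MAILTO:other@example.com',
    'http://www.example.com/',
);

for my $url (@sample_urls) {
    if ( should_skip($url) ) {
        print "Skipping $url\n";
        next;
    }
    print "Would check $url\n";
}
```

In the link checker itself, the idea would be to put `next if should_skip($link->url);` at the top of the inner loop, before the get, so mailto: links never end up in the bad links list.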

meh.

In reply to Re^2: Logging URLs that don't return 1 with $mech->success by stonecolddevin
in thread Logging URLs that don't return 1 with $mech->success by stonecolddevin
