Re^2: Crawling Relative Links from Webpages

OK so maybe I am missing something here, because I am just unable to understand what's being said :(

$mech above uses a hard-coded link, which would of course work for this page. What about pages from other domains (say "xyz.com")?

How do I make the method generalizable?

Re^3: Crawling Relative Links from Webpages by Corion (Patriarch) on May 08, 2010 at 14:34 UTC
There is only one hard-coded address in the code: `my $mech = WWW::Mechanize->new(); $mech->get("http://dspace.mit.edu/handle/1721.1/53720");` If you want to make that variable, maybe you want to pass the starting link from the command line? It will then be available via `@ARGV`: `my $mech = WWW::Mechanize->new(); warn "Fetching $ARGV[0]\n"; $mech->get($ARGV[0]);` Call it as `perl -w listanand.pl http://google.com`
Re^4: Crawling Relative Links from Webpages by listanand (Sexton) on May 08, 2010 at 15:32 UTC
Ah yes, of course. What was I even saying. I get it now. Thank you very much, everyone. This has solved my problem! Although I still get a warning "Use of uninitialized value in string eq at crawler.pl line <line where I check for pdf mime type>". Makes me wonder... Andy
Re^5: Crawling Relative Links from Webpages by Your Mother (Archbishop) on May 08, 2010 at 17:04 UTC
Quoting: "I still get a warning 'Use of uninitialized value in string eq at crawler.pl'". This line, `no warnings "uninitialized";`, isn't for show. :) A path that is a directory, like /, will not have a MIME type, and various other paths will fail to be found too.
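If you would rather keep the warning enabled than silence it, one option is to test the value with `defined` before comparing it. The following is only a minimal sketch using core Perl: `is_pdf` is a hypothetical helper name, and it assumes the crawler ends up with the MIME type in a plain scalar that is `undef` for directories and for paths that could not be found.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original crawler.pl): returns 1 only
# when a MIME type is known and it is PDF, 0 otherwise.
sub is_pdf {
    my ($type) = @_;

    # Directories such as "/" and paths that were not found have no
    # MIME type; bail out before "eq" ever sees an undefined value.
    return 0 unless defined $type;

    return lc($type) eq 'application/pdf' ? 1 : 0;
}

# Example values, including the undefined case that triggers the warning.
for my $type ('application/pdf', 'text/html', undef) {
    my $label = defined $type ? $type : '(no mime type)';
    print "$label: ", (is_pdf($type) ? "pdf" : "skip"), "\n";
}
```

With a check like this, the `eq` comparison never runs on an undefined value, so the warning should not appear for those paths and `no warnings "uninitialized";` would not be needed around that test.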
Re^6: Crawling Relative Links from Webpages by listanand (Sexton) on May 09, 2010 at 01:12 UTC