bennierounder has asked for the wisdom of the Perl Monks concerning the following question:
Hi guys,
I'm very frustrated with this code:
    #!/usr/bin/perl -w
    # a simple web crawler

    use strict;
    use LWP::Simple;

    my $url = shift || die 'Please provide an initial url after filename!';
    my $max = 10;

    my $html = get($url);

    my @urls;
    while ($url =~ s/(https:\/\/\S+)[">]//) {
        push @urls, $1;
        print @urls;
    }

    mkdir "web" , 0755;
    open (URLMAP, ">", "web/url.map" ) || die ("can't open web\/url.map\n");

    my $count = 0;
    for (my $i=0; $i<$max; $i++) {
        my $source = $urls[int(rand($#urls+1))];
        getstore($source, 'web/$count.html');
        print URLMAP "$count\n$source\n";
        $count++;
    }
    close URLMAP;
I run the script with perl web_crawl.pl https://www.money.co.uk and I get this:
perl web_crawl.pl https://www.google.com
Use of uninitialized value $source in concatenation (.) or string at web_crawl.pl line 27.
Use of uninitialized value $source in concatenation (.) or string at web_crawl.pl line 27.
Use of uninitialized value $source in concatenation (.) or string at web_crawl.pl line 27.
Use of uninitialized value $source in concatenation (.) or string at web_crawl.pl line 27.
Use of uninitialized value $source in concatenation (.) or string at web_crawl.pl line 27.
Use of uninitialized value $source in concatenation (.) or string at web_crawl.pl line 27.
Use of uninitialized value $source in concatenation (.) or string at web_crawl.pl line 27.
Use of uninitialized value $source in concatenation (.) or string at web_crawl.pl line 27.
Use of uninitialized value $source in concatenation (.) or string at web_crawl.pl line 27.
Use of uninitialized value $source in concatenation (.) or string at web_crawl.pl line 27.
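A likely cause, offered as a sketch rather than a definitive diagnosis: the substitution in the while loop is applied to $url (the starting address) instead of $html (the fetched page), so it never matches and @urls stays empty. Indexing an empty array then gives an undefined $source, which would explain the ten warnings. Separately, 'web/$count.html' is in single quotes, so $count is not interpolated and every page would go to the same file. The example below shows link extraction against the page content and an interpolated file name. The extract_urls helper and the sample HTML string are made up for illustration; in the script, $html would come from get($url).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: pull absolute https:// links out of a chunk of HTML.
# A regex is only a rough approximation; a real HTML parser is more robust.
sub extract_urls {
    my ($html) = @_;
    return () unless defined $html;    # get() returns undef on failure

    my %seen;
    # Match against the page content, not the starting URL.
    # A link ends at whitespace, a quote, or an angle bracket.
    my @urls = grep { !$seen{$_}++ } $html =~ m{(https://[^\s"'<>]+)}g;
    return @urls;
}

# Made-up sample input standing in for the result of get($url).
my $sample = '<a href="https://example.com/a">A</a> '
           . '<a href="https://example.com/b">B</a> '
           . '<a href="https://example.com/a">again</a>';

my @urls = extract_urls($sample);
print "$_\n" for @urls;

# Double quotes so that $count is interpolated into the file name.
my $count = 0;
my $file  = "web/$count.html";
print "$file\n";
```

With an empty @urls it is probably worth stopping with a clear message before the download loop rather than picking a random element.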
I'm eventually trying to get the prices and company names. For example, for this part of the site, https://www.money.co.uk/travel-money/japanese-yen-exchange-rate.htm, I want to get the prices on offer into an array in order (highest first), keeping a note of the company name, so I may need a hash or an array of hashes.
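One way this could be structured is an array of hashes, sorted on the price field in descending order. This is only a sketch: the company names and rates are invented placeholders, not data from the site, and sort_offers_desc is a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Invented placeholder data: one hash per offer.
my @offers = (
    { company => 'Company A', rate => 143.21 },
    { company => 'Company B', rate => 145.02 },
    { company => 'Company C', rate => 144.10 },
);

# Return the offers ordered by rate, highest first.
# Intended to be called in list context.
sub sort_offers_desc {
    my @list = @_;
    return sort { $b->{rate} <=> $a->{rate} } @list;
}

for my $offer (sort_offers_desc(@offers)) {
    printf "%-10s %.2f\n", $offer->{company}, $offer->{rate};
}
```

Keeping each offer as a hash means the company name travels with its price when the list is reordered.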
That's the end goal, but I'm stuck on the first hurdle, which is saving the site's HTML to files that I can search for the prices and then extract them from. If you can think of a better way and can point me in the right direction, I'm all ears! Thanks in advance!
Please help!
Re: First Web Crawl Task
by marto (Cardinal) on Sep 21, 2018 at 08:13 UTC
Re: First Web Crawl Task
by roboticus (Chancellor) on Sep 21, 2018 at 11:59 UTC