PerlMonks  

Re: Scraping a website

by markong (Pilgrim)
on Jul 31, 2018 at 11:44 UTC


in reply to Scraping a website

    # check each hypertext link within page
    my @html = split(/a href=/, $html);

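A recommendation: you are doing a lot of extra work to collect the URLs and save the corresponding content. The code is a bit verbose, and it can still miss links: splitting on the literal string "a href=" won't match <A HREF="...">, <a class="x" href="...">, or extra whitespace around the "=". Use the "standard" tools instead: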

  1. HTML::LinkExtor - Extract links from an HTML document
  2. LWP::UserAgent - Web user agent class. Look at its get(...) method, in particular its ':content_file' => $filename option, which saves the response body straight to a file.
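A minimal sketch of how the two modules could fit together, assuming the goal is to fetch a start page, collect its <a href> links and save each linked page to disk. The subroutine names extract_links and download_links and the page_NNN.html file naming are hypothetical, chosen only for this illustration. HTML::LinkExtor is given a base URL so it returns absolute links, and LWP::UserAgent's get() writes each page to a file via ':content_file'.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::LinkExtor;
use LWP::UserAgent;

# Return the absolute URLs of all <a href="..."> links found in $html.
# Relative links are resolved against $base by HTML::LinkExtor itself.
sub extract_links {
    my ($html, $base) = @_;
    my @links;
    my $extor = HTML::LinkExtor->new(
        sub {
            # Called once per link-bearing tag: the tag name (lowercased by
            # HTML::Parser) followed by attribute => URL pairs.
            my ($tag, %attr) = @_;
            push @links, "$attr{href}" if $tag eq 'a' && defined $attr{href};
        },
        $base,
    );
    $extor->parse($html);
    $extor->eof;    # flush any buffered input
    return @links;
}

# Fetch each URL and save the body to a numbered local file via get()'s
# ':content_file' option (the page_NNN.html naming is hypothetical).
sub download_links {
    my ($ua, @urls) = @_;
    my $n = 0;
    for my $url (@urls) {
        my $file = sprintf 'page_%03d.html', ++$n;
        my $res  = $ua->get($url, ':content_file' => $file);
        warn "$url: ", $res->status_line, "\n" unless $res->is_success;
    }
    return $n;
}

# Only touch the network when a start URL is given on the command line.
if (@ARGV) {
    my $start = shift @ARGV;
    my $ua    = LWP::UserAgent->new(timeout => 30);
    my $res   = $ua->get($start);
    die "$start: ", $res->status_line, "\n" unless $res->is_success;
    # $res->base is the final request URI (after redirects) unless the
    # response declares a different base.
    my @links = extract_links($res->decoded_content, $res->base);
    download_links($ua, @links);
}
```

Saved as, say, scrape.pl (a hypothetical name), it would be run as "perl scrape.pl http://www.example.com/". Because HTML::Parser does the tokenizing, upper-case tags, extra attributes and whitespace around "=" are all handled. The sketch does not de-duplicate URLs or follow links recursively.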

This should simplify your code a lot.
