Beefy Boxes and Bandwidth Generously Provided by pair Networks
Pathologically Eclectic Rubbish Lister
 
PerlMonks  

Re^2: The State of Web spidering in Perl

by digital_carver (Sexton)
on Sep 22, 2013 at 16:49 UTC ( [id://1055192]=note: print w/replies, xml ) Need Help??


in reply to Re: The State of Web spidering in Perl
in thread The State of Web spidering in Perl

I'll give HTML::Parser a second look, thanks for the suggestion. How do you match something like //div[@id='blah']/p though, do you explicitly maintain state?

As for LWP vs Mech, LWP does work for my use case, I just prefer Mech for a few niceties like autocheck, auto-delegation of $mech->content() to $response->decoded_content(), cookie_jar defaulting to on, etc.

Replies are listed 'Best First'.
Re^3: The State of Web spidering in Perl
by Anonymous Monk on Sep 23, 2013 at 00:03 UTC

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://1055192]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others drinking their drinks and smoking their pipes about the Monastery: (4)
As of 2024-04-25 14:44 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found