LWP::UserAgent doesn't parse content. WWW::Mechanize does.
$ cat ./langley
#!/usr/bin/perl

use strict;
use WWW::Mechanize;

my $browser = WWW::Mechanize->new( autocheck=>1 );
$browser->get('http://www.google.com');
$browser->submit_form(
    form_name => 'f',
    fields => {
        q => 'langley public library',
    },
);

my @links = $browser->links();
for my $link ( @links ) {
    my $abs = $link->url_abs;
    next if $abs =~ m[^http://.+google\.com/]; # Google links
    next if $abs =~ m[^http://\Q64.233.167.104/]; # Cache
    print $link->text, "\n\t", $link->url, "\n";
}

$ ./langley
Fraser Valley Regional Library
    http://www.fvrl.bc.ca/comm_branch_langleycity.asp
Sno-Isle Libraries
    http://www.sno-isle.org/
Public Visitor's Page for NASA Langley Technical Library
    http://library.larc.nasa.gov/Public/
Notice for NASA Langley Employees visiting the Technical Library ...
    http://library.larc.nasa.gov/Public/nasalangley.htm
Canadian library Web sites and catalogues by region: British ...
    http://www.collectionscanada.ca/gateway/s22-221-e.html
LANGLEY PUBLIC LIBRARY in LANGLEY, Oklahoma Library Data / Profile
    http://www.librarybug.org/library-OK0111.html
1st Services Squadron - Langley Air Force Base, Virginia
    http://www.langley.af.mil/1msg/1svs/Library.shtml
Public Libraries, Oklahoma (Books)
    http://www.ohwy.com/ok/l/library.htm
Library - Langley High School
    http://www.fcps.k12.va.us/LangleyHS/library/
Kings Langley Public School Library
    http://members.ozemail.com.au/~stewil/fivew.html
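For comparison, here is a rough sketch of the link-extraction step you would have to do yourself if you stayed with plain LWP::UserAgent. It is not from the script above: the HTML snippet and the base URL are made-up stand-ins for a fetched page, and HTML::LinkExtor is just one way to do the parsing. It runs offline, so it only illustrates the parsing that WWW::Mechanize does for you behind links().

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::LinkExtor;

# Hypothetical stand-in for the body of a page fetched with LWP::UserAgent.
my $html = <<'HTML';
<html><body>
<a href="/comm_branch_langleycity.asp">Fraser Valley Regional Library</a>
<a href="http://www.sno-isle.org/">Sno-Isle Libraries</a>
</body></html>
HTML

# Assumed base URL, used only to resolve relative links.
my $base = 'http://www.fvrl.bc.ca/';

my @urls;
my $parser = HTML::LinkExtor->new(
    sub {
        # The callback receives the tag name, then attribute/value pairs.
        my ( $tag, %attr ) = @_;
        return unless $tag eq 'a' && defined $attr{href};
        push @urls, $attr{href};    # absolutized because $base was given
    },
    $base,
);
$parser->parse($html);
$parser->eof;

print "$_\n" for @urls;
```

Note that this only collects URLs. Getting the link text as well, which Mechanize hands you via $link->text, takes more work with this approach.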
All the other caveats about Google not liking scraping still apply. Please also take a look at Spidering Hacks by Kevin Hemenway and Tara Calishain.

xoxo,
Andy


In reply to Re: LWP and Google by petdance
in thread LWP and Google by Anonymous Monk
