PerlMonks  

Help Parsing lwp browser request content

by mmittiga17 (Scribe)
on Mar 13, 2008 at 17:32 UTC ( [id://674024]=perlquestion )

mmittiga17 has asked for the wisdom of the Perl Monks concerning the following question:

Hi All, I am working on a script to automate downloading files from a website. I am using LWP::UserAgent, and I need to parse the returned content to get the value of the TOKEN field:

    <input name="org.somescript.html.TOKEN" type="hidden" value="f152a40cd9234c57542ee5b0c057cb0b" />

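    use strict;
    use warnings;
    use LWP::UserAgent;
    use HTTP::Request;

    my $browser = LWP::UserAgent->new();
    $browser->cookie_jar( {} );    # keep session cookies in memory

    # Fetch the login page that holds the hidden TOKEN field (this is a GET, not a POST)
    my $url   = "https://somesite.com/LoginAction.do";
    my $req0  = HTTP::Request->new(GET => $url);
    my $resp0 = $browser->request($req0);
    die "Request failed: ", $resp0->status_line, "\n" unless $resp0->is_success;

    # Response can be read like this:
    print $resp0->content;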

The print $resp0->content; line prints the entire HTML of the page. I have tried endlessly to extract just the TOKEN value from that content. Can anyone recommend a method? Thanks!

Re: Help Parsing lwp browser request content
by wfsp (Abbot) on Mar 13, 2008 at 17:58 UTC
    This uses HTML::TokeParser::Simple to walk the <input> tags and print the matching field's name and value:

        #!/usr/local/bin/perl
        use strict;
        use warnings;
        use HTML::TokeParser::Simple;

        # load your html into a string
        my $content = do { local $/; <DATA> };

        my $p = HTML::TokeParser::Simple->new(\$content);
        while (my $t = $p->get_tag(q{input})) {
            my $name = $t->get_attr(q{name});
            next unless defined $name;    # skip inputs without a name attribute
            if (my ($token) = $name =~ /^org\.somescript\.html\.(.*)$/) {
                my $value = $t->get_attr(q{value});
                print "$token = $value\n";    # prints TOKEN = f152a40cd9234c57542ee5b0c057cb0b
            }
        }

        __DATA__
        <!-- lots of html -->
        <input name="org.somescript.html.TOKEN" type="hidden" value="f152a40cd9234c57542ee5b0c057cb0b" />
        <!-- lots more html -->
    Other HTML parsers are available.
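As one example of another parser, here is a minimal sketch using HTML::TreeBuilder (from the HTML-Tree distribution, assumed to be installed). Instead of streaming tokens, it builds a tree and asks for the first <input> whose name attribute equals the field name, then reads its value attribute. The helper name token_from_html is hypothetical; the field name and value are taken from the question.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: return the value attribute of the first <input>
# whose name is $field, or undef if there is no such input.
sub token_from_html {
    my ($html, $field) = @_;
    my $tree  = HTML::TreeBuilder->new_from_content($html);
    my $input = $tree->look_down(_tag => 'input', name => $field);
    my $value = $input ? $input->attr('value') : undef;
    $tree->delete;    # free the tree (harmless on versions that use weak refs)
    return $value;
}

my $html = '<form><input name="org.somescript.html.TOKEN" type="hidden" '
         . 'value="f152a40cd9234c57542ee5b0c057cb0b" /></form>';
print token_from_html($html, 'org.somescript.html.TOKEN'), "\n";
```

In the asker's script, $html would be $resp0->content. Matching on the exact name avoids picking up some other hidden field that happens to come first.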
Re: Help Parsing lwp browser request content
by Roy Johnson (Monsignor) on Mar 13, 2008 at 18:01 UTC
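    A quick and dirty way might be:

        my $whole_page = $resp0->content;
        # Anchor on the field's name: a bare /<input[^>]*?value="(.*?)"/ returns the
        # value of the FIRST <input> that has one, which may not be the TOKEN field.
        # (Assumes name comes before value in the tag, as in the snippet above.)
        my ($token) = $whole_page =~
            /<input[^>]*name="org\.somescript\.html\.TOKEN"[^>]*value="([^"]*)"/;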
    A more robust solution might be to use WWW::Mechanize and search through the input fields it parses. Or what wfsp said. :)

    Caution: Contents may have been coded under pressure.
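The WWW::Mechanize route suggested above can be sketched as follows. This is a minimal sketch, assuming WWW::Mechanize is installed; fetch_token is a hypothetical helper name, and the field name comes from the question. Mechanize parses the page's forms (via HTML::Form), so the hidden field is read with the form's value method rather than with a regex, and it keeps cookies in an in-memory jar by default.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical helper: fetch $url and return the value of the named form field.
sub fetch_token {
    my ($url, $field) = @_;
    my $mech = WWW::Mechanize->new();    # autocheck is on by default: get() dies on HTTP errors
    $mech->get($url);
    # Select the first form on the page that contains the field
    my $form = $mech->form_with_fields($field)
        or die "No form with field '$field' at $url\n";
    return $form->value($field);
}

# Usage: perl fetch_token.pl https://somesite.com/LoginAction.do
if (@ARGV) {
    print fetch_token($ARGV[0], 'org.somescript.html.TOKEN'), "\n";
}
```

Without a command-line argument the script only defines the helper. Note that HTML::Form only sees inputs that sit inside a <form> element, which is normally the case for a login page's hidden token.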

Node Type: perlquestion [id://674024]
Approved by wfsp