mrguy123 has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks. I used LWP::UserAgent to retrieve a web page (something I do a lot, and it usually works). This time, however, when I printed the web page
(http://www.keynote.co.uk/kn2k1/CnIsapi.dll?nuni=98781&usr=11402srv=04&alias=kn2k1&uni=1147832502&fld=S&S_type=search&db=KEYNOTE&search_0=cats&Field0=AT&OpList_0=AND&search_99=KN&Field99=C4&OpList_99=AND&NoAutoAnd=1)

this is what I got:

    <!--Copyright 2002 Clarinet Systems Ltd.-->
    <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
    <html><head><title>Welcome to Key Note.co.uk</title>
    <META HTTP-EQUIV="Expires" CONTENT="Mon, 12 Jul 1999 17:45:00 GMT">

    </head>
    <frameset border="-2" framespacing="-2" cols="190,*">
    <frame noresize noresize name="side" src="/kn2k1/\11402_04\FRM4123\SUMMARY.FRM\side.HTM" scrolling="auto">
    <frame noresize noresize name="main" src="/kn2k1/\11402_04\FRM4123\SUMMARY.FRM\main.HTM" scrolling="auto">
    <noframes>
    <body bgcolor="#FFFFFF">
    <p><b>This web page uses frames, but your browser doesn't support them.</b></p>
    </body>
    </noframes>
    </frameset>

    </html>


I used Firefox to view the source code, and found that I could view the frame I needed (with all the info I wanted).
Does anyone know how I can get a frame's source code with LWP::UserAgent?
Thanks in advance,
Guy Naamati

Edit: g0n - code tags


Thanks for the help, it now works just fine

Re: Web Pages with frames
by wfsp (Abbot) on May 17, 2006 at 09:21 UTC
    LWP::UserAgent has done what you asked it to do, i.e. returned the HTML for that URL. If you need the contents of one of the frames, make another call with that frame's URL. In this case it may be the frame named 'main', so you would need the URL ending in 'main.HTM'.

    HTML::TokeParser::Simple would be a good place to start if you need Perl to extract it.
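
A minimal sketch of that extraction step, assuming the frameset HTML is already in a string. The sample frameset, the `/kn2k1/frames/` paths, the `www.example.com` base URL and the `frame_urls` helper are made up for illustration and are not taken from the real site. `URI->new_abs` turns each relative `src` into an absolute URL that can be passed to LWP::UserAgent in a second request.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser::Simple;
use URI;

# Return a hash of (frame name => absolute URL) for every <frame> in $html.
# frame_urls is a hypothetical helper name, not part of any module.
sub frame_urls {
    my ($html, $base) = @_;
    my $parser = HTML::TokeParser::Simple->new(string => $html);
    my %frames;
    while (my $token = $parser->get_token) {
        next unless $token->is_start_tag('frame');
        my $src = $token->get_attr('src');
        next unless defined $src;
        # Fall back to the src itself when the frame has no name.
        my $name = $token->get_attr('name');
        $name = $src unless defined $name;
        $frames{$name} = URI->new_abs($src, $base)->as_string;
    }
    return %frames;
}

# Hypothetical frameset and base URL, for illustration only.
my $html = <<'HTML';
<html><frameset cols="190,*">
<frame name="side" src="/kn2k1/frames/side.HTM">
<frame name="main" src="/kn2k1/frames/main.HTM">
</frameset></html>
HTML

my %frames = frame_urls($html, 'http://www.example.com/kn2k1/start');
print "$_ => $frames{$_}\n" for sort keys %frames;
```

Keying the result by frame name makes it easy to pick out just the frame you care about, e.g. `$frames{main}`.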

    If you have trouble with the parser show us what you have tried and I'm sure we'll be able to give you a hand.

    Good luck!

Re: Web Pages with frames
by gellyfish (Monsignor) on May 17, 2006 at 10:21 UTC

    As pointed out, you will need to parse the HTML of the frameset in order to get the individual frames. Fortunately for you, here is one I prepared earlier:

    #!/usr/bin/perl -w
    use strict;

    use LWP::UserAgent;
    use HTML::Parser;
    use URI;

    my $starturl = shift || die "No url supplied\n";
    my $baseuri  = URI->new($starturl);

    my @urls;
    push @urls, $starturl;

    my $agent  = new LWP::UserAgent;
    my $parser = HTML::Parser->new(api_version => 3,
                                   start_h     => [\&start, "tagname, attr"]);

    $agent->agent("Gelzilla/666");

    while ( my $url = shift @urls ) {
        my $request = new HTTP::Request 'GET' => $url;
        my $result  = $agent->request($request);

        if ($result->is_success) {
            print $result->as_string;
            $parser->parse($result->content);
        }
        else {
            print "Error: " . $result->status_line . "\n";
        }
    }

    sub start {
        my ($tag, $attr) = @_;

        if ($tag eq 'frame') {
            my $thisuri = URI->new($attr->{src});
            push @urls, $thisuri->abs($baseuri);
        }
    }

    HTH

    /J\

Re: Web Pages with frames
by holli (Abbot) on May 17, 2006 at 09:22 UTC
    You have to scan the frameset HTML, look for the URL of the frame you want, and then send a second request for that URL.
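
Those two steps could be sketched roughly like this. The helper names `find_frame_url` and `fetch_frame` are invented for the example, and defaulting to the frame named 'main' is an assumption based on the frameset shown in the question. Nothing is fetched unless a URL is given on the command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTML::Parser;
use URI;

# Scan the frameset HTML for the frame called $wanted and return its
# src as an absolute URL, or undef if there is no such frame.
sub find_frame_url {
    my ($html, $base, $wanted) = @_;
    my $found;
    my $parser = HTML::Parser->new(
        api_version => 3,
        start_h     => [
            sub {
                my ($tag, $attr) = @_;
                return unless $tag eq 'frame';
                return unless defined $attr->{name} && defined $attr->{src};
                # Keep the first frame whose name matches (case-insensitive).
                $found = $attr->{src}
                    if !defined $found && lc $attr->{name} eq lc $wanted;
            },
            'tagname, attr',
        ],
    );
    $parser->parse($html);
    $parser->eof;
    return defined $found ? URI->new_abs($found, $base)->as_string : undef;
}

# Fetch the frameset, then send a second request for the wanted frame.
sub fetch_frame {
    my ($ua, $url, $wanted) = @_;
    my $response = $ua->get($url);
    die "Frameset: " . $response->status_line . "\n"
        unless $response->is_success;
    my $frame_url = find_frame_url($response->decoded_content,
                                   $response->base, $wanted)
        or die "No frame named '$wanted' in $url\n";
    $response = $ua->get($frame_url);
    die "Frame: " . $response->status_line . "\n"
        unless $response->is_success;
    return $response->decoded_content;
}

# Only touch the network when a URL is supplied on the command line.
if (@ARGV) {
    my ($url, $wanted) = @ARGV;
    $wanted = 'main' unless defined $wanted;
    print fetch_frame(LWP::UserAgent->new, $url, $wanted);
}
```

Run it as `perl fetch_frame.pl URL [frame-name]` (the script name is arbitrary). Using `$response->base` rather than the original URL means relative `src` values are still resolved correctly if the server redirects.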


    holli, /regexed monk/
Re: Web Pages with frames
by blazar (Canon) on May 17, 2006 at 09:08 UTC
    $ua->agent( $product_id )
        Get/set the product token that is used to identify the user agent on the network. The agent value is sent as the "User-Agent" header in the requests. The default is the string returned by the _agent() method (see below).

    However, IIUC, the page you found has links to the individual frames: just parse it, extract those links and download them.