jialanw has asked for the wisdom of the Perl Monks concerning the following question:
Hi all,
I'm trying to write a script to download some webpages using LWP.
The problem is that the responses I'm getting are incomplete webpages: they contain only some of what I see in my normal browser, omitting seemingly random pieces of code (comments, JavaScript, forms, etc.). For this particular page, even a simple 'get' call shows the issue. I've tried the ->as_string and ->content methods and the :content_file option, but all of them have the missing-code problem. Also, the content that 'get' saves to the file changes between runs of the program.
I've tried it with other websites and it seems to work - is this caused by the website I'm trying to download from? How can I get around it?
I do use a cookie to log onto the site, but I don't think that's the problem.
Here's the code:
-----------------
use LWP;

# $district is set elsewhere in the script
$ua  = LWP::UserAgent->new;
$res = $ua->get(
    "https://ecf.$district.uscourts.gov/cgi-bin/iquery.pl",
    ':content_file' => 'test.html',
);
-----------------
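One thing that may be worth ruling out is that the server sends different content depending on the request (cookies, User-Agent, redirects) rather than LWP altering the HTML. The following is only a diagnostic sketch under that assumption: the helper names `make_ua` and `fetch_report` and the User-Agent string are made up for illustration, and the URL is taken from the command line rather than hard-coded.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Build a user agent that keeps cookies in memory and sends a browser-like
# User-Agent header. The agent string is an arbitrary illustrative value.
sub make_ua {
    return LWP::UserAgent->new(
        agent      => 'Mozilla/5.0 (compatible; diagnostic-script)',
        cookie_jar => {},    # in-memory HTTP::Cookies jar
    );
}

# Fetch $url and summarise what actually came back.
sub fetch_report {
    my ($ua, $url) = @_;
    my $res = $ua->get($url);
    my $req = $res->request;
    return {
        status    => $res->status_line,
        final_uri => ( $req ? $req->uri->as_string : '' ),
        redirects => scalar( $res->redirects ),
        type      => scalar( $res->content_type ),
        length    => length( $res->decoded_content // '' ),
    };
}

# Usage: perl fetch_report.pl URL
if (@ARGV) {
    my $report = fetch_report( make_ua(), $ARGV[0] );
    printf "%-10s %s\n", $_, $report->{$_} // '' for sort keys %$report;
}
```

If the status, final URI, or length differ between runs, or the final URI is not the one requested, that would point at the server (for example a login redirect or a per-session page) rather than at 'get' itself.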
As an example of the lost code, here's the code from going to the site and using "Save as" from the browser:
<script language="JavaScript"> FirstField="case_num";</script>
<form enctype="multipart/form-data" method="post" action="/cgi-bin/iquery.pl?109027233035598-L_758_0-1">
<!--ShowPage(iquery.htm)-->
<!-- rcsid="$Header: /usr/local/cvsroot/bankruptcy/web/html/iquery.htm,v 3.6 2005/02/07 20:00:34 gamores Exp $" -->
Here's what I get from the saved content file from "get":
<SCRIPT LANGUAGE="JavaScript"> FirstField="case_num";</SCRIPT><!-ShowPage(iquery.htm)->
<!-- rcsid="$Header: /usr/local/cvsroot/bankruptcy/web/html/iquery.htm,v 3.6 2005/02/07 20:00:34 gamores Exp $" -->
Notice that the form has disappeared when using "get". Does "get" reformat the code?
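To see exactly which elements differ between the two copies, a rough tag count can help. This is a minimal sketch, not a real HTML parser: `tag_counts` and `missing_tags` are hypothetical helper names, the regex only looks at opening tags, and the two strings are abbreviated versions of the snippets above (the query string on the form action is dropped).

```perl
use strict;
use warnings;

# Count opening tags by lower-cased name. A regex is a rough heuristic,
# not a real HTML parser, but it is enough to spot a missing <form>.
sub tag_counts {
    my ($html) = @_;
    my %count;
    $count{ lc $1 }++ while $html =~ /<([A-Za-z][A-Za-z0-9]*)\b/g;
    return \%count;
}

# Return { tag => how many more times it appears in $expected than in $got }.
sub missing_tags {
    my ($expected, $got) = @_;
    my ($want, $have) = ( tag_counts($expected), tag_counts($got) );
    my %missing;
    for my $tag ( keys %$want ) {
        my $diff = $want->{$tag} - ( $have->{$tag} // 0 );
        $missing{$tag} = $diff if $diff > 0;
    }
    return \%missing;
}

# Abbreviated copies of the browser-saved and LWP-saved snippets.
my $browser = '<script language="JavaScript"> FirstField="case_num";</script> '
            . '<form enctype="multipart/form-data" method="post" action="/cgi-bin/iquery.pl">';
my $lwp     = '<SCRIPT LANGUAGE="JavaScript"> FirstField="case_num";</SCRIPT>';

my $missing = missing_tags( $browser, $lwp );
print "$_: $missing->{$_}\n" for sort keys %$missing;    # prints "form: 1"
```

Tag names are compared case-insensitively, so the SCRIPT versus script difference is ignored and only the absent form is reported.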
Thanks!!
Re: get with LWP drops HTML
by NetWallah (Canon) on Sep 30, 2008 at 05:00 UTC

Re: get with LWP drops HTML
by smiffy (Pilgrim) on Sep 30, 2008 at 05:03 UTC
by jialanw (Initiate) on Oct 05, 2008 at 02:16 UTC
by b10m (Vicar) on Oct 05, 2008 at 20:07 UTC
by jialanw (Initiate) on Oct 07, 2008 at 00:46 UTC

Re: get with LWP drops HTML
by ikegami (Patriarch) on Sep 30, 2008 at 04:51 UTC
by tinita (Parson) on Sep 30, 2008 at 11:29 UTC

Re: get with LWP drops HTML
by jialanw (Initiate) on Oct 07, 2008 at 22:36 UTC