Hi all,

I'm trying to write a script to download some webpages using LWP.

The problem is that the responses I'm getting are incomplete webpages. They contain only some of what I see in my normal browser and omit seemingly random pieces of code: comments, JavaScript, forms, etc. For this particular page, even a simple 'get' call shows the issue. I've tried reading the result with ->as_string and ->content, and saving it with the ':content_file' option, but all of them have the missing-code problem. Also, the content that 'get' saves to the file actually changes between runs of the program.

I've tried it with other websites and it seems to work - is this caused by the website I'm trying to download from? How can I get around it?

I do use a cookie to log onto the site, but I don't think that's the problem.

Here's the code:

-----------------

use LWP;
$ua = LWP::UserAgent->new;
$res = $ua->get("https://ecf.$district.uscourts.gov/cgi-bin/iquery.pl",
                ':content_file' => 'test.html');

-----------------

As an example of the lost code, here's the code from going to the site and using "Save as" from the browser:

<script language="JavaScript"> FirstField="case_num";</script>
<form enctype="multipart/form-data" method="post" action="/cgi-bin/iquery.pl?109027233035598-L_758_0-1">
<!--ShowPage(iquery.htm)-->
<!-- rcsid="$Header: /usr/local/cvsroot/bankruptcy/web/html/iquery.htm,v 3.6 2005/02/07 20:00:34 gamores Exp $" -->

Here's what I get from the saved content file from "get":

<SCRIPT LANGUAGE="JavaScript"> FirstField="case_num";</SCRIPT><!-ShowPage(iquery.htm)->
<!-- rcsid="$Header: /usr/local/cvsroot/bankruptcy/web/html/iquery.htm,v 3.6 2005/02/07 20:00:34 gamores Exp $" -->

Notice that the form has disappeared when using "get". Does "get" reformat the code?

Thanks!!


In reply to get with LWP drops HTML by jialanw
