in reply to get with LWP drops HTML

I can't test anything here, as you haven't given us the full URI that you are trying to retrieve. However, here are some things to consider:

I'm not saying that either of these is the problem, but they should be borne in mind when trying to retrieve pages using simpler user agents such as LWP. Before you start writing your own user agents in Perl or any other language, you really need to know what's going on with the target site. Forms, scripts, cookies and the like that are handled quietly by graphical browsers may need to be addressed in your code.

The best way to get a better picture of what is going on is to use a different browser - something like Lynx. Text-only browsers are far closer to user agents that you might create using LWP than graphical ones like Firefox, Opera, IE, etcetera. At the very least, I would suggest that you test this all out in a browser with JavaScript disabled. Lynx, links, wget and friends are, however, the tools that I'd recommend to get to the bottom of this.
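To make that comparison concrete, here is a rough sketch of how you might check what LWP actually received. It assumes the URI is passed on the command line (I don't know the real one), and the sub name truncation_hints and the output file lwp.html are just names I made up for illustration. As far as I know, Client-Aborted, Client-Warning and X-Died are headers that LWP adds itself when it stops reading early or a handler dies, so they are worth a look, but treat this as a starting point rather than a diagnosis.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# Return a list of hints that a response may be incomplete.
# (Hypothetical helper, not part of LWP.)
sub truncation_hints {
    my ($res) = @_;
    my @hints;

    # Headers that LWP adds on the client side when something went wrong.
    for my $name (qw(Client-Aborted Client-Warning X-Died)) {
        my $value = $res->header($name);
        push @hints, "$name: $value" if defined $value;
    }

    # If the server declared a length, compare it with what we received.
    my $declared = $res->header('Content-Length');
    my $received = length $res->content;
    if (defined $declared && $declared =~ /^\d+$/ && $declared != $received) {
        push @hints, "Content-Length says $declared bytes, got $received";
    }
    return @hints;
}

if (@ARGV) {
    my $url = shift @ARGV;    # the URI under test, supplied by you
    my $ua  = LWP::UserAgent->new;
    my $res = $ua->get($url);

    print $res->status_line, "\n";
    print $res->headers->as_string, "\n";
    print "possible problem: $_\n" for truncation_hints($res);

    # Save the raw body so it can be diffed against wget's copy.
    open my $fh, '>:raw', 'lwp.html' or die "cannot write lwp.html: $!";
    print {$fh} $res->content;
    close $fh or die "cannot close lwp.html: $!";
}
```

Run it with the URI as its only argument, fetch the same URI with wget -O wget.html, and diff the two files. If the bodies differ but no hints are printed, the server is probably sending different content to different clients rather than LWP dropping anything.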

Hope this helps.

Re^2: get with LWP drops HTML
by jialanw (Initiate) on Oct 05, 2008 at 02:16 UTC
    Thanks all, for all of the info.

    I've tried wget on some sample pages, and the code-dropping does not seem to happen there. It would still be nice to use Perl to go through all of the forms I need, but for now I am just trying to hack together a solution: use the incomplete data from LWP to generate wget commands that fetch the data I need.

    It would still be nice to know for future reference what the hell is going on though!

      It would still be nice to use Perl to go through all of the forms I need

      Please do yourself a favour then and look at good monk petdance's WWW::Mechanize. It will save you countless hours of work ;)

      --
      b10m
        Thanks,

        I've tried Mechanize already (along with changing the user-agent and countless other helpful suggestions). Still has the same problem. Blarg!