JorkkiS has asked for the wisdom of the Perl Monks concerning the following question:

Hi!

First I want to say that this site is really good and has helped me a lot! I have a small problem though...

How should I try to solve this problem?

I want to save the data from a web page (kind of like a web browser's Save As function).
I have found lots of advice on how to download a page from the web, but that's not good enough, because this page is created with scripts. So if I just download this page and view the source, I only see calls to these perl scripts that create this page.
I don't see the actual data (it's not hard-coded in the HTML).

My second idea was to access these scripts manually and fetch the data that way. I have not managed to do that,
and I think it will be difficult, because I cannot access the scripts directly from the command line.

I can only think that I need to come up with a way to save the data somehow, directly from this web page.
I have found dozens of scripts and tips on how to download a page, but as I said, that won't do the trick.

I would be very grateful if someone could point me in the right direction or give me some advice on how to tackle this one.

Thanks in advance

-JorkkiS

Re: Saving data from a web page?
by Dog and Pony (Priest) on Apr 15, 2002 at 11:22 UTC
    I am not sure exactly what you mean, but this phrase:

    So if I just download this page and view the source, I only see calls to these perl scripts that create this page.

    leads me to believe that you may be getting/viewing the source of a frameset. Does it look anything like this:

    <frameset> <frame src="script.pl" /> <frame src="script2.pl" /> </frameset>
    There might be other stuff in the tags, and it may look different, but something like that?

    If so, first have a look at a page like this one to get a basic notion of what a frameset is (real short story: it is a "page" that embeds other pages in a "grid").

    Then save this page to disk, go to each of the URLs (src="this part") and save those pages in the same directory. You should then be able to view the page offline, locally.

    To do the same with Perl, which I assume is the final goal of this question, I'd look into LWP::Simple for fetching, and HTML::Parser or HTML::TokeParser to get the URLs from the initial frameset.
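
    A minimal sketch of that approach, assuming the page really is a frameset and that LWP::Simple, HTML::TokeParser and URI (all from CPAN) are installed. It saves the frameset as frameset.html and each frame as frame1.html, frame2.html and so on; those file names are made up for illustration, and to view the result offline you would still have to point the src attributes in frameset.html at the saved copies.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;        # get(), getstore(), is_success()
use HTML::TokeParser;
use URI;

# Return the absolute URL of every <frame src="..."> in a frameset.
# $source is anything HTML::TokeParser->new accepts: a file name, a file
# handle or a reference to a string of HTML. $base is the URL the
# frameset came from, so relative values like "script.pl" resolve.
sub frame_urls {
    my ($source, $base) = @_;
    my $parser = HTML::TokeParser->new($source)
        or die "Cannot read HTML: $!\n";
    my @urls;
    while (my $tag = $parser->get_tag('frame')) {
        my $src = $tag->[1]{src};
        push @urls, URI->new_abs($src, $base)->as_string
            if defined $src && length $src;
    }
    return @urls;
}

# Only touch the network when a URL is given on the command line.
if (my $url = shift @ARGV) {
    my $rc = getstore($url, 'frameset.html');
    is_success($rc) or die "Could not fetch $url (status $rc)\n";

    # Each frame is saved as the output of one run of its script;
    # the scripts themselves stay on the server.
    my $n = 0;
    for my $frame (frame_urls('frameset.html', $url)) {
        my $file   = sprintf 'frame%d.html', ++$n;
        my $status = getstore($frame, $file);
        if (is_success($status)) { print "saved $frame as $file\n" }
        else                     { warn "could not fetch $frame (status $status)\n" }
    }
}
```

    Usage (script name and URL are placeholders): perl grab_frames.pl http://www.example.com/index.html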

    Note that this is only a guess; please clarify further if possible. Also note that you will not be able to run any of the scripts locally: you can only save the results of one particular run of each script.


    You have moved into a dark place.
    It is pitch black. You are likely to be eaten by a grue.
Re: Saving data from a web page?
by Biker (Priest) on Apr 15, 2002 at 10:41 UTC

    I'm trying to understand what you really want to achieve. Do you want to capture a copy of the Perl code that was executed to give you the Web page?

    If that is so, you're out of luck. You're not supposed to be able to do that.

    If my understanding of your question is wrong, then please try to explain a little bit more about what you want to do.


    Everything went worng, just as foreseen.

Re: Saving data from a web page?
by Purdy (Hermit) on Apr 15, 2002 at 15:28 UTC
    I don't have the answer, but I may be able to shed some light on what you're looking for, so that other, more knowledgeable monks can help you.

    It seems you want the output of a rendered Web page that was put together with JavaScript. If Lynx supported JavaScript, you could call it with the "-dump" argument to get the rendered page. You may want to check out the documentation on whatever Web browser you have on your system and see if there's a command-line interface to get similar output.
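
    A sketch of the command-line idea in Perl, assuming a Unix-like system and Perl 5.8 or later (for the list form of a piped open). It runs lynx -dump by default, but Lynx does not execute JavaScript, so for a script-built page you would have to pass in the command-line mode of a browser that does, if one is available. The rendered_text name is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Run a command-line browser on $url and return everything it prints.
# Defaults to "lynx -dump"; any other command can be passed in @cmd.
sub rendered_text {
    my ($url, @cmd) = @_;
    @cmd = ('lynx', '-dump') unless @cmd;

    # List form of a piped open: no shell is involved, so characters
    # such as '&' in the URL are passed through untouched.
    open my $fh, '-|', @cmd, $url
        or die "Cannot run $cmd[0]: $!\n";
    my $text = do { local $/; <$fh> };   # slurp all output
    close $fh;
    return defined $text ? $text : '';
}

# Only run when a URL is given on the command line.
if (my $url = shift @ARGV) {
    print rendered_text($url);
}
```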

    As a side note, I hope you use this power for good: e-mail harvesters could use this method to get JavaScript-protected e-mail addresses.

    Jason

Re: Saving data from a web page?
by kappa (Chaplain) on Apr 16, 2002 at 14:09 UTC
    If the page you want to grab is constructed using JavaScript's document.write() (a trick widely known as client-side includes), then you are unlikely to succeed. Browsers simply run these scriptlets while rendering the page, so you won't see the result even if you "Save As" the page. And I don't know of any JavaScript engines available to Perl programs :(

      You can study the HTML document that 'includes' the external JavaScript file. The JS file is often called foo.js or something close to that.

      Once you have the full URL of the .js file, some/many/all(?) HTTP servers will let you browse to the .js file as if it were an HTML document. As a result, you will see the full .js file in the browser.
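
      A rough Perl sketch of the same idea, assuming LWP::Simple, HTML::TokeParser and URI are installed: it fetches a page, collects the src of every <script> tag, resolves each against the page URL and prints whatever the server returns for it. Whether the server hands out the .js file at all still depends on the server, as noted above; script_urls is a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;        # get(), getprint()
use HTML::TokeParser;
use URI;

# Return the absolute URL of every external script (<script src="...">)
# referenced by a page. $html_ref is a reference to the page's HTML and
# $base is the page's URL, used to resolve relative src values.
sub script_urls {
    my ($html_ref, $base) = @_;
    my $parser = HTML::TokeParser->new($html_ref);
    my @urls;
    while (my $tag = $parser->get_tag('script')) {
        my $src = $tag->[1]{src};
        push @urls, URI->new_abs($src, $base)->as_string
            if defined $src && length $src;
    }
    return @urls;
}

# Only touch the network when a URL is given on the command line.
if (my $url = shift @ARGV) {
    defined(my $html = get($url)) or die "Could not fetch $url\n";
    for my $js (script_urls(\$html, $url)) {
        print "==> $js\n";
        getprint($js);   # raw .js source to STDOUT, errors to STDERR
        print "\n";
    }
}
```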

      Disclaimer: I have tried this on a few servers and it was a while ago. But the technique should probably work in most cases.

      OTOH, I don't think this was what the original poster wanted to achieve.


      Everything went worng, just as foreseen.

Re: Saving data from a web page?
by BUU (Prior) on Apr 15, 2002 at 13:12 UTC
    If your script prints "Content-type: application/octet-stream\n\n" instead of the usual text/html header, it forces the browser to open a "Save As" dialog box and save the script's output (its STDOUT) on your computer.
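
    This only helps if you control the script that generates the page. A minimal CGI sketch in Perl: the send_as_download name, the file name and the payload are hypothetical, and the Content-Disposition header goes beyond the reply above (it is the standard header for asking a browser to download a response under a suggested name).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Print a CGI response that asks the browser to save, not display, $data.
# application/octet-stream marks the body as opaque binary data, and
# Content-Disposition: attachment requests a download with a file name.
sub send_as_download {
    my ($fh, $filename, $data) = @_;
    binmode $fh;    # send the bytes untouched (matters on Windows)
    print {$fh} "Content-Type: application/octet-stream\n",
                "Content-Disposition: attachment; filename=\"$filename\"\n",
                "\n",
                $data;
}

# Hypothetical payload: whatever the script would normally emit as HTML.
send_as_download(\*STDOUT, 'page.html', "<html><body>Hello</body></html>\n");
```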