in reply to Downloading continous updates from webpage

When you say "incremental updates", does each refresh contain all the preceding information?

If so, you probably only need the final page, which from your description should be easy to detect because of the presence of summary information.

Presumably the intermediate pages displayed in the browser are fetched as a result of a meta refresh tag or a JavaScript refresh every few minutes? When automated, you wouldn't need the intermediate refreshes, as you are only going to discard them, but it may be necessary to fetch them anyway, because the server may cancel the processing if it doesn't see a refresh request at regular intervals.

Depending upon the complexity of the page and the refresh mechanism used, you might get away with using LWP::Simple to get the URL repeatedly (at appropriately timed intervals), scanning the content returned and discarding it until it contains the summary information. Note that LWP::Simple only issues GET requests; if the form has to be submitted with POST, you would need LWP::UserAgent instead.
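As a rough illustration of that polling approach, here is a minimal sketch. Everything specific in it is an assumption: the sub name fetch_until_complete, its named arguments, and the idea that the final page can be recognised by a regex such as qr/Summary/ are all hypothetical, since the real pages haven't been shown. The fetcher is passed in as a code ref so the loop can be tried without touching the network; it falls back to LWP::Simple's get.

```perl
use strict;
use warnings;

# Poll a URL until the returned content matches a "finished" pattern.
# Arguments (all names are illustrative, not from the original pages):
#   url       => page to request
#   done_re   => regex that only the final page is assumed to match
#   interval  => seconds to wait between requests (default 60)
#   max_tries => give up after this many requests (default 100)
#   fetch     => code ref taking a URL and returning content or undef;
#                defaults to LWP::Simple::get, loaded only when needed
sub fetch_until_complete {
    my (%arg) = @_;
    my $url      = $arg{url};
    my $done_re  = $arg{done_re};
    my $interval = defined $arg{interval} ? $arg{interval} : 60;
    my $max      = $arg{max_tries} || 100;
    my $fetch    = $arg{fetch} || do {
        require LWP::Simple;
        \&LWP::Simple::get;
    };

    for my $try (1 .. $max) {
        my $content = $fetch->($url);

        # Intermediate pages (and failed requests) are simply discarded.
        return $content if defined $content && $content =~ $done_re;

        sleep $interval if $interval && $try < $max;
    }
    return;    # never saw the summary
}

# Possible use against a real server (placeholder URL and marker):
#   my $page = fetch_until_complete(
#       url      => 'http://example.com/status',
#       done_re  => qr/Summary/,
#       interval => 120,
#   );
```

Whether a fixed interval is appropriate depends on how often the server expects to see a refresh, which can only be judged from the real pages.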

In more complex cases, you may need to scan the content returned by the first submit and extract the refresh URL from embedded JavaScript. It may even be necessary to rescan each partial page returned to extract a different URL.
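To give an idea of what that scanning might look like, here is a hedged sketch that pulls a refresh target out of a page with regexes. The sub name extract_refresh_url is made up, and the three patterns it looks for (a meta refresh tag, an assignment to location or location.href, and a location.replace(...) call) are only guesses at common idioms; the actual page may well do something else, and a regex is no substitute for a real HTML or JavaScript parser.

```perl
use strict;
use warnings;

# Return the first refresh target found in $html, or undef if none of the
# (assumed) patterns is present.
sub extract_refresh_url {
    my ($html) = @_;

    # e.g. <meta http-equiv="refresh" content="30; url=/status?job=42">
    # A meta refresh with no url= part means "reload the same page" and is
    # not reported here.
    if ($html =~ m{(<meta\b[^>]*\bhttp-equiv\s*=\s*["']?refresh\b[^>]*>)}i) {
        my $tag = $1;
        return $1 if $tag =~ m{\burl\s*=\s*['"]?([^'"\s>;]+)}i;
    }

    # e.g. window.location = '/poll?id=7';  or  location.href = "...";
    return $2 if $html =~ m{\blocation(?:\.href)?\s*=\s*(["'])(.+?)\1}i;

    # e.g. location.replace("next.cgi?step=2")
    return $2 if $html =~ m{\blocation\.replace\(\s*(["'])(.+?)\1\s*\)}i;

    return;
}
```

The value returned is often relative; URI->new_abs($found, $base_url) from the URI module can turn it into an absolute URL before the next request.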

It might be easier to use WWW::Mechanize, though I'm not sure that it copes with embedded JavaScript refreshes.

Providing a code example is pretty much impossible without seeing the pages involved. If the URL is public, you could post it (or /msg it to a willing responder if you don't want to overtax the server), and you might get a worked example.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Re^2: Downloading continous updates from webpage
by acid06 (Friar) on Feb 16, 2006 at 14:20 UTC
    When you say "incremental updates", does each refresh contain all the preceding information?

    From what the poster said, I think there's no refreshing of the page at all.
    I think the server is just printing stuff and the browser renders what it can before the whole page is done downloading. This works fairly well in some scenarios, and even better if you turn on autoflush on the server side.
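If that is what is happening, a script can read the output as it arrives instead of waiting for the whole page. The following is only a sketch under that assumption: make_line_handler and stream_lines are invented names, and it presumes the server's output is line-oriented. It relies on LWP::UserAgent's ':content_cb' option, which calls a code ref with each chunk of the response body as it is received.

```perl
use strict;
use warnings;

# Build a callback that buffers incoming chunks and passes each complete
# line (without its line ending) to $on_line. A trailing fragment with no
# newline stays in the buffer and is not reported.
sub make_line_handler {
    my ($on_line) = @_;
    my $buffer = '';
    return sub {
        my ($chunk) = @_;    # LWP also passes the response and protocol objects
        $buffer .= $chunk;
        while ($buffer =~ s/\A(.*?)\r?\n//) {
            $on_line->($1);
        }
        return;
    };
}

# Request $url and call $on_line for every line as soon as it is received.
# LWP::UserAgent is loaded here so the handler above can be used without it.
sub stream_lines {
    my ($url, $on_line) = @_;
    require LWP::UserAgent;
    my $ua = LWP::UserAgent->new(timeout => 300);
    return $ua->get($url, ':content_cb' => make_line_handler($on_line));
}

# Possible use (placeholder URL):
#   stream_lines('http://example.com/slow.cgi', sub { print "got: $_[0]\n" });
```

How promptly the lines show up still depends on the server flushing its output, as with the browser.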

    However, there are some catches. E.g., AFAIK, IE will only render a table after it gets the closing tag, and there are possibly more glitches of this kind.


    acid06
    perl -e "print pack('h*', 16369646), scalar reverse $="