in reply to Downloading continous updates from webpage
When you say "incremental updates", does each refresh contain all the preceding information?
If so, you probably only need the final page, which, from your description, should be easy to detect by the presence of the summary information.
Presumably the intermediate pages displayed in the browser are fetched as a result of a meta refresh tag or a javascript refresh every few minutes? When automating this, you wouldn't need the intermediate pages, since you are only going to discard them; but it may be necessary to fetch them anyway, as the server may decide to cancel the processing if it doesn't see a refresh request at regular intervals.
Depending upon the complexity of the page and the refresh mechanism used, you might get away with using LWP::Simple to GET (or POST) the URL repeatedly at appropriately timed intervals, scanning each response and discarding it until it contains the summary information.
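A minimal sketch of that polling approach, assuming the job URL can simply be re-fetched with GET and that the final page contains some recognisable marker — the /Summary/ pattern and the timing values below are placeholders, not taken from your pages:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple qw(get);

# Placeholder test: adjust to whatever the real summary page contains.
sub has_summary {
    my ($content) = @_;
    return defined $content && $content =~ /Summary/i;
}

# Poll $url every $interval seconds until the summary page appears,
# giving up after $max_tries attempts.
sub poll_until_summary {
    my ($url, $interval, $max_tries) = @_;
    for (1 .. $max_tries) {
        my $content = get($url);
        return $content if has_summary($content);
        sleep $interval;
    }
    return;    # gave up
}

if (@ARGV) {
    my $final = poll_until_summary($ARGV[0], 120, 30);
    print defined $final ? $final : "Timed out waiting for summary\n";
}
```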
In more complex cases, you may need to scan the content returned by the first submit and extract the refresh URL from the embedded javascript. It may even be necessary to rescan each partial page returned to extract a different URL each time.
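For simple cases, a couple of regexes will often do to dig the next URL out of a returned page. Both patterns below are guesses at common forms — a meta refresh tag and a plain window.location assignment; the real page may embed the URL quite differently, and a proper HTML parser is safer if the markup is messy:

```perl
use strict;
use warnings;

# Try to pull the next URL out of a meta-refresh tag or a simple
# window.location assignment in embedded javascript. Returns undef
# if neither pattern matches.
sub extract_refresh_url {
    my ($html) = @_;

    # <meta http-equiv="refresh" content="5; url=...">
    if ($html =~ /<meta[^>]+http-equiv=["']?refresh["']?[^>]+url=([^"'>\s]+)/i) {
        return $1;
    }

    # window.location = '...';  or  window.location.href = "...";
    if ($html =~ /window\.location(?:\.href)?\s*=\s*["']([^"']+)["']/i) {
        return $1;
    }

    return;
}
```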
It might be easier to use WWW::Mechanize, though I'm not sure whether it copes with embedded javascript refreshes.
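A rough equivalent of the polling loop using WWW::Mechanize — it won't execute the javascript for you, but it handles cookies and redirects, which the server may require. The looks_finished marker is again a placeholder for whatever actually distinguishes the summary page:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;    # non-core; install from CPAN

# Placeholder test for the final page: adjust to the real content.
sub looks_finished {
    my ($content) = @_;
    return defined $content && $content =~ /Run complete/i;
}

if (@ARGV) {
    # URL from the command line; optional second arg is the poll interval.
    my ($url, $interval) = (@ARGV, 120);
    my $mech = WWW::Mechanize->new( autocheck => 0 );
    while (1) {
        $mech->get($url);
        last unless $mech->success;    # stop if the server gives up on us
        if ( looks_finished( $mech->content ) ) {
            print $mech->content;      # the summary page
            last;
        }
        sleep $interval;
    }
}
```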
Providing a code example is pretty much impossible without seeing the pages involved. If the URL is public, you could post it (or /msg it to a willing responder if you don't want to overtax the server), and you might get a worked example.
Re^2: Downloading continous updates from webpage
by acid06 (Friar) on Feb 16, 2006 at 14:20 UTC