Cody Pendant has asked for the wisdom of the Perl Monks concerning the following question:
There's a "next" button at the bottom. Always, even when you've reached the end of the data (the last page just reloads and the URL doesn't change).
So I can't just follow the "next" button with Mechanize, because it will stay in the loop forever.
So, how do I check whether "next" is really "next" or just the same page again?
I thought perhaps I could just save the length of the previous page's content and check it against the current one, but of course, nothing says two pages with different content can't have the same length, especially when they're repetitive tables of data.
So, next thought: I could compare the two pages whole, as two huge strings, with if ($last_html eq $this_html). But as the page has things like timestamps in it, the same page reloaded could compare as different, which would be a false negative.
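A minimal sketch of that whole-page comparison with the volatile parts stripped out first, so that a changed timestamp alone doesn't make a reloaded page look new. The helper name page_fingerprint, the timestamp pattern, and the sample pages are all assumptions made for illustration; the real page would need its own pattern, and the strings in @fetched stand in for what $mech->content would return after each "next".

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use Encode qw(encode_utf8);

# Hypothetical helper: reduce a page to a fingerprint of its stable content.
# The timestamp pattern is an assumption; adjust it to whatever actually
# changes between two loads of the same page.
sub page_fingerprint {
    my ($html) = @_;
    $html =~ s/\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}//g;   # drop timestamps
    $html =~ s/\s+/ /g;                                   # collapse whitespace
    return md5_hex(encode_utf8($html));                   # digest needs bytes
}

# Stand-ins for successive fetches (made-up data, not from the real site).
my @fetched = (
    '<p>2006-04-18 01:00:00</p><table><tr><td>row 1</td></tr></table>',
    '<p>2006-04-18 01:00:05</p><table><tr><td>row 2</td></tr></table>',
    '<p>2006-04-18 01:00:09</p><table><tr><td>row 2</td></tr></table>',
);

my ($last, @kept);
for my $html (@fetched) {
    my $fp = page_fingerprint($html);
    last if defined $last && $fp eq $last;   # same page again: stop looping
    push @kept, $html;
    $last = $fp;
}
print scalar(@kept), " distinct pages\n";
```

With the sample data the loop stops on the third fetch, because it differs from the second only in its timestamp. Comparing digests rather than lengths avoids the equal-length problem, though it still depends on the pattern catching everything that changes between reloads.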
Any ideas?
($_='kkvvttuu bbooppuuiiffss qqffssmm iibbddllffss')
=~y~b-v~a-z~s; print

Replies are listed 'Best First'.

Re: Yet Another Scraping Question
by bobf (Monsignor) on Apr 18, 2006 at 01:23 UTC
by Cody Pendant (Prior) on Apr 18, 2006 at 01:45 UTC
by Cody Pendant (Prior) on Apr 18, 2006 at 02:09 UTC

Re: Yet Another Scraping Question
by hossman (Prior) on Apr 18, 2006 at 00:59 UTC
by Cody Pendant (Prior) on Apr 18, 2006 at 01:39 UTC
by hossman (Prior) on Apr 18, 2006 at 02:19 UTC

Re: Yet Another Scraping Question
by izut (Chaplain) on Apr 18, 2006 at 09:20 UTC
by Cody Pendant (Prior) on Apr 19, 2006 at 05:15 UTC
by izut (Chaplain) on Apr 19, 2006 at 09:21 UTC

Re: Yet Another Scraping Question
by polettix (Vicar) on Apr 18, 2006 at 10:34 UTC
by Cody Pendant (Prior) on Apr 19, 2006 at 05:13 UTC