Re^2: Yet Another Scraping Question

Answer: I read the text, of course. But did I mention that the pages contain repetitive tables of data? It's entirely possible for any given row to contain the same text as a row on a previous page.

But I guess you've given me an answer, because each product does have a unique ID, so I can parse the HTML of a given row of that table and check its "&PRODUCTID=" value against the one saved from before.

I was just hoping for something more ... sexy I guess.

($_='kkvvttuu bbooppuuiiffss qqffssmm iibbddllffss')
=~y~b-v~a-z~s; print

Replies are listed 'Best First'.

Re^3: Yet Another Scraping Question
by hossman (Prior) on Apr 18, 2006 at 02:19 UTC

But what visual clue do you look at when reading the page that indicates to you that it's the same as the previous page? How can you tell the difference between "new data" that is "the same" as the data that was on the previous page, and "old data" that really is just a repeat of the data you've already seen?

If you, as a human, can't do that -- then there's no way your code will be able to. ... But it sounds like you've already found your answer. You can tell if the page you are looking at is the same by looking at the PRODUCTID of each row, and if there is a duplicate (or all are duplicates) from the last page, then it's the same page.
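
For what it's worth, the check described above could be sketched roughly like this. It's in Python only for illustration, and the names `product_ids` and `is_repeat`, the regex, and the assumed shape of the ID value (letters, digits, `_`, `-`) are all guesses on my part, not anything taken from the actual site. It treats a page as a repeat only when every PRODUCTID on it was already on the previous page; flagging a page on a single duplicate would be a stricter variant.

```python
import re

# Assumption: each row carries a link whose query string contains
# "&PRODUCTID=<value>" (possibly HTML-escaped as "&amp;PRODUCTID=").
# The character class for the value is a guess.
PRODUCT_ID_RE = re.compile(r"&(?:amp;)?PRODUCTID=([A-Za-z0-9_-]+)")


def product_ids(html):
    """Return the PRODUCTID values found in a page, in document order."""
    return PRODUCT_ID_RE.findall(html)


def is_repeat(current_ids, previous_ids):
    """True if the current page has IDs and all of them were on the previous page."""
    return bool(current_ids) and set(current_ids) <= set(previous_ids)
```

In a scraping loop you would keep the ID list from the last page fetched, call `is_repeat(product_ids(html), previous_ids)` on each new page, and stop paging once it returns true. Comparing IDs rather than row text is what lets two rows with identical text still count as different products.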
