One could think of presenting the same information in different formats: every time the next page is presented, the layout is somewhat different. It can be as simple as switching the zipcode and city around, or moving the contact address to the front of the row and the link to the picture of the house to the back, or ...
This would certainly annoy any automatic harvesting of your pages.
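A minimal sketch of the shuffling idea, in Python rather than anything from the original post. The field names, the sample record and the `render_table` helper are all hypothetical. Each call picks a fresh column order, so a scraper that expects the city in the first column will grab the wrong field on the next page.

```python
import html
import random

# Hypothetical field names for a property listing; adjust to the real data.
FIELDS = ["city", "zipcode", "contact", "picture"]


def render_table(records, rng):
    """Render records as an HTML table with a column order chosen per page.

    Returns the HTML and the column order used, so the server could log it.
    """
    order = FIELDS[:]
    rng.shuffle(order)  # a different column order each time a page is served
    header = "".join(f"<th>{html.escape(f)}</th>" for f in order)
    rows = []
    for rec in records:
        cells = "".join(f"<td>{html.escape(rec[f])}</td>" for f in order)
        rows.append(f"<tr>{cells}</tr>")
    return f"<table><tr>{header}</tr>{''.join(rows)}</table>", order


if __name__ == "__main__":
    listings = [{"city": "Ghent", "zipcode": "9000",
                 "contact": "J. Peeters", "picture": "/img/1.jpg"}]
    page, order = render_table(listings, random.Random())
    print(order)
    print(page)
```

Because the header row still names each column, a harvester that keys on header text rather than position is not fooled; the sketch only breaks position-based scrapers, and human readers pay the cost of an unstable layout too.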
Or include "invisible" records (e.g. background and foreground color the same and a very small point size) with good-looking but bogus information, which would then poison the harvester's database (although this will be bad for text-only browsers).
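A rough sketch of the decoy-row idea, assuming a white page background; `decoy_row`, `mix_in_decoys` and the fake value pool are made up for illustration. The decoys are ordinary table rows styled with matching foreground and background colors and a 1px font.

```python
import random

# Hypothetical pool of plausible-looking but fake values.
FAKE_CITIES = ["Antwerp", "Leuven", "Namur", "Hasselt"]


def decoy_row(rng, bgcolor="#ffffff"):
    """Return a bogus listing row drawn in the page background color at 1px."""
    city = rng.choice(FAKE_CITIES)
    zipcode = str(rng.randint(1000, 9999))
    price = str(rng.randrange(100_000, 900_000, 5_000))
    # Same foreground and background color plus a tiny font: invisible to
    # sighted users in a graphical browser, but plain text to a harvester
    # (and to text-only browsers and screen readers, which is the drawback).
    style = f"color:{bgcolor};background-color:{bgcolor};font-size:1px"
    cells = "".join(f"<td>{v}</td>" for v in (city, zipcode, price))
    return f'<tr style="{style}">{cells}</tr>'


def mix_in_decoys(real_rows, n_decoys, rng):
    """Insert n_decoys fake rows at random positions among the real rows."""
    rows = list(real_rows)
    for _ in range(n_decoys):
        rows.insert(rng.randint(0, len(rows)), decoy_row(rng))
    return rows
```

Any harvester that evaluates inline styles can filter these rows out, and anything that reads the raw text will present them as real listings.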
If you do not have to cater for text-only browsers, one can think of providing the data in XML format with the tags given random names for this page only (and, of course, a different sequence of field tags within the record tags, with some unused field tags thrown in for good measure, e.g. two addresses and two phone numbers for each record, only one of which will be rendered) and making a "this page only" XSLT file which translates the data into HTML client-side. Modern browsers will translate the XML into HTML on the fly, but it will take a fairly sophisticated harvester to make sense of it (or a lot of post-processing of the raw data).
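Here is one way the XML-plus-XSLT scheme might look, as a sketch only: `build_page`, the `FIELDS` list and the stylesheet URL `page-1234.xsl` are hypothetical. Every call invents fresh tag names, shuffles the fields inside each record, adds an unrendered second phone number, and emits a matching XSLT 1.0 stylesheet that refers to the tags by their throwaway names. The XML points at the stylesheet with an `xml-stylesheet` processing instruction, which browsers with client-side XSLT support apply before display.

```python
import random
import string
from xml.sax.saxutils import escape

# Hypothetical logical fields; only these are rendered by the stylesheet.
FIELDS = ["city", "zipcode", "address", "phone"]


def random_tag(rng, length=8):
    # Letters only, so the result is always a valid XML element name.
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def build_page(records, rng, xsl_href="page-1234.xsl"):
    """Return (xml_text, xslt_text) using tag names valid for this page only."""
    tags = []
    while len(tags) < len(FIELDS) + 3:  # fields + root + record + decoy
        tag = random_tag(rng)
        if tag not in tags:
            tags.append(tag)
    root, rec, decoy = tags[:3]
    field_tag = dict(zip(FIELDS, tags[3:]))

    lines = ['<?xml version="1.0"?>',
             f'<?xml-stylesheet type="text/xsl" href="{xsl_href}"?>',
             f"<{root}>"]
    for r in records:
        parts = [(field_tag[f], r[f]) for f in FIELDS]
        # A second, bogus phone number that the stylesheet never renders.
        fake_phone = "".join(rng.choice(string.digits) for _ in range(9))
        parts.append((decoy, fake_phone))
        rng.shuffle(parts)  # field order differs from record to record
        body = "".join(f"<{t}>{escape(v)}</{t}>" for t, v in parts)
        lines.append(f"<{rec}>{body}</{rec}>")
    lines.append(f"</{root}>")

    header = "".join(f"<th>{f}</th>" for f in FIELDS)
    cells = "".join(f'<td><xsl:value-of select="{field_tag[f]}"/></td>'
                    for f in FIELDS)
    xslt = f"""<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html><body><table>
      <tr>{header}</tr>
      <xsl:for-each select="{root}/{rec}">
        <tr>{cells}</tr>
      </xsl:for-each>
    </table></body></html>
  </xsl:template>
</xsl:stylesheet>"""
    return "\n".join(lines), xslt
```

The server has to keep each XML page and its stylesheet paired, for example by serving both under a per-request name. A harvester that also fetches the stylesheet can still recover the mapping, which is the "fairly sophisticated harvester" case.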
CountZero
"If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law
Re: Re: State-of-the-art in Harvester Blocking
by davis (Vicar) on Nov 23, 2003 at 13:51 UTC
One could think of presenting the same information in different formats: every time the next page is presented, the layout is somewhat different. It can be as simple as switching the zipcode and city around, or moving the contact address to the front of the row and the link to the picture of the house to the back, or ...
This would certainly annoy any automatic harvesting of your pages.
It'd also likely annoy regular users of your site. Imagine if the PM voting buttons moved around (sometimes above a node, sometimes below, in an inconsistent order, etc.), or if the nodelets' positions couldn't be guaranteed.
Humans are pretty good at spotting differences in information if it's laid out in a consistent manner. If you're comparing properties, those differences (price, number of bedrooms, city) will be all-important, so making them harder to spot will also make your site harder to use.
Or include "invisible" records (e.g. background and foreground color the same and a very small point size) with good-looking but bogus information, which would then poison the harvester's database (although this will be bad for text-only browsers).
I'm nit-picking now, but it's not just text-only browsers. What about people using high-contrast colour schemes in their browsers? I genuinely believe the OP has a difficult task if they want to preserve their goal of accessibility.
davis
It's not easy to juggle a pregnant wife and a troubled child, but somehow I managed to fit in eight hours of TV a day.
Invisible records could be included in more ways than just color: perhaps some trickery with CSS, spans, div tags, etc. The rearranging of content could be more of an HTML thing as well: some tags could be mixed and matched so that the page looks the same in a browser but is parsed differently by the harvester.
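A sketch of the CSS variant, with hypothetical helper names throughout: every cell carries the real value and a decoy in two `<span>`s whose class names are random for this page, and a per-page stylesheet hides one class with `display: none`. Unlike color matching, `display: none` content is normally skipped by screen readers and is unaffected by high-contrast color schemes, though a browser that ignores CSS (Lynx, for instance) will show both values.

```python
import html
import random
import string


def random_class(rng):
    # Starts with a letter so it is a valid CSS class selector.
    return "c" + "".join(rng.choice(string.ascii_lowercase + string.digits)
                         for _ in range(6))


def pick_classes(rng):
    """Choose two distinct throwaway class names for this page."""
    show_cls = random_class(rng)
    hide_cls = random_class(rng)
    while hide_cls == show_cls:
        hide_cls = random_class(rng)
    return show_cls, hide_cls


def page_css(show_cls, hide_cls):
    # Served with this page only; without evaluating it, a harvester cannot
    # tell which of the two spans in each cell is the real one.
    return f".{hide_cls} {{ display: none; }}\n.{show_cls} {{ display: inline; }}"


def obfuscated_cell(real, fake, show_cls, hide_cls, rng):
    """One table cell holding the real value and a decoy hidden by CSS."""
    spans = [f'<span class="{show_cls}">{html.escape(real)}</span>',
             f'<span class="{hide_cls}">{html.escape(fake)}</span>']
    rng.shuffle(spans)  # the decoy may come before or after the real value
    return "<td>" + "".join(spans) + "</td>"
```

The harvester now has to apply the page's CSS to know which span is real, which is the extra parsing work the post has in mind.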