One could present the same information in a different format each time: every time the next page is served, the layout is somewhat different. This can be as simple as swapping zipcode and city; or moving the contact address to the front of the row and the link to the picture of the house to the back; or ...

This would certainly frustrate any automated harvesting of your pages.
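
To make the idea concrete, here is a minimal sketch of per-page layout variation in Python. The field names and the `render_page` helper are my own assumptions for illustration, not anything from an existing library; the only point is that the column order is shuffled once per page and every row on that page follows it.

```python
import random
from html import escape

# Hypothetical field names, loosely based on the examples in the post.
FIELDS = ["zipcode", "city", "contact", "picture_url"]

def render_page(records, rng=None):
    """Return (column_order, html) with a column order chosen for this page only."""
    if rng is None:
        rng = random.Random()
    order = list(FIELDS)
    rng.shuffle(order)  # a fresh layout every time a page is rendered
    rows = []
    for rec in records:
        cells = "".join(f"<td>{escape(str(rec[name]))}</td>" for name in order)
        rows.append(f"<tr>{cells}</tr>")
    return order, "<table>\n" + "\n".join(rows) + "\n</table>"
```

A human reader still needs column headings (or obvious content) to tell the fields apart, so this only works for data that is self-explanatory on sight.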

Or include "invisible" records (e.g. background and foreground color the same, and a very small point size) with good-looking but bogus information, which would then poison the harvester's database. The drawback is that text-only browsers will show these bogus records as well.
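
A rough sketch of the decoy-record idea, again in Python. Everything here is assumed for illustration: the two-field record, the made-up city names, the white page background behind `HIDDEN_STYLE`, and the helper names.

```python
import random
from html import escape

# Assumes a white page background: foreground matches it and the text is tiny.
HIDDEN_STYLE = "color:#fff;background:#fff;font-size:1px"

def make_decoy(rng):
    """Build a plausible-looking but invented record."""
    zipcode = str(rng.randint(1000, 9999))
    city = rng.choice(["Springfield", "Riverton", "Lakewood"])
    return {"zipcode": zipcode, "city": city}

def row(rec, hidden=False):
    """Render one record as a table row, optionally styled to be invisible."""
    attr = f' style="{HIDDEN_STYLE}"' if hidden else ""
    return (f"<tr{attr}><td>{escape(rec['zipcode'])}</td>"
            f"<td>{escape(rec['city'])}</td></tr>")

def render_rows(records, decoys_per_record=1, rng=None):
    """Render real rows with hidden decoy rows mixed in around each one."""
    if rng is None:
        rng = random.Random()
    rows = []
    for rec in records:
        group = [row(rec)]
        for _ in range(decoys_per_record):
            # Place each decoy at a random position around the real row.
            position = rng.randint(0, len(group))
            group.insert(position, row(make_decoy(rng), hidden=True))
        rows.extend(group)
    return "\n".join(rows)
```

An inline style like this is trivial for a harvester to filter on, so in practice one would probably hide the rows through a class in an external stylesheet and vary the class name too.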

If you do not have to cater to text-only browsers, you could provide the data as XML, with the tags given random names for this page only. Vary the sequence of field tags within the record tags as well, and throw in some unused field tags for good measure, e.g. two addresses and two phone numbers for each record, only one of which will be rendered. Then make a "this page only" XSLT file which translates the data into HTML client-side. Modern browsers will translate the XML into HTML on the fly, but it will take a fairly sophisticated harvester (or a lot of post-processing of the raw data) to make sense of it.
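
As a sketch of the XML variant, the Python below generates a page's XML with random tag names plus a matching one-off XSLT stylesheet, using only the standard library. The function names, the `page.xsl` href, and the reversed-string decoy values are placeholders of mine; real decoys would have to look plausible.

```python
import random
import string
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"
ET.register_namespace("xsl", XSL_NS)

def random_tag(rng, used):
    """Return an unused random element name (lowercase letters only)."""
    while True:
        name = "".join(rng.choice(string.ascii_lowercase) for _ in range(8))
        if name not in used:
            used.add(name)
            return name

def xsl(name):
    """Qualified name of an XSLT element, in ElementTree notation."""
    return f"{{{XSL_NS}}}{name}"

def build_page(records, fields, xslt_href="page.xsl", rng=None):
    """Return (xml_text, xslt_text, real_tags) for one page.

    Each field gets a real tag and a decoy tag; only the real tags are
    referenced by the stylesheet, so only they are rendered.
    """
    if rng is None:
        rng = random.Random()
    used = set()
    root_tag = random_tag(rng, used)
    rec_tag = random_tag(rng, used)
    real = {f: random_tag(rng, used) for f in fields}
    decoy = {f: random_tag(rng, used) for f in fields}

    root = ET.Element(root_tag)
    for rec in records:
        node = ET.SubElement(root, rec_tag)
        children = [(real[f], rec[f]) for f in fields]
        # Placeholder decoy values: the real value reversed.
        children += [(decoy[f], rec[f][::-1]) for f in fields]
        rng.shuffle(children)  # different field sequence in every record
        for tag, text in children:
            ET.SubElement(node, tag).text = text
    pi = f'<?xml-stylesheet type="text/xsl" href="{xslt_href}"?>\n'
    xml_text = pi + ET.tostring(root, encoding="unicode")

    sheet = ET.Element(xsl("stylesheet"), {"version": "1.0"})
    template = ET.SubElement(sheet, xsl("template"), {"match": "/" + root_tag})
    table = ET.SubElement(template, "table")
    loop = ET.SubElement(table, xsl("for-each"), {"select": rec_tag})
    tr = ET.SubElement(loop, "tr")
    for f in fields:
        td = ET.SubElement(tr, "td")
        ET.SubElement(td, xsl("value-of"), {"select": real[f]})
    xslt_text = ET.tostring(sheet, encoding="unicode")
    return xml_text, xslt_text, real
```

The mapping in `real` never leaves the server except implicitly through the stylesheet, so a harvester has to fetch and interpret the XSLT for every single page to know which tags count.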

CountZero

"If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law


In reply to Re: State-of-the-art in Harvester Blocking by CountZero
in thread State-of-the-art in Harvester Blocking by sgifford
