in reply to Mangling HTML to protect content, and finding stolen HTML content

If the actual HTML source is stolen, a few odd comments or quirks in the markup would be enough for your own web crawler to detect the copy, and to prove that it is not the original.
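As a rough illustration of the crawler-side check, here is a minimal Python sketch that looks for tracer strings inside the HTML comments of a page you have already fetched. The marker strings and the `find_markers` helper are made up for the example; a real crawler would also need to fetch the pages and would probably check more than comments.

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """Collects the text of every HTML comment in a document."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        self.comments.append(data.strip())


def find_markers(html_source, markers):
    """Return the markers (hypothetical tracer strings) found in comments."""
    collector = CommentCollector()
    collector.feed(html_source)
    collector.close()
    return [m for m in markers if any(m in c for c in collector.comments)]


# Hypothetical suspect page carrying one of our planted comments.
suspect_page = "<html><!-- zq-7741-plover --><body><p>Copied text</p></body></html>"
print(find_markers(suspect_page, ["zq-7741-plover", "xk-0020-heron"]))
```

Only text inside comments is checked, so a marker that happens to show up in the visible text does not count as a hit.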

If someone cuts and pastes a paragraph from the browser window, though, those comments and quirks are probably lost.

You might also embed a code via steganographic techniques, using only the parts of the content that aren't affected by formatting (so extra spaces and the like are out). I played around with that here.
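One way to hide a code in the words themselves, rather than in the formatting, is to let the choice between interchangeable synonyms carry one bit each. The sketch below is only an assumption about how that could look, not the experiment linked above: the synonym table, `embed`, and `extract` are all invented for the example, and it ignores capitalization.

```python
import re

# Hypothetical synonym pairs: the first word encodes bit 0, the second bit 1.
PAIRS = [("big", "large"), ("quick", "fast"), ("begin", "start"), ("show", "display")]
BIT_OF = {word: bit for pair in PAIRS for bit, word in enumerate(pair)}
PAIR_OF = {word: pair for pair in PAIRS for word in pair}


def embed(text, bits):
    """Rewrite each carrier word so that its synonym choice encodes the next bit."""
    remaining = list(bits)

    def choose(match):
        word = match.group(0)
        pair = PAIR_OF.get(word.lower())
        if pair is None or not remaining:
            return word  # not a carrier word, or no bits left to hide
        return pair[remaining.pop(0)]

    return re.sub(r"[A-Za-z]+", choose, text)


def extract(text):
    """Read one bit from every carrier word, in order of appearance."""
    words = re.findall(r"[A-Za-z]+", text)
    return [BIT_OF[w.lower()] for w in words if w.lower() in BIT_OF]


cover = "The big dog made a quick start to show its speed."
marked = embed(cover, [1, 0, 1, 1])
print(marked)

# Reflowing the whitespace, as a cut and paste might, leaves the bits intact.
reflowed = "  ".join(marked.split())
print(extract(reflowed))
```

Because `extract` reads every carrier word it finds, whoever checks a suspect copy has to know how many bits were embedded, and the text needs enough carrier words to hold the whole code.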

—John
