Not that anyone will read this after all this time... I see a way in which transparency could really fark up an OCR program. Using cascading style sheets, you could layer multiple transparent images directly on top of one another, so that the signal is split across two or more files and no single file contains the whole thing. If that's not enough, you can put plain HTML text behind the images as well.
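To make the layering idea concrete, here's a rough sketch in Python (my choice of language, nothing more). It treats an image as a tiny grid of pixels, deals the "ink" pixels out to separate layers, and shows that stacking the layers gives back the original. The names split_into_layers, composite and GLYPH are made up for illustration, and the round-robin split is just one simple way to do it; real image files would need an imaging library on top of this.

```python
def split_into_layers(bitmap, n_layers=2):
    """Deal each ink pixel to exactly one layer; all other pixels stay transparent (None)."""
    height, width = len(bitmap), len(bitmap[0])
    layers = [[[None] * width for _ in range(height)] for _ in range(n_layers)]
    for y in range(height):
        for x in range(width):
            if bitmap[y][x]:
                # Assumption: a simple diagonal round-robin; any assignment rule would do.
                layers[(x + y) % n_layers][y][x] = 1
    return layers


def composite(layers):
    """Stack the layers the way a browser would: any opaque pixel shows through."""
    height, width = len(layers[0]), len(layers[0][0])
    return [
        [1 if any(layer[y][x] for layer in layers) else 0 for x in range(width)]
        for y in range(height)
    ]


# Hypothetical 5x5 glyph (a letter "T") standing in for the real image.
GLYPH = [
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]

layers = split_into_layers(GLYPH, n_layers=3)
restored = composite(layers)
```

Each layer on its own is only a scattering of dots; only the stack reads as the letter.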

So to read it, you'd have to find multiple images, which can be placed anywhere (dynamically) in the body section, and also pick out the added text, which you'd have to parse out with a tokenizer. :-)
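Here's a rough sketch of what generating such a page could look like, again in Python. The function build_page, the class names "layer" and "behind", the pixel offsets and the file names are all made up for illustration. The point is only that the img tags can land anywhere in the markup, because the CSS pins them all to the same spot on screen.

```python
import html
import random


def build_page(layer_urls, hidden_text, filler_paragraphs, seed=0):
    """Return an HTML page with the image layers scattered through the body."""
    rng = random.Random(seed)
    # Every element with class "layer" is pinned to the same coordinates,
    # so its position in the markup has no effect on where it is drawn.
    css = (
        ".layer { position: absolute; top: 40px; left: 40px; }\n"
        ".behind { z-index: 0; }\n"
        "img.layer { z-index: 1; }\n"
    )
    parts = [f"<p>{html.escape(p)}</p>" for p in filler_paragraphs]
    layered = [
        f'<img class="layer" src="{html.escape(url, quote=True)}" alt="">'
        for url in layer_urls
    ]
    # Plain HTML text sitting behind the transparent images.
    layered.append(f'<span class="layer behind">{html.escape(hidden_text)}</span>')
    for tag in layered:
        # Drop each layered element at a random position among the other markup.
        parts.insert(rng.randrange(len(parts) + 1), tag)
    body = "\n".join(parts)
    return (
        "<!DOCTYPE html>\n<html>\n<head>\n<style>\n"
        f"{css}</style>\n</head>\n<body>\n{body}\n</body>\n</html>\n"
    )


# Hypothetical file names and filler text.
page = build_page(
    ["layer0.png", "layer1.png", "layer2.png"],
    "X7",
    ["Some ordinary paragraph.", "Another ordinary paragraph."],
    seed=3,
)
```

Changing the seed on each request moves the tags around, so a scraper can't rely on them sitting in a fixed place in the source.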