in reply to Win32 and OCR via OLE

What do you mean by "OCR MS Office formats"? As far as I am aware, Office saves files in each application's native format: .doc for Word, .xls for Excel, etc. These are not images, so using the term OCR in this context makes no sense. If you want to get the text out of them so you can index it or whatever, that's a different question really.

As regards actually OCRing and stripping text from PDFs (and images), see Re: Extracting content text from PDFs (in response to Extracting content text from PDFs), and remember that Super Search is your friend.

Hope this helps

Martin