in reply to Good examples of code
What about ht://DIG? It's quite impressive in terms of what it can do. I think it's written in C++, though.
Another is SWISH-E, in a similar vein to ht://DIG.
WordIndex isn't as feature-rich as the two above, but you might find it useful for code examples. It also extracts text from PDF, .wpd, .doc, and other document formats.
HTH