in reply to Build a PDF book index
> I've noticed that some characters aren't as expected when extracted:
PDF lets a document embed its own fonts, and the character encoding of such an embedded font is sometimes arbitrary.
You can only solve this one PDF document at a time: find the number of the affected font and manually build a translation table for it as a hash.
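A minimal sketch of what such a translation table could look like. The hash name `%fix_for_font`, the helper `fix_text`, and all three mappings are made up for illustration; the real entries have to be worked out by comparing the extracted text with what the page actually shows, and the table should only be applied to text that comes from the affected font.

```perl
use strict;
use warnings;

# Hypothetical table for one affected font:
# extracted character => character that was actually meant.
my %fix_for_font = (
    "\x{01}" => 'f',
    "\x{02}" => 'fi',         # a ligature glyph expands to two letters
    "\x{03}" => "\x{2019}",   # right single quotation mark
);

# Replace every character that has an entry in the table,
# leave everything else untouched.
sub fix_text {
    my ($text, $table) = @_;
    $text =~ s/(.)/exists $table->{$1} ? $table->{$1} : $1/ges;
    return $text;
}

print fix_text("of\x{02}ce", \%fix_for_font), "\n";   # prints "office"
```

A hash rather than `tr///` is used here because one extracted character sometimes has to become several, as with ligatures.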
HTH! :)
Cheers Rolf
(addicted to the Perl Programming Language and ☆☆☆☆ :)
Wikisyntax for the Monastery