Re: Build a PDF book index

Tl;dr, but

> I've noticed that some characters aren't as expected when extracted:

PDF allows a document to embed its own fonts, and the character encoding of such fonts is sometimes arbitrary.

You can only solve this per PDF document: find the affected font (by its font number) and manually build a translation table for it as a hash.
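A minimal sketch of such a translation table, assuming your extraction code can tell you which font each piece of text came from. The font id `F3`, the mappings, and the helper `fix_text` are all made up for illustration; the real values have to be found by inspecting your particular PDF.

```perl
use strict;
use warnings;

# Hypothetical tables: font id => { extracted char => intended text }.
# 'F3' and these mappings are placeholders, not taken from any real PDF.
my %fix_for_font = (
    F3 => {
        "\x{01}" => 'fi',    # ligature glyph stored at code 1
        "\x{02}" => 'fl',    # ligature glyph stored at code 2
    },
);

# Replace every character that has an entry in the font's table;
# text from fonts without a table is returned unchanged.
sub fix_text {
    my ($font, $text) = @_;
    my $table = $fix_for_font{$font} or return $text;
    $text =~ s/(.)/exists $table->{$1} ? $table->{$1} : $1/ges;
    return $text;
}

print fix_text('F3', "de\x{01}ne"), "\n";    # prints "define"
```

The table is keyed by font because the same code can stand for different glyphs in different embedded fonts.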

HTH! :)

Cheers Rolf
(addicted to the Perl Programming Language and ☆☆☆☆ :)
Wikisyntax for the Monastery
