
Thx for the tip.

Maybe I can solve one of my open problems this way: reconstructing the text of a book in Yiddish (Hebrew script with diacritics), where the diacritics are placed by position. With pdftotext, the diacritics end up at the end of the line.
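Here is a minimal sketch of the position-based reattachment, written in Python for brevity; the same logic ports directly to Perl. It assumes you already have per-glyph positions for a single text line. The `(char, x0, x1)` tuples and the `attach_marks` helper are hypothetical, not the output format of pdftotext or any other real extractor. Each combining mark is attached to the base letter whose horizontal center is nearest the mark's center. The base letters are then emitted right-to-left, and the result is NFC-normalized so that multiple points on one letter come out in canonical order.

```python
import unicodedata
from typing import Dict, List, Tuple

# Hypothetical input: one text line as (char, x0, x1) tuples, in whatever
# order the extractor emitted them (marks may come last, as with pdftotext).
Glyph = Tuple[str, float, float]


def is_mark(ch: str) -> bool:
    """True for combining characters such as Hebrew vowel points."""
    return unicodedata.combining(ch) != 0


def center(g: Glyph) -> float:
    return (g[1] + g[2]) / 2


def attach_marks(glyphs: List[Glyph], rtl: bool = True) -> str:
    """Reattach combining marks to the nearest base letter by x-position."""
    bases = [g for g in glyphs if not is_mark(g[0])]
    marks = [g for g in glyphs if is_mark(g[0])]
    if not bases:
        # Nothing to attach to: return the marks as they came.
        return "".join(g[0] for g in marks)

    attached: Dict[int, List[str]] = {i: [] for i in range(len(bases))}
    for m in marks:
        # Simple heuristic: the base whose center is closest to the mark's center.
        best = min(range(len(bases)), key=lambda i: abs(center(bases[i]) - center(m)))
        attached[best].append(m[0])

    # Hebrew script reads right to left, so sort by x0 descending by default.
    order = sorted(range(len(bases)), key=lambda i: bases[i][1], reverse=rtl)
    out: List[str] = []
    for i in order:
        out.append(bases[i][0])
        out.extend(attached[i])
    # NFC puts stacked marks into canonical order; Hebrew presentation forms
    # are composition exclusions, so letters are not replaced by precomposed glyphs.
    return unicodedata.normalize("NFC", "".join(out))


if __name__ == "__main__":
    # Alef (x 10..20) with a komets placed under it, then bet (x 0..10).
    line = [("\u05d0", 10.0, 20.0), ("\u05d1", 0.0, 10.0), ("\u05b8", 14.0, 16.0)]
    print(attach_marks(line))
```

The nearest-center rule is only a starting point. Points under wide letters or a dagesh inside a narrow one may need an overlap test or a tolerance. A full page would first have to be split into lines by baseline y-coordinate before this per-line step.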

Re^3: PDF Parser
by LanX (Saint) on Mar 18, 2014 at 13:20 UTC
    Well, while learning to read Yiddish is on my to-do list, I never thought about doing it via PDF ;)

    The C++ sources of pdftohtml are pretty compact calls into something like ghostscript (IIRC)¹, so porting it to Perl to get tighter control shouldn't be a problem.

    HTH :)

    Cheers Rolf

    (addicted to the Perl Programming Language)

    update

    ¹ nope, it's XPDF! :)