in reply to Parsing Arabic PDF using in perl
Then try pdftohtml -xml on it.
And please keep in mind that PDFs which don't use a standard font are not necessarily parsable, because they might embed their own font with arbitrary code points for the glyphs.
In that case deciphering is only possible with character and word recognition, either automatic (OCR) or manual (by populating a hash $glyph{codepoint} for each unknown font).
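To illustrate the manual route, here is a minimal sketch of such a lookup table. The code points (taken from the private use area) and the `decode_glyphs` helper are made up for the example; a real font's mapping has to be worked out by comparing the extracted text with the rendered page.

```perl
use strict;
use warnings;

# Hypothetical hand-built table for ONE embedded font:
# code point found in the extracted text => real Unicode character.
my %glyph = (
    0xE001 => "\x{0633}",    # ARABIC LETTER SEEN
    0xE002 => "\x{0644}",    # ARABIC LETTER LAM
    0xE003 => "\x{0627}",    # ARABIC LETTER ALEF
    0xE004 => "\x{0645}",    # ARABIC LETTER MEEM
);

# Replace every character that has an entry in %glyph;
# characters without an entry are passed through unchanged.
sub decode_glyphs {
    my ($text) = @_;
    return join '', map {
        my $cp = ord;
        exists $glyph{$cp} ? $glyph{$cp} : $_;
    } split //, $text;
}

# Example input: four glyphs as the (assumed) custom font encodes them.
my $raw     = "\x{E001}\x{E002}\x{E003}\x{E004}";
my $decoded = decode_glyphs($raw);

binmode STDOUT, ':encoding(UTF-8)';
print "$decoded\n";
```

With several unknown fonts you would probably want one such hash per font, e.g. keyed by the font id that the extracted output reports for each text run.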
HTH! (Inshallah =)
Cheers Rolf
( addicted to the Perl Programming Language)