Then try pdftohtml -xml on it.
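Here is a minimal sketch of that step. It assumes poppler's pdftohtml, whose -xml output puts each text run in a <text ... font="N"> element and describes the fonts in <fontspec id="N" .../> lines. The helper name extract_text_elements is made up for illustration. The regex only avoids a module dependency; XML::LibXML would be the more robust choice. Keeping the font id matters because it tells you which runs come from an embedded font that may need decoding.

```perl
use strict;
use warnings;

# Hypothetical helper: pull (font id, text) pairs out of pdftohtml -xml output.
# Assumes the <text ... font="N">...</text> elements poppler's pdftohtml writes;
# a regex keeps the sketch dependency-free, but a real XML parser is safer.
sub extract_text_elements {
    my ($xml) = @_;
    my @items;
    while ($xml =~ m{<text\b([^>]*)>(.*?)</text>}gs) {
        my ($attrs, $body) = ($1, $2);
        my ($font) = $attrs =~ /\bfont="(\d+)"/;
        $body =~ s/<[^>]+>//g;    # drop inline <b>/<i> markup
        # Undo the XML escaping; &amp; goes last so "&amp;lt;" ends up as "&lt;"
        $body =~ s/&lt;/</g;
        $body =~ s/&gt;/>/g;
        $body =~ s/&quot;/"/g;
        $body =~ s/&amp;/&/g;
        push @items, [ $font // -1, $body ];    # -1 marks "no font attribute"
    }
    return @items;
}

# Inline sample so the sketch runs without a PDF at hand.
my $sample = '<page number="1">'
  . '<fontspec id="3" size="14" family="Hypothetical" color="#000000"/>'
  . '<text top="100" left="50" width="80" height="14" font="3"><b>A &amp; B</b></text>'
  . '</page>';
printf "font %s: %s\n", @$_ for extract_text_elements($sample);
```

For a real file, running something like pdftohtml -xml -i input.pdf output should write output.xml (-i skips images). Read that file through the :encoding(UTF-8) layer and pass its contents to the helper.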
And please keep in mind that PDFs which don't use a standard font are not necessarily parsable, because they might embed their own font that assigns arbitrary code points to the glyphs.
In that case deciphering is only possible through character and word recognition, either automatic (OCR) or manual (by populating a hash $glyph{codepoint} for each unknown font).
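Here is a rough sketch of the manual route. It assumes a table keyed first by the font id that pdftohtml reports and then by the embedded code point. The %glyph entries below are invented placeholders, not a real font's mapping, and decode_run is a hypothetical helper. Unmapped code points come out as U+FFFD and are counted in %unknown, so you can see which glyphs still need a human look.

```perl
use strict;
use warnings;

# Hypothetical per-font glyph table: font id => { embedded code point => real character }.
# These entries are made-up placeholders; for a real PDF they would be filled in
# by hand (or by OCR) after looking at what each glyph actually depicts.
my %glyph = (
    3 => {
        0x41 => "\x{0628}",    # assumed to be drawn as ARABIC LETTER BEH
        0x42 => "\x{0633}",    # assumed ARABIC LETTER SEEN
        0x43 => "\x{0645}",    # assumed ARABIC LETTER MEEM
    },
);

# Translate one text run written in font $font. Code points without a mapping
# become U+FFFD and are counted in %$unknown so they can be added to %glyph later.
sub decode_run {
    my ($font, $text, $unknown) = @_;
    my $map = $glyph{$font} // {};
    my $out = '';
    for my $ch (split //, $text) {
        my $cp = ord $ch;
        if (exists $map->{$cp}) {
            $out .= $map->{$cp};
        }
        else {
            $out .= "\x{FFFD}";    # REPLACEMENT CHARACTER marks the gap
            $unknown->{$font}{$cp}++;
        }
    }
    return $out;
}

my %unknown;
my $decoded = decode_run(3, "ABCZ", \%unknown);
binmode STDOUT, ':encoding(UTF-8)';
print "$decoded\n";
printf "font %d: unmapped code point U+%04X\n", 3, $_ for sort keys %{ $unknown{3} };
```

Keying by font id matters because two embedded fonts in the same PDF can reuse the same code point for entirely different glyphs.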
HTH! (Inshallah =)
Cheers Rolf
(addicted to the Perl Programming Language)
In reply to Re: Parsing Arabic PDF using in perl by LanX, in thread Parsing Arabic PDF using in perl by fattahsafa