Arik123 has asked for the wisdom of the Perl Monks concerning the following question:
Hi Monks!
I have a PDF file which contains a filled-in form. Unfortunately the information (text only) isn't plain ASCII. I need a Perl script to extract the information and process it, but I can't get anything except gibberish. I figured it's compressed somehow, so I used QPDF to make the file more human-readable.
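The unreadable content is most likely stream data compressed with the `/FlateDecode` filter, which is zlib-format data; `qpdf --qdf in.pdf out.pdf` inflates all such streams at once. Below is a minimal Perl sketch of the same inflation step using the core Compress::Zlib module. It assumes FlateDecode is the only filter on each stream (no `/DecodeParms` predictor) and it finds streams with a regex instead of honoring `/Length`, so it is an illustration rather than a PDF parser; `inflate_stream` and `raw_streams` are hypothetical helper names.

```perl
use strict;
use warnings;
use Compress::Zlib qw(uncompress);

# Inflate the raw bytes of one /FlateDecode stream (zlib format).
# Returns undef if the bytes are not valid zlib data.
sub inflate_stream {
    my ($raw) = @_;          # private copy of the bytes
    return uncompress($raw);
}

# Rough helper: collect every "stream ... endstream" body from the raw
# bytes of a PDF. It ignores /Length and assumes no body contains the
# text "endstream". The EOL before "endstream" is kept; zlib stops at
# the end of its own data, so that trailing byte does no harm.
sub raw_streams {
    my ($pdf) = @_;
    my @bodies;
    while ($pdf =~ /\bstream\r?\n(.*?)endstream/sg) {
        push @bodies, $1;
    }
    return @bodies;
}

# Usage: perl inflate.pl form.pdf
if (@ARGV) {
    open my $fh, '<:raw', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my $pdf = do { local $/; <$fh> };
    close $fh;
    my $n = 0;
    for my $body (raw_streams($pdf)) {
        my $data = inflate_stream($body);
        next unless defined $data;    # not Flate data (or other filters)
        print '--- stream ', ++$n, " ---\n", $data, "\n";
    }
}
```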
Now there are multiple objects whose content is something like

    feff05e405e805d905d8002e002e002e

which seem to be the contents of the fields, in some encoding. There are also some objects that look like:
    /BaseFont /RCZMJK+TimesNewRoman /DescendantFonts 13 0 R /Encoding /Identity-H /Subtype /Type0 /ToUnicode 93 0 R /Type /Font

while the /ToUnicode entry refers to objects that look like:

    93 0 obj
    << /Length 94 0 R >>
    stream
    /CIDInit /ProcSet findresource begin
    12 dict begin
    begincmap
    /CIDSystemInfo << /Registry (Adobe) /Ordering (UCS) /Supplement 0 >> def
    /CMapName /Adobe-Identity-UCS def
    /CMapType 2 def
    1 begincodespacerange
    <0000> <FFFF>
    endcodespacerange
    4 beginbfchar
    <02A8> <05D8>
    <02A9> <05D9>
    <02B4> <05E4>
    <02B8> <05E8>
    endbfchar
    endcmap
    CMapName currentdict /CMap defineresource pop
    end
    end
    endstream
    endobj
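Because the font dictionary says `/Encoding /Identity-H`, text drawn with this font in a page content stream is a sequence of 2-byte codes rather than characters, and the `/ToUnicode` CMap is what maps each code to Unicode. A minimal Perl sketch of reading the `beginbfchar`/`endbfchar` entries from a CMap like object 93 and applying them, assuming only bfchar entries matter (real CMaps often also contain `beginbfrange` sections, which this skips) and that the text operand is a hex string; `parse_bfchar`, `decode_cids` and the operand `<02B402B802A902A8>` are hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Build a code => Unicode-string map from the bfchar sections of a
# ToUnicode CMap. Entries look like "<02A8> <05D8>"; the destination is
# UTF-16BE, so it may hold more than one 16-bit unit (e.g. ligatures).
# bfrange sections are NOT handled in this sketch.
sub parse_bfchar {
    my ($cmap) = @_;
    my %map;
    while ($cmap =~ /beginbfchar(.*?)endbfchar/sg) {
        my $section = $1;
        while ($section =~ /<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>/g) {
            my ($src, $dst) = ($1, $2);
            $map{ hex $src } = decode('UTF-16BE', pack('H*', $dst));
        }
    }
    return \%map;
}

# Read a hex string operand as a run of 2-byte codes (what Identity-H
# implies) and map each code through the ToUnicode table.
# Unmapped codes become U+FFFD.
sub decode_cids {
    my ($hex, $map) = @_;
    $hex =~ s/[^0-9A-Fa-f]//g;          # drop the <> delimiters and whitespace
    my $text = '';
    for my $code (unpack 'n*', pack('H*', $hex)) {
        $text .= exists $map->{$code} ? $map->{$code} : "\x{FFFD}";
    }
    return $text;
}

# The bfchar section from object 93.
my $cmap = <<'CMAP';
4 beginbfchar
<02A8> <05D8>
<02A9> <05D9>
<02B4> <05E4>
<02B8> <05E8>
endbfchar
CMAP

my $map = parse_bfchar($cmap);
binmode STDOUT, ':encoding(UTF-8)';
# Hypothetical operand, e.g. from "<02B402B802A902A8> Tj" in a content stream.
print decode_cids('<02B402B802A902A8>', $map), "\n";
```

With the four entries from object 93, the hypothetical operand yields U+05E4 U+05E8 U+05D9 U+05D8, the same code points that follow the FEFF marker in the field value quoted earlier.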
I need a Perl script (or a module) that can make sense of all that (to me it looks like Turkish; hint: I don't speak Turkish) and convert it to UTF-8 or some other sensible encoding.
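The field values are PDF text strings, and a leading FE FF byte-order mark means the rest is UTF-16BE. Decoding the sample `feff05e405e805d905d8002e002e002e` this way gives U+05E4, U+05E8, U+05D9, U+05D8 followed by three periods; those four code points are the letters pe, resh, yod and tet in Unicode's Hebrew block. A minimal sketch using the core Encode module, assuming the hex digits have been copied out of the `<...>` hex string; strings without the BOM are decoded as Latin-1 here, which only approximates PDFDocEncoding. `pdf_hex_to_text` is a hypothetical name.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Turn the hex digits of a PDF text string into a Perl character string.
# A leading FE FF byte-order mark means UTF-16BE (the BOM is dropped).
# Anything else is decoded as Latin-1, which only approximates
# PDFDocEncoding (the two differ, e.g., in the 0x80-0x9F range).
sub pdf_hex_to_text {
    my ($hex) = @_;
    $hex =~ s/[^0-9A-Fa-f]//g;               # drop <> delimiters and whitespace
    my $bytes = pack 'H*', $hex;
    return decode('UTF-16BE', $bytes) if $bytes =~ s/\A\xFE\xFF//;
    return decode('ISO-8859-1', $bytes);
}

# Usage: perl pdftext.pl feff05e405e805d905d8002e002e002e
# Each argument is printed as UTF-8 on its own line.
binmode STDOUT, ':encoding(UTF-8)';
print pdf_hex_to_text($_), "\n" for @ARGV;
```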
Any help would be appreciated.
Replies are listed 'Best First'.
Re: PDF decoding in Perl
by beech (Parson) on Mar 06, 2017 at 07:25 UTC

Re: PDF decoding in Perl
by vr (Curate) on Mar 06, 2017 at 11:24 UTC

Re: PDF decoding in Perl
by huck (Prior) on Mar 06, 2017 at 07:35 UTC

Re: PDF decoding in Perl
by karlgoethebier (Abbot) on Mar 06, 2017 at 10:36 UTC
    by thanos1983 (Parson) on Mar 06, 2017 at 10:47 UTC

Re: PDF decoding in Perl
by Arik123 (Beadle) on Mar 08, 2017 at 09:42 UTC

Re: PDF decoding in Perl
by Arik123 (Beadle) on Mar 06, 2017 at 07:28 UTC
    by Phenomanan (Monk) on Mar 06, 2017 at 15:59 UTC
    by beech (Parson) on Mar 06, 2017 at 08:37 UTC