Since you need to do a round trip, it is probably easiest to use ghostscript to convert the pdf to postscript, then do the text filtering on the postscript with perl, and then convert the postscript back into pdf with ghostscript again.

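If you have ImageMagick set up properly, you can convert it with:

    convert myfile.pdf myfile.ps

and back with:

    convert myfile.ps myfile.pdf

Be aware that ImageMagick renders each page to a bitmap when it reads a PDF or PostScript file, so the text in the output is no longer text you can filter. Use this route only if you just need the format conversion.
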
You can convert the pdf to xml with pdftohtml using the -xml option, but I don't know how to make the resulting xml back into pdf. Perhaps one of the perl pdf modules would be able to do most of the work.

You can also work with the pdf data directly. The format is nicely documented by Adobe. I have read and written pdf files directly with low-level perl code, but now there are modules that make this much easier.

If you want to learn more about pdf I recommend pdfzone, which includes information about both commercial and open-source tools for working with pdf.

It should work perfectly the first time! - toma

In reply to Re: Editing text in PDF file by toma
in thread Editing text in PDF file by gnum
