in reply to Re: Reading PDF (taboo?)
in thread Reading PDF (taboo?)

Indeed

But the PDF I'm working with has two (or more) columns of text (it's sort of a tabloid), and pdf2html makes a big mess of it.

I've been reading RTFs and G*d knows it's an awful format, so I didn't expect PDF, which I thought was an open format, to be so... secretive!

Now I've downloaded PDF::API2, but the documentation's sort of cryptic. I'll have to hack my way through! :)


Thanks for answering.


Mondongo

Re: Re: Re: Reading PDF (taboo?)
by drewbie (Chaplain) on May 20, 2004 at 05:16 UTC
    At $dayjob we mostly use htmldoc to create our PDFs: we build the HTML and then run it through htmldoc to get the PDF. It's not perfect, and you don't get all the control you do with PDF::API2. But one advantage of this method is that it gives us a web-accessible version for free!
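
A rough sketch of that HTML-to-PDF step, in case it helps. The sub names and file names here are made up for illustration, and I'm assuming htmldoc is on your PATH and that its `--webpage` and `-f` options are enough for your documents:

```perl
use strict;
use warnings;

# Build the argument list for one htmldoc run.
# Both file names are placeholders supplied by the caller.
sub htmldoc_command {
    my ($html_file, $pdf_file) = @_;
    # --webpage: treat the input as a plain web page (no book structure)
    # -f FILE:   write the result to FILE
    return ('htmldoc', '--webpage', '-f', $pdf_file, $html_file);
}

# Run htmldoc and die if it exits with a non-zero status.
sub html_to_pdf {
    my ($html_file, $pdf_file) = @_;
    my @cmd = htmldoc_command($html_file, $pdf_file);
    # List form of system() avoids passing the file names through a shell.
    system(@cmd) == 0
        or die "htmldoc failed with status $?\n";
    return $pdf_file;
}
```

Calling `html_to_pdf('report.html', 'report.pdf')` would then leave the HTML file around as the web-accessible version and the PDF next to it.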

    And if you have an existing PDF you want to add pages to, look at importpage(). In our case, we have an existing report in PDF format that we want to add to a dynamic PDF document. It generally works great (we're using an older release, since the latest requires Perl 5.8+), although I occasionally find it goes into deep recursion on some files when importing the pages. I haven't figured out why, but my workaround is just to do more in htmldoc. :-)
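
Something along these lines is how I'd expect the importpage() approach to look. Treat it as a sketch: `append_pdf` and the file names are hypothetical, and I'm assuming a PDF::API2 release where open(), pages(), importpage() and saveas() behave as documented, with a target page number of 0 meaning "append at the end":

```perl
use strict;
use warnings;
use PDF::API2;

# Append every page of $report_file to the end of $target_file and
# save the combined document as $out_file. All names are placeholders.
sub append_pdf {
    my ($target_file, $report_file, $out_file) = @_;

    my $target = PDF::API2->open($target_file);
    my $report = PDF::API2->open($report_file);

    # importpage(SOURCE_PDF, SOURCE_PAGE, TARGET_PAGE):
    # a TARGET_PAGE of 0 adds the imported page after the last page.
    for my $n (1 .. $report->pages) {
        $target->importpage($report, $n, 0);
    }

    $target->saveas($out_file);
    return $out_file;
}
```

Writing to a separate output file keeps the original report untouched, which is handy if the import blows up on a particular file.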

      If you use HTML::Template to build your HTML files, you can use PDF::Template to build your PDF files. :-)

      (Excel::Template can also help round out the trifecta of reporting formats.)

      ------
      We are the carpenters and bricklayers of the Information Age.

      Then there are Damian modules.... *sigh* ... that's not about being less-lazy -- that's about being on some really good drugs -- you know, there is no spoon. - flyingmoose

      I shouldn't have to say this, but any code, unless otherwise stated, is untested