shotgunefx has asked for the wisdom of the Perl Monks concerning the following question:

This might be slightly off topic, as I'm not sure where the source of my problems lies.

I'm trying to split a large PDF into single pages so I can then use Image::Magick to convert them to GIFs. I hit CPAN and immediately find PDF::Extract (how convenient!).

How to split a PDF into single pages is right in the synopsis. Perfect!

Here's the problem. While it seems to work, the PDFs aren't readable. I get the following error.

"There was an error opening this document. The file is damaged and could not be repaired."

There are no errors generated while extracting the pages and, upon cursory inspection, the markup looks proper. I've tried to view them on several machines and versions of Acrobat (and, more importantly, to process them with Image::Magick) with no luck. I tried a different PDF and it works fine. The PDFs I'm trying to process use FlateDecode, if that matters.
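
A cursory inspection of this kind can be made a little more systematic. The following is only a rough sketch, not a PDF validator: it assumes the usual layout of a `%PDF-` header at the start of the file and a `startxref` offset followed by `%%EOF` at the end, and it checks nothing about the page contents or the FlateDecode streams. The helper names `check_pdf_structure` and `check_pdf_file` are made up for illustration and are not part of PDF::Extract.

```perl
use strict;
use warnings;

# Hypothetical helper: rough structural sanity check of PDF data.
# Returns a list of problems found (empty list if none were spotted).
sub check_pdf_structure {
    my ($data) = @_;
    my @problems;

    # Strict check: the data should begin with a "%PDF-x.y" header.
    push @problems, 'missing %PDF- header'
        unless $data =~ /\A%PDF-\d\.\d/;

    # The end of the file should hold "startxref <offset>" and "%%EOF".
    my $tail = length($data) > 1024 ? substr($data, -1024) : $data;
    push @problems, 'missing %%EOF marker'
        unless $tail =~ /%%EOF\s*\z/;

    if ($tail =~ /startxref\s+(\d+)\s+%%EOF\s*\z/) {
        my $offset = $1;
        # The offset should point at an "xref" table, or at an
        # "N G obj" cross-reference stream (PDF 1.5 and later).
        my $at = $offset < length($data) ? substr($data, $offset, 20) : '';
        push @problems,
            "startxref offset $offset does not point at an xref section"
            unless $at =~ /\A(?:xref|\d+\s+\d+\s+obj)/;
    }
    else {
        push @problems, 'missing startxref entry';
    }
    return @problems;
}

# Hypothetical wrapper: slurp a file in raw mode and check it.
sub check_pdf_file {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    local $/;    # slurp mode
    my $data = <$fh>;
    close $fh;
    return check_pdf_structure($data);
}

# Report on every file named on the command line.
for my $file (@ARGV) {
    my @problems = check_pdf_file($file);
    print "$file: ", (@problems ? join('; ', @problems) : 'no obvious problems'), "\n";
}
```

Running it against both a page that opens and one that does not may show whether the damage is in the cross-reference bookkeeping or somewhere deeper in the file.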

Anyone have suggestions on how to troubleshoot this? The Acrobat error is pretty much useless in helping determine what the problem is. I've been googling for a PDF validator with no luck as well. If anyone has any suggestions or other open source alternatives for splitting the files, I'd appreciate it. Thanks, -Lee


-Lee

"To be civilized is to deny one's nature."

Replies are listed 'Best First'.
Re: PDF::Extract problems
by jZed (Prior) on Apr 12, 2005 at 01:49 UTC
    You might try opening the output with a different PDF viewer. I seem to recall some PDFs that would open with Ghostview but not Acrobat (or was it vice versa?). I could convert, open with Ghostview, save, and then the result could be opened in Acrobat although the original output couldn't.
      I'll give it a shot. Not sure what Image::Magick is using to access PDFs, but it chokes as well.


      -Lee
Re: PDF::Extract problems
by ghenry (Vicar) on Apr 12, 2005 at 09:28 UTC

    Try http://www.pdfhacks.com/pdftk/

    It's GPL.
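
    If pdftk is installed, its burst operation splits a PDF into one file per page. A minimal sketch of calling it from Perl follows, assuming the pdftk binary is on the PATH. The helper names `burst_command` and `burst_pdf`, the input file name, and the output pattern are placeholders for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: build the pdftk argument list for splitting a PDF
# into single pages. "burst" and "output" are pdftk keywords; the output
# pattern is a printf-style template for the per-page file names.
sub burst_command {
    my ($input, $pattern) = @_;
    $pattern = 'page_%04d.pdf' unless defined $pattern;
    return ('pdftk', $input, 'burst', 'output', $pattern);
}

# Hypothetical helper: run pdftk and die if it does not succeed.
sub burst_pdf {
    my ($input, $pattern) = @_;
    my @cmd = burst_command($input, $pattern);
    # List-form system() bypasses the shell, so odd file names are safe.
    system(@cmd) == 0
        or die "pdftk did not run successfully (status $?)\n";
    return;
}

# Example: perl burst.pl big.pdf page_%04d.pdf
burst_pdf($ARGV[0], $ARGV[1]) if @ARGV;
```

    The resulting single-page files can then be handed to Image::Magick one at a time.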

    Walking the road to enlightenment... I found a penguin and a camel on the way.....
    Fancy a yourname@perl.me.uk? Just ask!!!
      I'll have to give it another look. I tried installing it a couple days ago from source and didn't have much luck. Though to be honest, I only spent an hour or so trying to get it to work before seeing what else was available.


      -Lee
Re: PDF::Extract problems
by Joost (Canon) on Apr 12, 2005 at 12:49 UTC