cormanaz has asked for the wisdom of the Perl Monks concerning the following question:

Howdy bros. I'm trying to write a script that will open a pdf document and save it as a text file. Apparently, the Acrobat API has no method for "save as" (WTF!).

So I'm wondering if there is some way to use OLE to click the appropriate item on the menu. In other words, is there something I can do at the indicated spot here:

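use strict;
use warnings;
use Win32::OLE;
use Win32::OLE::Const "Acrobat";
# AcroExch.App's shutdown method is Exit (not Quit); Win32::OLE calls it on destruction
my $acrobat = Win32::OLE->new("AcroExch.App", "Exit");
my $avdoc   = Win32::OLE->new("AcroExch.AVDoc");
# AVDoc.Open takes the file path and a temporary window title
$avdoc->Open("foo.pdf", "") or die "Cannot open foo.pdf\n";
# Open and operate the Save As... dialog?
$avdoc->Close(1);    # nonzero = close without saving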

TIA...Steve

UPDATE: Alas, the solution offered below by angiehope, while it works, produces a print image of the page. That is not the same as what you get with Save As on the Acrobat menu, which isolates the text of the document from metadata, sidebar content, etc., and represents paragraphs as distinct strings. So if anyone has an idea how to solve this, I could still use it.
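The paragraph behaviour described in the UPDATE can be roughly approximated after extraction. This is a minimal pure-Perl sketch, assuming the extracted text marks paragraph breaks with blank lines (an extractor's output may not); `text_to_paragraphs` is a hypothetical helper, not part of any module, and it does not reproduce what Acrobat's Save As actually does:

```perl
use strict;
use warnings;

# Hypothetical helper: split extracted text into paragraphs, assuming a
# blank line marks a paragraph break, and join each paragraph's wrapped
# lines into a single string.
sub text_to_paragraphs {
    my ($text) = @_;
    my @paras;
    for my $block (split /\n\s*\n/, $text) {
        # keep non-blank lines, then trim surrounding whitespace (incl. \r)
        my @lines = grep { /\S/ } split /\n/, $block;
        next unless @lines;
        s/^\s+|\s+$//g for @lines;
        push @paras, join ' ', @lines;
    }
    return @paras;
}

my $sample = "First paragraph wraps\nonto two lines.\n\nSecond paragraph.\n";
print "[$_]\n" for text_to_paragraphs($sample);
# prints:
# [First paragraph wraps onto two lines.]
# [Second paragraph.]
```

Hyphenated line breaks are deliberately left alone, since joining "well-" and "known" cannot be told apart from a real word split without a dictionary.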

Re: Win32::OLE Access Acrobat file menu?
by angiehope (Pilgrim) on Jan 30, 2009 at 13:16 UTC
    Hi!
    How about using CAM::PDF?
    My solution looks like this:
    use strict;
    use warnings;
    use CAM::PDF;

    my $pdfFile = CAM::PDF->new('c:/Downloads/test.pdf')
        or die "Cannot read PDF: $CAM::PDF::errstr\n";
    my $numpages = $pdfFile->numPages();

    open(my $textfile, '>', 'test.txt') or die "Cannot write test.txt: $!\n";
    for my $pagecounter (1 .. $numpages) {
        my $pdfText = $pdfFile->getPageText($pagecounter);
        # check whether the respective page contains text at all
        print $textfile $pdfText if defined $pdfText;
    }
    close($textfile) or die "Cannot close test.txt: $!\n";

    Currently, I'm using ActiveState Perl 5.10 on a Windows XP (SP3) laptop.

    Have a nice day!
      OK that did it. ++ Thanks angiehope!

      BTW, if anyone tries this module and gets the error "Unable to open FlateDecode", see this node. I upgraded to Compress-Zlib 2.015 and that did the trick (PPM wouldn't allow an uninstall).
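To see which Compress::Zlib is already installed before upgrading, a quick check like this should do. The 2.015 threshold is only the version reported above as fixing the error, not a documented minimum; Compress::Zlib and the `version` module both ship with Perl 5.10 and later:

```perl
use strict;
use warnings;
use Compress::Zlib;
use version;

# Compare the installed Compress::Zlib against 2.015, the version the
# poster says cured "Unable to open FlateDecode".
my $have = version->parse($Compress::Zlib::VERSION);
my $want = version->parse('2.015');
printf "Compress::Zlib %s is %s\n", $Compress::Zlib::VERSION,
    $have >= $want ? 'new enough' : 'older than 2.015, consider upgrading';
```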

      Where did you find that module? PPM doesn't show it in the ActiveState, Winnipeg, or Bribes repositories. D'oh, scratch that: I was looking at "installed packages." It's there and I will give it a try.
Re: Win32::OLE Access Acrobat file menu?
by Anonymous Monk on Jan 30, 2009 at 13:14 UTC
    Try adapting this Excel example:
    # Save as PDF
    $Excel->ActiveWindow->SelectedSheets->PrintOut({
        Copies        => 1,
        ActivePrinter => 'Acrobat PDFWriter',
    });

    # Save as Excel (xlWorkbookNormal comes from: use Win32::OLE::Const 'Microsoft Excel';)
    $Book->SaveAs({
        Filename   => 'C:\report\results\check_all.xls',
        FileFormat => xlWorkbookNormal,
    });
    $Book->Close();
    $Excel->Quit();
    or this VB example, "Save Current PDF File Using OLE in VB Application":
    Set AcroExchApp = CreateObject("AcroExch.App")
    Set AVDoc = AcroExchApp.GetActiveDoc
    Set PDDoc = AVDoc.GetPDDoc
    If PDDoc.Save(PDSaveFull, "c:\test.pdf") = False Then
        MsgBox "Unable to save image"
    Else
        MsgBox "The file is saved"
    End If
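The same VB logic through Win32::OLE might look like the sketch below. It assumes full Acrobat (not just Reader) is installed with a document already open, uses the literal 1 for PDSaveFull so it compiles without the Acrobat type library, and `save_active_pdf` is a hypothetical helper name. Like the VB, it writes a PDF, not text:

```perl
use strict;
use warnings;

# PDSaveFull in Acrobat's IAC save flags (PDSaveIncremental is 0).
# Written as a literal so the script compiles without the type library.
use constant PD_SAVE_FULL => 1;

# Hypothetical helper: save the document currently open in Acrobat to $path.
sub save_active_pdf {
    my ($path) = @_;
    require Win32::OLE;    # Windows-only, so load it only when called
    my $app = Win32::OLE->new('AcroExch.App')
        or die 'Cannot start Acrobat: ' . Win32::OLE->LastError() . "\n";
    my $avdoc = $app->GetActiveDoc()
        or die "No document is open in Acrobat\n";
    my $pddoc = $avdoc->GetPDDoc();
    $pddoc->Save(PD_SAVE_FULL, $path)
        or die "Unable to save $path\n";
    return 1;
}

# Only attempt the OLE calls on Windows.
if ($^O eq 'MSWin32') {
    save_active_pdf('c:/test.pdf');
    print "The file is saved\n";
}
```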
    It might be easier to use an OLE browser to inspect the Adobe API.
      The first example uses the SaveAs method in Excel. The second example saves as PDF. The OLE browser shows no suitable method. I've checked the Acrobat API docs pretty carefully, and they just don't seem to have the needed method :-(
Re: Win32::OLE Access Acrobat file menu?
by Anonymous Monk on Jan 30, 2009 at 13:19 UTC
    There are some tools available to handle PDF docs. I'm not sure what they're called, something like pdftools. Try it out; maybe it works better...
Re: Win32::OLE Access Acrobat file menu?
by angiehope (Pilgrim) on Jan 31, 2009 at 14:13 UTC
    Hi!
    Thank you very much for your replies.
    The only other open-source PDF processing and text extraction tool I know of is xpdf.
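If xpdf's command-line tool pdftotext is installed, Perl can simply drive it. This is a sketch assuming pdftotext is on the PATH; `pdftotext_cmd` and `pdf_to_text` are hypothetical helper names. By default pdftotext undoes the physical layout and outputs the text in reading order, while `-layout` keeps the original layout instead:

```perl
use strict;
use warnings;

# Hypothetical helper: build the pdftotext argument list.
# pdftotext usage: pdftotext [options] <PDF-file> [<text-file>]
sub pdftotext_cmd {
    my ($pdf, $txt, %opt) = @_;
    my @cmd = ('pdftotext');
    push @cmd, '-layout'         if $opt{layout};       # keep physical layout
    push @cmd, '-enc', $opt{enc} if defined $opt{enc};  # e.g. 'UTF-8'
    push @cmd, $pdf;
    push @cmd, $txt if defined $txt;   # if omitted, file.pdf becomes file.txt
    return @cmd;
}

# Hypothetical helper: run pdftotext, dying if it fails or is missing.
sub pdf_to_text {
    my @cmd = pdftotext_cmd(@_);
    system(@cmd) == 0 or die "@cmd failed (status $?)\n";
    return 1;
}

print join(' ', pdftotext_cmd('test.pdf', 'test.txt', enc => 'UTF-8')), "\n";
# prints: pdftotext -enc UTF-8 test.pdf test.txt
# pdf_to_text('test.pdf', 'test.txt');   # run once pdftotext is on the PATH
```

The list form of `system` hands file names containing spaces straight to pdftotext, with no shell quoting involved.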
    Hope that helps.