PerlMonks  

Error Suppresses Output from CGI Script That Runs Fine from the Command Line

by socrtwo (Sexton)
on Jul 10, 2009 at 03:15 UTC ( [id://778762]=perlquestion )

socrtwo has asked for the wisdom of the Perl Monks concerning the following question:

I am invoking a text extractor with these lines in a Perl/CGI script:
my $doctotextcmd = "no-frills %a %d %f";
`"doctotext.exe --fix-xml --unzip-cmd = $doctotextcmd $success[0] > temp.txt"`;
With corrupt docx files, I get no results. The script works fine with uncorrupted docx files. When I run the script from a Windows command line, it also gives me an error, but it runs anyway and produces a usable result.

Is there a way to allow the CGI/Perl script to continue in the Web environment despite the error? Below is the command line I'm invoking successfully locally.

Command-prompt>doctotext.exe --fix-xml --unzip-cmd = "no-frills %a %d %f" intro.docx > temp.txt
Here's the output from the command line:
Executing no-frills.exe intro.docx C:\Users\socrtwo\AppData\Local\Temp\2 word\document.xml >&2
file(s) not found
Executing rmdir /S /Q C:\Users\socrtwo\AppData\Local\Temp\2
Using ODF/OOXML parser.
Executing no-frills.exe intro.docx C:\Users\socrtwo\AppData\Local\Temp\3 word\document.xml >&2
file(s) not found
Executing rmdir /S /Q C:\Users\socrtwo\AppData\Local\Temp\3
The script is invoked here. The full relevant section (not any commercial code like I posted last time, sorry!) can be seen here on pastebin. The MS Office converter can be downloaded here. Note that I don't think the version of the doctotext converter I'm using has been released by the authors to the general public yet, just to me, as I am sponsoring the development.

++++Update+++

Very oddly, the command-line app DocToText is very sensitive to spaces for some reason. It turns out this did not work for corrupt files after all, although it continued to work for uncorrupted ones:

Command-prompt>doctotext.exe --fix-xml --unzip-cmd = "no-frills %a %d %f" intro.docx > temp.txt
But this worked with both corrupt and uncorrupted files:
Command-prompt>doctotext.exe --fix-xml --unzip-cmd="no-frills %a %d %f" intro.docx > temp.txt
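For the CGI script itself, the same fix might look like the sketch below. It assumes that the value of --unzip-cmd has to reach doctotext.exe attached with "=" (no spaces) and wrapped in double quotes, exactly as in the working command above. build_doctotext_cmd is a hypothetical helper, not part of DocToText, and the file names are placeholders.

```perl
use strict;
use warnings;

# Hypothetical helper: build the doctotext command line with the
# --unzip-cmd value attached by "=" (no surrounding spaces) and wrapped
# in double quotes, mirroring the command that worked from the prompt.
sub build_doctotext_cmd {
    my ($unzip_cmd, $input, $output) = @_;
    return qq{doctotext.exe --fix-xml --unzip-cmd="$unzip_cmd" "$input" > "$output"};
}

my $cmd = build_doctotext_cmd('no-frills %a %d %f', 'intro.docx', 'temp.txt');
print "$cmd\n";

# The script would then hand the string to the shell, for example:
#   system($cmd);
```

Printing the command before running it makes it easy to compare what the script sends to the shell with what was typed at the prompt.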
Thanks for the help!

Replies are listed 'Best First'.
Re: Error Suppresses Output from CGI Script That Runs Fine from the Command Line
by ig (Vicar) on Jul 10, 2009 at 03:31 UTC

    You might try something like:

    my $doctotextcmd = "no-frills %a %d %f";
    my $errors = `"doctotext.exe --fix-xml --unzip-cmd = $doctotextcmd $success[0] 2>&1 > temp.txt"`;

    Then do something appropriate with any errors that appear in $errors.
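    A minimal sketch of one way to "do something appropriate": capture the output and also look at the exit status in $?, so the CGI can carry on when the tool complains but still produces text. Unlike the snippet above, it merges stdout and stderr into one string instead of sending stdout to temp.txt. run_and_capture is a hypothetical helper, and a Perl one-liner stands in for doctotext.exe so the example runs anywhere.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command through the shell, capture stdout
# and stderr together, and return the exit code instead of dying.
sub run_and_capture {
    my ($cmd) = @_;
    my $output = `$cmd 2>&1`;                  # 2>&1 merges stderr into the capture
    my $exit   = $? == -1 ? -1 : $? >> 8;      # -1: the command could not be started
    return ($output, $exit);
}

# Stand-in for doctotext.exe: writes to both streams and exits non-zero.
my $perl = $^X;
my ($out, $exit) = run_and_capture(
    qq{"$perl" -e "print 'text'; warn 'oops'; exit 3"}
);

if ($exit != 0) {
    # Log the problem but keep going: the output may still be usable.
    warn "converter exited with status $exit\n";
}
print "captured ", length($out), " bytes\n";
```

    In a CGI script the warn line ends up in the web server's error log, which is usually the first place to look when a command behaves differently under the server than at the prompt.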

Re: Error Suppresses Output from CGI Script That Runs Fine from the Command Line
by afoken (Chancellor) on Jul 11, 2009 at 19:20 UTC

    Just two small notes:

    temp.txt will be damaged as soon as two instances of the CGI run at the same time. Use one of the standard temp file name generators. Using IPC::Run or IPC::Open3 would remove the need for a temp file completely.
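    Both suggestions can be sketched with core modules. This is an illustration under assumptions, not tested against doctotext.exe: the 'doctotext_XXXXXX' template and the run_list helper are hypothetical names, and the commented call at the end only shows where the real command would go.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use IPC::Open3 qw(open3);

# Option 1: a unique temp file per request instead of a shared temp.txt.
# UNLINK => 1 removes the file when the program exits.
my ($tmp_fh, $tmp_name) = tempfile(
    'doctotext_XXXXXX',
    SUFFIX => '.txt',
    TMPDIR => 1,
    UNLINK => 1,
);
# $tmp_name can now replace temp.txt in the redirection.

# Option 2: no temp file at all. Run the command as a list (no shell
# quoting involved) and read its output directly. Passing undef as the
# third handle puts the child's stderr on the same handle as its stdout.
sub run_list {
    my (@cmd) = @_;
    my $pid = open3(my $in, my $out, undef, @cmd);
    close $in;                                # the child gets no input
    my $text = do { local $/; <$out> };       # slurp everything it prints
    waitpid($pid, 0);
    return ($text, $? >> 8);
}

# Hypothetical call, with the unzip command passed as a single argument:
#   my ($text, $status) = run_list('doctotext.exe', '--fix-xml',
#       '--unzip-cmd=no-frills %a %d %f', $success[0]);
```

    With the list form each element is handed to the program as one argument, so the spaces inside the --unzip-cmd value do not need shell quoting.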

    docx is nothing but zipped XML. Archive::Zip can unpack that file format, and the various XML classes can parse it. So, you could get rid of the entire subprocess.
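    A rough sketch of the no-subprocess idea. To keep it runnable without CPAN installs it swaps in the core IO::Uncompress::Unzip module for Archive::Zip, and a crude regular expression for a real XML parser, so treat it as an illustration only. It assumes the main text lives in word/document.xml inside w:p and w:t elements; docx_text is a hypothetical helper, and XML entities such as &amp; are left undecoded.

```perl
use strict;
use warnings;
use IO::Uncompress::Unzip qw(unzip $UnzipError);

# Hypothetical helper: pull word/document.xml out of a .docx archive and
# return its text, one line per paragraph.
sub docx_text {
    my ($path) = @_;

    my $xml;
    unzip $path => \$xml, Name => 'word/document.xml'
        or die "Cannot read word/document.xml from $path: $UnzipError\n";

    my @paragraphs;
    # Each <w:p>...</w:p> is a paragraph ("<w:p" followed by a space or
    # ">" so that <w:pPr> is not mistaken for a paragraph start).
    for my $p ($xml =~ m{<w:p[ >].*?</w:p>}gs) {
        # The text runs sit in <w:t> elements, with or without attributes.
        push @paragraphs, join '', $p =~ m{<w:t(?:\s[^>]*)?>([^<]*)</w:t>}g;
    }
    return join "\n", @paragraphs;
}

# Usage (file name is a placeholder):
#   print docx_text('intro.docx'), "\n";
```

    A proper XML parser would be the better choice for production use; whether this approach copes with corrupt archives depends on how they are damaged.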

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

Node Type: perlquestion [id://778762]
Approved by graff