I am invoking a text extractor with these lines in a Perl/CGI script:
my $doctotextcmd = "no-frills %a %d %f";
`"doctotext.exe --fix-xml --unzip-cmd = $doctotextcmd $success[0] > temp.txt"`;
With corrupt docx files, I get no results. The script works fine with uncorrupt docx files. When I run the same command from a Windows command line, it also gives me an error, but it runs anyway and produces a usable result.
Is there a way to allow the CGI/Perl script to continue in the Web environment despite the error? Below is the command line I'm invoking successfully locally.
Command-prompt>doctotext.exe --fix-xml --unzip-cmd = "no-frills %a %d %f" intro.docx > temp.txt
Here's the output from the command line:
Executing no-frills.exe intro.docx C:\Users\socrtwo\AppData\Local\Temp\2 word\document.xml >&2
file(s) not foundExecuting rmdir /S /Q C:\Users\socrtwo\AppData\Local\Temp\2
Using ODF/OOXML parser.
Executing no-frills.exe intro.docx C:\Users\socrtwo\AppData\Local\Temp\3 word\document.xml >&2
file(s) not foundExecuting rmdir /S /Q C:\Users\socrtwo\AppData\Local\Temp\3
The script is invoked here. The full relevant section (not any commercial code like I posted last time, sorry!) can be seen here on pastebin. The MS Office converter can be downloaded here. Note that I don't think the version of the doctotext converter I'm using has been released by the authors to the general public yet, just to me, as I am sponsoring the development.
++++Update+++
Very oddly, the command-line app DocToText is very sensitive to spaces for some reason. With spaces around the equals sign, this turned out not to work for corrupt files after all, although it continued to work for uncorrupt ones:
Command-prompt>doctotext.exe --fix-xml --unzip-cmd = "no-frills %a %d %f" intro.docx > temp.txt
But this, with no spaces around the equals sign, worked with both corrupt and uncorrupt files:
Command-prompt>doctotext.exe --fix-xml --unzip-cmd="no-frills %a %d %f" intro.docx > temp.txt
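For anyone hitting the same thing, here is a minimal sketch of how the working form might be carried back into the Perl script. The helper name build_doctotext_cmd is hypothetical, and I'm assuming doctotext.exe and no-frills.exe are reachable from the CGI process's working directory or PATH. The point is only that the string handed to the shell has no spaces around the equals sign and keeps the quotes around the unzip command; the actual call is left commented out.

```perl
use strict;
use warnings;

# Hypothetical helper: builds the same command line that worked at the
# command prompt. Note there are no spaces around '=' in --unzip-cmd="...".
sub build_doctotext_cmd {
    my ($input, $output) = @_;
    my $unzip_cmd = 'no-frills %a %d %f';
    return qq{doctotext.exe --fix-xml --unzip-cmd="$unzip_cmd" $input > $output};
}

my $cmd = build_doctotext_cmd('intro.docx', 'temp.txt');
print "$cmd\n";

# To actually run it (assumes doctotext.exe and no-frills.exe are available):
# system($cmd);
# warn "doctotext exited with status ", $? >> 8, "\n" if $? != 0;
```

Checking $? after system() only reports the exit status. It does not stop the script, so the CGI can carry on and read temp.txt even when the converter complains.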
Thanks for the help!