bcrowell2 has asked for the wisdom of the Perl Monks concerning the following question:

O Monks,

This is not strictly a perl question, so maybe a little OT here, but any responses would be much appreciated. I have a perl CGI script that converts certain types of Word documents to plain text: http://tetragonsf.com/cgi-bin/convert_word1.cgi . The CGI at this address has a form with a file upload interface, and when you've done your upload, it hands off that file to a second CGI script, which does the conversion and outputs the plain text. The output code in the second script is basically this:

print <<'HEADER';
Content-type: text/plain

HEADER
print $txt;

This all works fine, except for one strange thing. If I try to do a "Save Page As" in firefox, it creates the file, but the file only contains a single newline. Anyone have any idea why this would not work properly? Is this something I'm not understanding properly about CGI, or something weird about firefox, or ...?

TIA!

-Ben

Replies are listed 'Best First'.
Re: CGI outputs plain text -- can't "Save Page As?"
by Joost (Canon) on May 28, 2007 at 00:15 UTC

      Hmmm...I'm not completely understanding your post, so I'll just post all my code here. I suppose it might be possible to do this all using a single CGI script, but I found it easier just to implement it as two separate scripts, so I didn't have to detect what state I should be in. The way I'm handing the transaction to the second script is through a form with a file upload in it. Do you get the output on the screen from the second script?

      convert_word1.cgi:

      #!/usr/bin/perl

      use strict;
      use CGI;
      use Digest::SHA1;
      use URI::Escape;
      use Tetragon;

      my $q = new CGI;

      print Tetragon::html_header();

      print <<STUFF;
      <h2>Converting an MS Word File</h2>
      <p>
      This web-based program is designed to convert a fiction manuscript from MS Word format to plain text format.
      The main advantage of this over a simple "Save as" in Word is that it will automatically
      format italics <tt>_like this_</tt> and boldface <tt>*like this*</tt>. Once you upload your file, the resulting converted
      version will be displayed in your browser, and you can cut and paste it into a text editor such as NotePad.
      </p>
      <p>Caveats:</p>
      <ul>
      <li>This is a fairly computation-intensive thing for my server to do, so I don't want this to be a service
      that's used heavily by a huge number of people. Please don't use it from an automated script, and don't use it more than a few times
      per day.</li>
      <li>Don't assume blindly that the results are what you want. Look over the output file carefully, and fix any problems by hand.</li>
      <li>This probably will not work with the OOXML format used as the default format by the most recent versions
      of word. To make sure that's not a problem, do a "Save as" in Word, and choose an older format.</li>
      </ul>
      STUFF

      print <<STUFF;
      <form method="post" action="http://tetragonsf.com/cgi-bin/convert_word2.cgi" enctype="multipart/form-data">
      <p>file: <input type="file" name="foo" size="40"/></p>
      <p> <input type="submit" name="Continue" value="Continue" /> </p>
      </form>
      STUFF

      print Tetragon::html_footer();

      convert_word2.cgi:

      #!/usr/bin/perl

      use strict;
      use CGI;
      use Digest::SHA1;
      use Tetragon;
      use IO::File;
      use POSIX;

      use constant UPLOAD_DIR => "/usr/local/www/tetragonsf/uploads"; # not clear to me that this actually has any effect
      use constant BUFFER_SIZE => 16_384;
      use constant MAX_FILE_SIZE => 2_000_000;
      use constant MAX_DIR_SIZE => 500_000_000;
      use constant MAX_OPEN_TRIES => 100;

      $CGI::DISABLE_UPLOADS = 0;
      $CGI::POST_MAX = MAX_FILE_SIZE;

      my $q = new CGI;

      print <<'HEADER';
      Content-type: text/plain

      HEADER

      my $filename = $q->param('foo');
      my $fh = $q->upload('foo');
      my $buffer = '';
      my $t = '';
      binmode $fh;
      while (read($fh,$buffer,BUFFER_SIZE)) {
        $t = $t . $buffer;
      }

      my $in;
      do {$in = POSIX::tmpnam()} until ! -e $in;
      my $out;
      do {$out = POSIX::tmpnam()} until ! -e $out;

      open(F,">$in") or die "error opening temporary file for doc";
      binmode F;
      print F $t;
      close F;

      system("/home/bcrowell/Documents/programming/scripts/fiction_word_to_txt <$in >$out")==0 or die "error executing fiction_word_to_txt";

      local $/; # slurp whole file
      open(F,"<$out") or die "error reading output of fiction_word_to_txt";
      my $txt = <F>;
      close F;

      unlink $in;
      unlink $out;

      print $txt;

        Something weird is indeed going on. It appears that firefox (at least my firefox/iceweasel 2.0.0.3):

        1. does not re-upload a file on a reload (my mozilla seamonkey/iceape 1.0.8 DOES re-upload on reload).

        2. does a new GET request for the result page when you try to save it. (of course, that means you get a blank page - except that you're printing an extra blank line in convert_word2.cgi). (mozilla does the same).

        Here's the script I used to test that. Note that if you uncomment the "Content-Disposition" line, firefox will ask you to save the output directly, which fixes the bug, in some sense.

        #!/usr/bin/perl -w
        use strict;
        use CGI;

        my $q = CGI->new;
        my $method = $q->request_method;
        my $r = rand(1000000);
        my $fh = $q->upload("file");
        warn "fh is ",($fh ? "" : "not "),"defined";
        if ($fh) {
            #print "Content-Disposition: attachment; filename=output.txt\n";
            print "Content-type: text/plain\n";
            print "\n";
            print "METHOD=$method\n";
            my @data =<$fh>;
            warn "data = '",@data,"'";
            print @data;
            exit;
        }
        print <<ENDHTML;
        Content-type: text/html

        <html>
        <body>
        METHOD=$method
        <form action="test.cgi?r=$r" method="POST" enctype="multipart/form-data">
        <input type="file" name="file"><input type="submit">
        </form>
        </html>
        ENDHTML
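For what it's worth, the Content-Disposition idea from the test script could be carried over to the conversion script roughly as follows. This is only a sketch under assumptions: the function name attachment_response and the filename manuscript.txt are made up for illustration, and the headers are built by hand with core Perl rather than through CGI.pm.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a complete CGI response (headers, blank line, body) that asks the
# browser to save the body as a file instead of displaying it.
# The function name and the default filename are hypothetical.
sub attachment_response {
    my ($body, $filename) = @_;
    $filename = 'output.txt' unless defined $filename;

    # Restrict the filename to a conservative character set so that it
    # cannot break out of the quoted header value.
    $filename =~ s/[^A-Za-z0-9._-]/_/g;

    return join '',
        "Content-Type: text/plain\n",
        "Content-Disposition: attachment; filename=\"$filename\"\n",
        "\n",
        $body;
}

# In the real script the body would be the converted text ($txt).
print attachment_response("converted text goes here\n", 'manuscript.txt');
```

With a header like this the browser should offer a save dialog for the POST response itself, so no second request is needed to save the result.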
Re: CGI outputs plain text -- can't "Save Page As?"
by naikonta (Curate) on May 28, 2007 at 02:46 UTC
    I find it a bit strange too. Although it was already mentioned, I was really surprised that when I saved the output from Firefox to a file, the file contained nothing except the newline. The Flock browser gave me the same result. However, I managed to save it correctly with links (I didn't try it with good old lynx, since I don't have it).

    At first I thought it had something to do with the system function, so I tried to reproduce it with a simple script (system.cgi):

    #!/usr/bin/perl
    print "Content-type: text/plain\n\n";
    system 'uptime';
    but it ran fine in Firefox: it displayed the text in the browser, I could save it to a file, and the content was:
    $ cat /tmp/uptime-system.cgi
    09:08:55 up 1:10, 4 users, load average: 1.23, 0.80, 0.48
    However, I'm a bit curious about the apparently overlapping code below the while {} block in convert_word2.cgi. You could remove the code that reopens the $out file and let fiction_word_to_txt print its output directly to the browser, so that you don't have to redirect it to a temporary file, like this:
    system("/home/bcrowell/Documents/programming/scripts/fiction_word_to_txt <$in") and die "error executing fiction_word_to_txt: $!\n";
    Another option is to convert the doc file on the fly, without an external program, so that it works with a filehandle; in other words, streaming the file. Is fiction_word_to_txt a Perl program? See also whether OLE::Storage can help you.
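A middle ground between the two suggestions above might look like the sketch below: keep a temporary file for the input, but read the converter's output through a pipe instead of a second temporary file. Everything named here is an assumption for illustration: convert_via_pipe is a hypothetical helper, and tr stands in for fiction_word_to_txt, which is not reproduced. File::Temp is used in place of POSIX::tmpnam, which recent Perl releases no longer provide.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Feed $data to a shell command on its stdin and return everything the
# command writes to stdout. Only the input goes through a temporary file;
# the output is read from a pipe. Hypothetical helper, not the original code.
sub convert_via_pipe {
    my ($command, $data) = @_;

    # tempfile() creates and opens the file atomically; UNLINK => 1
    # removes it again when the program exits.
    my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
    binmode $tmp_fh;
    print {$tmp_fh} $data;
    close $tmp_fh or die "error writing temporary file: $!";

    # $command is trusted program text here, never user input.
    open(my $out, '-|', "$command < $tmp_name")
        or die "error starting converter: $!";
    local $/;                      # slurp the whole output
    my $txt = <$out>;
    close $out or die "converter failed, wait status $?";
    return $txt;
}

# Stand-in converter: upper-case the input with tr.
print convert_via_pipe('tr a-z A-Z', "some _italic_ text\n");
```

Checking the result of close on the pipe is what reports a non-zero exit status from the converter, which takes the place of the system(...)==0 test.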

    Open source softwares? Share and enjoy. Make profit from them if you can. Yet, share and enjoy!

      AFAICT, the situation is that Firefox is trying to reload the page when you do a Save As. I think that would explain the difference in behavior between my script and your system 'uptime' script. Yours will reload (and give a later time), whereas mine won't work when you try to reload it, because it needs the file upload from the first script.

      fiction_word_to_txt is just a wrapper for wvWare, plus a little bit of extra code for text-munging of wvWare's output.
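Given that the second script can be requested again without an upload (on a reload or a Save As), one possible guard is to answer such requests explicitly rather than running the converter on nothing. This is a rough sketch only: response_without_upload is a hypothetical helper, and the status codes and messages are my own choice, not something taken from the scripts above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return a complete CGI response (headers + body) when there is nothing to
# convert, or undef when the request is a POST that carries an upload and
# conversion should go ahead. Hypothetical helper for illustration.
sub response_without_upload {
    my ($method, $has_upload) = @_;
    return if $method eq 'POST' && $has_upload;

    my ($status, $message) = $method eq 'POST'
        ? ('400 Bad Request', "No file was uploaded.\n")
        : ('405 Method Not Allowed',
           "This page only exists as the result of an upload; please submit the form again.\n");

    # "Status:" is how a CGI script sets the HTTP status line.
    my $headers = "Status: $status\nContent-Type: text/plain\n";
    $headers .= "Allow: POST\n" if $method ne 'POST';
    return $headers . "\n" . $message;
}

# Example: what a plain GET (such as the one issued on Save As) would receive.
print response_without_upload('GET', 0);
```

In convert_word2.cgi this would be called before the header is printed, with something like $ENV{REQUEST_METHOD} and defined($fh) as arguments, printing the returned response and exiting when it is defined.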

        the situation is that Firefox is trying to reload the page when you do a Save As
        That makes sense. Too bad I can't find any reference to this on the net. Note, though, that when I load the uptime script and then do a Save As, my access log shows only a single transaction. LiveHTTPHeaders also confirms this single transaction.
