isync has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

I have a script that uses the CGI::Application framework to serve utf-8 webpages. Today I added a feature that reads a binary image file from disk and streams it out to the web user. While incorporating CGI::Application::Plugin::Stream for that, I found out that my image was broken unless I switched STDOUT into binary mode just before print()-ing the raw image data. But why?

I checked every bit of the output-process and finally reached my switches from the perl shebang
#!/usr/bin/perl -COE

So far this set of switches, in combination with decode_utf8() on user-supplied data, has served me quite well. Until today, when removing the switches magically solved my problems with streaming out the raw data. It seems that -COE breaks binary data output with print().
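
A minimal sketch of the effect described above, assuming the only relevant difference is the :utf8 layer that -CO places on STDOUT. The helper name bytes_written and the in-memory filehandle are illustrative and not part of the original script. It writes the first bytes of a PNG file once through :raw and once through :utf8 so the two byte counts can be compared:

```perl
use strict;
use warnings;

# Write $data through the given PerlIO layer into an in-memory buffer
# and return the octets that actually arrived there.
sub bytes_written {
    my ($layer, $data) = @_;
    my $buffer = '';
    open(my $out, ">$layer", \$buffer)
        or die "cannot open in-memory handle: $!";
    print {$out} $data;
    close($out);
    return $buffer;
}

# The PNG signature starts with the byte 0x89, which is above 0x7F.
my $png_magic = "\x89PNG\r\n\x1a\n";

my $raw  = bytes_written(':raw',  $png_magic);
my $utf8 = bytes_written(':utf8', $png_magic);

printf "raw:  %d bytes\n", length $raw;
printf "utf8: %d bytes\n", length $utf8;
```

Under :utf8 every byte above 0x7F is treated as a character and written as a two-octet UTF-8 sequence, so the image data no longer matches the file on disk.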

Now, should I keep it this way, without -COE? As said, I generally serve utf-8 webpages with this script and it handles binary file prints as well.

I am aware that I need to decide whether I use -COE on the perl command line together with binmode STDOUT, ':raw' for the raw output, or the other way round: encode all non-raw output as UTF-8 myself before printing, and leave the raw data alone. Is there a best practice? The fact that CGI::Application::Plugin::Stream uses no fancy STDOUT commands seems to indicate that it's better to encode all text output before print, or it may be because the module is not utf-8 aware. Any suggestions?
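
The second alternative (no layer on the handle, text encoded by hand) could look roughly like this. It is only a sketch: text_to_octets is a hypothetical helper name, and an in-memory handle stands in for STDOUT.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Turn a Perl character string into UTF-8 octets so it can be printed
# on a handle that has no encoding layer.
sub text_to_octets {
    my ($text) = @_;
    return encode('UTF-8', $text);
}

my $buffer = '';
open(my $out, '>:raw', \$buffer)
    or die "cannot open in-memory handle: $!";

print {$out} text_to_octets("caf\x{e9}");   # text: encoded explicitly
print {$out} "\x89PNG";                      # binary: printed untouched
close($out);

printf "%d octets written\n", length $buffer;
```

With this arrangement the handle never changes mode, at the price of having to remember the encode() call on every text print.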

Would be great if some knowing monk could outline what combination (-switches, encode, decode) is best to use 1. on the shebang, 2. form-data and 3. on output.

BTW: in CGI::Fast mode: should it switch STDOUT back into what it was before, after my print, or will CGI::Application magically clear the binmode on the next cycle? See:
...read $fh in binmode etc.
binmode STDOUT, ':raw';
print $buffer;
print '';
close ( $fh );
binmode STDOUT, ':utf8';   # <-- :utf8 the equivalent of -CO right?
return;
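
One way to keep the switch and the restore together is a small helper. This is a sketch under the assumption that the handle normally carries :utf8; print_raw_then_restore is a made-up name, and an in-memory handle stands in for STDOUT.

```perl
use strict;
use warnings;

# Print binary data on a handle that normally carries the :utf8 layer,
# then put :utf8 back so that later text output is encoded again.
sub print_raw_then_restore {
    my ($fh, $data) = @_;
    binmode($fh, ':raw')  or die "cannot switch to :raw: $!";
    print {$fh} $data;
    binmode($fh, ':utf8') or die "cannot restore :utf8: $!";
    return;
}

my $buffer = '';
open(my $out, '>:utf8', \$buffer)
    or die "cannot open in-memory handle: $!";

print {$out} "\x{e9}";                   # text, goes through :utf8
print_raw_then_restore($out, "\x89");    # binary, written as-is
print {$out} "\x{e9}";                   # text again, goes through :utf8
close($out);

printf "%d octets written\n", length $buffer;
```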

Replies are listed 'Best First'.
Re: perl -COE and, for example, CGI::Application
by karavelov (Monk) on Sep 23, 2008 at 00:13 UTC

    What I usually use for output encoding is just:

    binmode STDOUT, ':utf8';

    If you run it under FastCGI, put this line in cgiapp_prerun. In the run mode where you output the binary data, insert

    binmode STDOUT,':raw';

    If you expect unicode data from forms, you could put this code in your cgiapp_prerun stage:

    my $vars = $self->query->Vars;
    while ( my ($k, $v) = each %$vars ) {
        next unless defined $v;
        next if $self->query->upload($k);    # uploads are binary
        next if $k eq 'auth_password';       # else MD5 crashes
        $self->query->param( -name => $k, -value => Encode::decode_utf8($v) );
    }
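
To see what the decoding step changes, here is a standalone sketch. A plain hash stands in for the CGI parameters (it is not the CGI.pm API), and decode_params and its skip list are illustrative names.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);

# Decode every value of a plain hash except the keys listed in %$skip.
# Skipped and undefined values are passed through unchanged.
sub decode_params {
    my ($params, $skip) = @_;
    my %decoded;
    for my $key (keys %$params) {
        my $value = $params->{$key};
        $decoded{$key} = (defined $value && !$skip->{$key})
            ? decode_utf8($value)
            : $value;
    }
    return \%decoded;
}

# Form data arrives as UTF-8 octets: "\xc3\xa9" is one character (e-acute).
my $raw     = { name => "Andr\xc3\xa9", auth_password => "s\xc3\xa9cret" };
my $decoded = decode_params($raw, { auth_password => 1 });

printf "name: %d characters, password: %d octets\n",
    length $decoded->{name}, length $decoded->{auth_password};
```

The skipped field stays a byte string, which is what a digest function expects.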

    Using applications written with CGI::Application under FastCGI is a little bit troublesome. The method the documentation recommends for programmatically switching run modes, "prerun_mode", leaves the object unusable afterwards. So I find the plugins that use this method (Forward, Redirect, Session, Authentication, Authorization) unusable in a FastCGI environment.

      So, you would advocate for not using any switches on the perl shebang, right?

      Putting binmode STDOUT, ':utf8'; in cgiapp_prerun and doing binmode STDOUT,':raw'; just on runmodes where I output binary data is a very elegant solution, I think, as it resets STDOUT to :utf8 on every new cycle. -Right?

      In regards to "automatic decode_utf8($v) on form input with detection of uploads" I am conservative. Had a bit of trouble with it, thus falling back to doing it on a per runmode basis. Are your experiences more consistent (would this be production code)?

      I can't share your last comment: it might be that I do things less efficiently, but I have my FastCGI loop far enough around everything that I can use the normal run mode switching facility, and all the mentioned plugins work under CGI::Fast.

        I think it is better to do this with "binmode" rather than with switches. One reason is that with "binmode" you can run the same cgi-app module under mod_perl.

        I use the decode_utf8() in prerun stage approach in production code. Initially I had some trouble caused by the interaction of utf8 and MD5, but I now skip that case. Note that "auth_login/auth_password" are the field names I use for authentication; the default in CGI::Application::Plugin::Authentication is different.

        Now I see why you do not have problems with CGI::Fast: you create a new object for every request. My initial experience was with CGI::Application::FastCGI, where the run handler is:

        sub run {
            my $self = shift;
            my $request = FCGI::Request();
            $self->fastcgi($request);
            while ($request->Accept >= 0) {
                $self->reset_query;
                $self->SUPER::run;
            }
        }

        In your approach you have "new,run,new,run,new,run...". In my case I have "new,run,run,run...". The difference in performance is actually not very big, but I usually have some heavy initialization in the init stage that runs only once (on new object creation), so I see some benefit in this approach.
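
The two lifecycles can be compared with a toy model. ToyApp below is a hypothetical stand-in and not CGI::Application; its constructor only counts how often the "heavy" setup would run.

```perl
use strict;
use warnings;

# ToyApp is a hypothetical stand-in, not CGI::Application itself.
package ToyApp;

my $init_calls = 0;

sub new {
    my ($class) = @_;
    $init_calls++;                  # stands in for heavy one-time setup
    return bless {}, $class;
}
sub run         { return 'response' }
sub init_calls  { return $init_calls }
sub reset_count { $init_calls = 0; return }

package main;

# "new,run,new,run,...": one object per request.
sub serve_with_new_objects {
    my ($requests) = @_;
    ToyApp->reset_count;
    ToyApp->new->run for 1 .. $requests;
    return ToyApp->init_calls;
}

# "new,run,run,run,...": one object reused for every request.
sub serve_with_one_object {
    my ($requests) = @_;
    ToyApp->reset_count;
    my $app = ToyApp->new;
    $app->run for 1 .. $requests;
    return ToyApp->init_calls;
}

printf "per-request objects: %d inits, shared object: %d inits\n",
    serve_with_new_objects(3), serve_with_one_object(3);
```

The shared object saves the repeated setup, but any state left on it by one request (such as a changed run mode) is still there for the next one.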

Re: perl -COE and, for example, CGI::Application
by wol (Hermit) on Sep 23, 2008 at 10:44 UTC
    I've not seen -C options before today, but they're all documented here quite nicely: http://perldoc.perl.org/perlrun.html. I might find a use for them myself...

    The option to use UTF-8 for STDOUT breaks down when trying to output anything other than text: The 'U' is for Unicode, which implies text data.

    Image files aren't text (as usual, there are some esoteric exceptions, but in general...) so putting them through a UTF-8 encoder is like running a gamma correction on a perl script.

    In this case, I'd suggest that you have two reasonable options:

    1. Stick with your current approach (default to UTF-8 with the -COE options, and switch to ":raw" when you need to output non-text), or
    2. Remove the -C option completely, and defer selection of ":utf8" or ":raw" until the point at which your code is able to identify what it's going to be outputting.
      Although I am proud of myself for once using the -C options which are new to you ;-) I have now reverted to not using them after the discussion with karavelov. As shown, my updated approach is to set :utf8 at the start of each script cycle, and in the few cycles where my script outputs image/binary data I tell it to switch to :raw on a per-case basis. So I am in agreement with your conclusion!
Re: perl -COE and, for example, CGI::Application
by isync (Hermit) on Sep 23, 2008 at 11:20 UTC
    Slightly going Off-Topic:

    May I point your focus on my code comments in my first response:

    Should I move
    while( my $q = new CGI::Fast ){
        $q->header(-charset => 'utf-8'); # as I now set :utf8 in cgiapp
    the modification of the $q->header to charset utf-8 into CGI::Application's App.pm cgiapp_prerun() stage? Currently I only have it at this very early app.cgi position as I am unclear about when CGI::Application creates the CGI.pm / CGI::Fast object and I am unsure if on-init of this object the charset declaration is taken into account...