Marais has asked for the wisdom of the Perl Monks concerning the following question:

While fixing a bug recently, I stumbled upon an odd behaviour. When I upload a file via an HTML page, retrieve the file name using CGI, and then call decode_entities on the file name, the result is an empty string. This seems to depend on the Perl version: it fails on Perl v5.20.2 but not on v5.10.1. Here is a simple test case:

HTML page:

<!DOCTYPE html>
<HTML>
<BODY>
<form action="filename_entity_test.cgi" method="post" enctype="multipart/form-data">
Test file name: <input type="file" name="fn">
<br>
<input type="submit" name="submit" value="Upload">
</form>
</BODY>
</HTML>

CGI code:

#!/usr/bin/perl -w
use strict;
use CGI qw(:cgi-lib);
use CGI::Carp qw(warningsToBrowser fatalsToBrowser);
use HTML::Entities;

my ($cg, $filename);
$cg = new CGI;
print $cg->header;
print $cg->start_html("Testing decode_entities");
$filename = $cg->param("fn");
print "Filename before call to decode_entities: $filename<br>\n";
decode_entities($filename);
print "Filename after call to decode_entities: $filename<br>\n";
print $cg->end_html;

Calling decode_entities with other cgi parameters is successful, and calling it like this:

$filename = decode_entities($filename);
is also successful.

In retrospect, I don't really need to worry about entities in the filename, but I'm very curious as to what is going on here.

Replies are listed 'Best First'.
Re: CGI Filenames and decode_entities
by roboticus (Chancellor) on Jul 23, 2015 at 16:53 UTC

    Marais:

    I'd guess that the decode_entities routine is directly manipulating the @_ array containing the subroutine parameters. If it does that and makes any changes to the string, then it can do what you're describing.

    Here's a cheesy example of what I'm talking about:

    $ cat destructive_sub.pl
    use warnings;
    use strict;

    my $param = 'Now is the time';
    print "$param\n";
    do_it($param);
    print "$param\n";

    sub do_it {
        $_[0] =~ s/([it])/uc($1)/ge;
    }

    krevulax:~ [roboticus] $ perl destructive_sub.pl
    Now is the time
    Now Is The TIme

    Notice how the second print statement gives a slightly different result.
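
For contrast, here is a minimal sketch of the same substitution written two ways: once through the `$_[0]` alias and once on a private copy of the argument. The sub names `upcase_in_place` and `upcase_copy` are made up for illustration. This only demonstrates `@_` aliasing in general; it does not show what decode_entities does internally.

```perl
use strict;
use warnings;

# Works through the alias $_[0], so the caller's variable is modified.
sub upcase_in_place {
    $_[0] =~ s/([it])/uc($1)/ge;
    return;
}

# Copies the argument first, so the caller's variable is left alone.
sub upcase_copy {
    my ($text) = @_;
    $text =~ s/([it])/uc($1)/ge;
    return $text;
}

my $kept   = 'Now is the time';
my $result = upcase_copy($kept);
print "copy:     kept='$kept' result='$result'\n";

my $clobbered = 'Now is the time';
upcase_in_place($clobbered);
print "in place: clobbered='$clobbered'\n";
```

Unpacking `@_` into lexicals with `my (...) = @_;` is the usual way to make sure a sub cannot change its caller's data by accident.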

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

      And from the HTML::Entities documentation:
      If called in void context the arguments are decoded in-place.
      So the solution is to not call in void context:
      $filename = decode_entities($filename);
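
The difference between the two calling styles can be seen with an ordinary string. This is a small sketch with made-up sample values. It only illustrates the documented void-context behaviour and does not by itself explain why an uploaded file name would end up empty.

```perl
use strict;
use warnings;
use HTML::Entities;   # CPAN module, part of the HTML-Parser distribution

# Void context: the argument itself is decoded in place.
my $in_place = 'Tom &amp; Jerry';
decode_entities($in_place);
print "in place: $in_place\n";

# Scalar context: a decoded copy is returned and the argument is untouched.
my $source  = 'Tom &amp; Jerry';
my $decoded = decode_entities($source);
print "source:   $source\n";
print "decoded:  $decoded\n";
```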
      As a side note, if you are migrating or securing an old script, you should take a read through perltaint. It'll help you protect yourself from classic exploits.

      #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

      Well, it's an interesting idea, but why do other cgi parameters (other than the file name, that is) not experience the same problem? What's special about the file name?
Re: CGI Filenames and decode_entities (why)
by Anonymous Monk on Jul 23, 2015 at 22:42 UTC

    Why did you write this use CGI qw(:cgi-lib);??

    Also why would you call decode_entities on $filename?

    I can't imagine a reason you would need to use either of those things, so it kinda doesn't make sense to try to figure out why you're seeing what you're seeing.

    It could be as simple as this: depending on the version of CGI.pm, $filename could be a simple string or an object.
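
One way to test the string-or-object guess is to ask Perl what kind of value it has been handed. The sketch below uses a hypothetical stand-in class, `My::FakeUpload`, that prints like a string but is really an object. It is invented for illustration and is not CGI.pm's actual class. The helper `describe` is likewise made up.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical stand-in: an object that stringifies to a file name.
{
    package My::FakeUpload;
    use overload '""' => sub { $_[0]{name} }, fallback => 1;
    sub new { my ($class, $name) = @_; return bless { name => $name }, $class }
}

# Report whether a value is an object, a plain reference, or a plain scalar.
sub describe {
    my ($value) = @_;
    return blessed($value) ? 'object of class ' . blessed($value)
         : ref($value)     ? 'unblessed ' . ref($value) . ' reference'
         :                   'plain scalar';
}

my $plain  = 'report.txt';
my $object = My::FakeUpload->new('report.txt');
my $copy   = "$object";    # forcing stringification yields an ordinary string

print "$_\n" for describe($plain), describe($object), describe($copy);
```

In the original test script, the equivalent check would be something like printing `ref($filename)` right after the call to `param("fn")`.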

      All I can do is plead ignorance, my lord.

      That line to include the CGI library is just a bad habit from when I first began using Perl.

      As for decoding entities in the file name, I did mention at the start "In retrospect, I don't really need to worry about entities in the filename, but I'm very curious as to what is going on here."

      So, I'm just trying to understand, and by no means trying to demonstrate what a virtuoso Perl programmer I am.

        Have you tried dumping the filename before/after with DD()?

        use Data::Dumper;
        sub DD { scalar Data::Dumper->new( \@_ )->Indent(1)->Useqq(1)->Dump; }
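
A minimal sketch of how DD() might be used for a before/after comparison. The sample value and the substitution are stand-ins. In the real script the dumps would go around the call to decode_entities.

```perl
use strict;
use warnings;
use Data::Dumper;

# DD as posted above: returns the dump as a string instead of printing it.
sub DD { scalar Data::Dumper->new( \@_ )->Indent(1)->Useqq(1)->Dump; }

my $filename = 'Tom &amp; Jerry.txt';   # made-up sample value
my $before   = DD($filename);

$filename =~ s/&amp;/&/g;               # stand-in for whatever changes the value
my $after    = DD($filename);

print "before: $before";
print "after:  $after";
```

Because Data::Dumper shows blessed references with their class name, the same call would also reveal whether the value is an object and not a plain string.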