vahokif has asked for the wisdom of the Perl Monks concerning the following question:

Hi. It's been a while since I used Perl, so this could be totally obvious, in which case I apologize. This is my script:
use strict; use Encode; use Getopt::Long; use HTML::Form; use HTTP::Request::Common; use LWP::UserAgent; use URI::Escape::JavaScript; my $url; my $message = ''; my $scripts = ''; my $username; my $password; my $file; my $name; my $login = 0; my $response; my $usage = "usage: upload.pl url [--message] [--scripts] [--username +--password] [--file [--name]]"; GetOptions('message|m=s' => \$message, 'scripts|s=s' => \$scripts, 'username|u=s' => \$username, 'password|p=s' => \$password, 'file|f=s' => \$file, 'name|a=s' => \$name) or die $usage; if (@ARGV != 1) {die $usage} $url = $ARGV[0]; if ($username && $password) { $login = 1; } elsif ($username || $password) { die 'Username and password must be su +pplied together.' } if ($file) { if (!-e $file) { die "File not found: $file" } } elsif (!$message) { die 'Either a file or a message must be supplied.' + } my $userAgent = LWP::UserAgent->new; $userAgent->agent('PerlKamion'); $userAgent->cookie_jar( {} ); my $latin2 = find_encoding("iso-8859-2"); #login if ($login) { print "Logging in...\n"; $response = $userAgent->post('http://lohere.net/_user/bejelentkezes. +php?login=true', [ user => $username, password => $pass +word ], Content_Type => 'form_data', ); if ($response->filename == "bejelentkezes.php") { $latin2->decode($response->content) =~ m|<font color="red">(.+?) +</font>|; die "Unable to log in: $1"; } } # authentication print "Authenticating...\n"; $response = $userAgent->get($url); $latin2->decode($response->content) =~ m|unescape\('(.+?)'\)|; my @forms = HTML::Form->parse(js_unescape($1), $url); # upload my %uploadForm = $forms[0]->form; $uploadForm{'longcat'} = $scripts; $uploadForm{'is'} = $message; if ($file) { $uploadForm{'looong'} = [ $file, $name ] } print "Posting...\n"; $response = $userAgent->post($url, \%uploadForm, Content_Type => 'form +-data'); my $responseText = $latin2->decode($response->content); print $responseText; if (index($responseText, "Oldal frissítése...") == -1) { $responseText =~ m|<td align="left">(.+?)</td>|; die "Upload error: $1"; } print "Done.";
This works fine, until my $responseText = $latin2->decode($response->content); where decode seems to do nothing at all (the response content is misinterpreted as UTF-8, I think). Any ideas? Also, I'd appreciate it if you could point out things I'm "doing the wrong way". Thanks.

Replies are listed 'Best First'.
Re: Decode stops working
by almut (Canon) on Jun 26, 2010 at 21:12 UTC
    where decode seems to do nothing at all

    What do you expect it to do?  Is your response content encoded in Latin-2?

    the response content is misinterpreted as UTF-8, I think

    What exactly is happening that made you arrive at this conclusion?

    BTW, you have a literal string "Oldal frissítése..." in your code.  Unless you tell Perl otherwise, it would be interpreted as Latin-1. Is that intended?  Also, you don't seem to have specified an output encoding for the print statements, in particular print $responseText. The latter is decoded content, i.e. a string in Perl's internal unicode format, which should be properly encoded for output.

      use utf8; fixed it. Thanks!
      If I print the decoded response text, I get this. In other parts of the script, those characters display correctly.
Re: Decode stops working
by moritz (Cardinal) on Jun 26, 2010 at 21:08 UTC
    This works fine, until my $responseText = $latin2->decode($response->content); where decode seems to do nothing at all

    It would help to know which characters are misinterpreted in what way.

    Before this line, try

    use Data::Dumper; $Data::Dumper::Useqq = 1; print Dumper $response->content

    and past relevant parts of the output, along with your interpretation which characters those should be.

    Perl 6 - links to (nearly) everything that is Perl 6.
Re: Decode stops working
by Khen1950fx (Canon) on Jun 26, 2010 at 21:43 UTC
    There are a few things that I would change. First,
    my $latin2 = find_encoding("iso-8859-2");
    I'd change that to
    my $enc = find_encoding("iso-8859-2")->name;
    That's because I hate to type:). You could do similarly with
    my $userAgent = LWP::USerAgent->new;
    I'd do
    my $ua = LWP::UserAgent->new;
    Then, to debug the script, I'd use something like Debug::STDERR. I put it in the script.
    #!/usr/bin/perl use strict; use warnings; use Encode; use Getopt::Long; use HTML::Form; use HTTP::Request::Common; use LWP::UserAgent; use URI::Escape::JavaScript; BEGIN { $ENV{DEBUG} = 1; } use Debug::STDERR; my $url; my $message = ''; my $scripts = ''; my $username; my $password; my $file; my $name; my $login = 0; my $response; my $usage = "usage: upload.pl url [--message] [--scripts] [--username --password] +[--file [--name]]"; GetOptions( 'message|m=s' => \$message, 'scripts|s=s' => \$scripts, 'username|u=s' => \$username, 'password|p=s' => \$password, 'file|f=s' => \$file, 'name|a=s' => \$name ) or die $usage; if ( @ARGV != 1 ) { die $usage } $url = $ARGV[0]; if ( $username && $password ) { $login = 1; } elsif ( $username || $password ) { die 'Username and password must be supplied together.'; } if ($file) { if ( !-e $file ) { die "File not found: $file" } } elsif ( !$message ) { die 'Either a file or a message must be supplied +.' } my $ua = LWP::UserAgent->new; debug( my_debug => { ua => $ua } ); $ua->agent('PerlKamion'); $ua->cookie_jar( {} ); my $enc = find_encoding("iso-8859-2"); #login if ($login) { print "Logging in...\n"; $response = $ua->post( 'http://lohere.net/_user/bejelentkezes.php?login=true', [ user => $username, password => $password ], Content_Type => 'form_data', ); if ( $response->filename == "bejelentkezes.php" ) { $enc->decode( $response->content ) =~ m|<font color="red">(.+?)</font>|; die "Unable to log in: $1"; } } # authentication print "Authenticating...\n"; $response = $ua->get($url); $enc->decode( $response->content ) =~ m|unescape\('(.+?)'\)|; my @forms = HTML::Form->parse( js_unescape($1), $url ); # upload my %uploadForm = $forms[0]->form; $uploadForm{'longcat'} = $scripts; $uploadForm{'is'} = $message; if ($file) { $uploadForm{'looong'} = [ $file, $name ] } print "Posting...\n"; $response = $ua->post( $url, \%uploadForm, Content_Type => 'form-data' + ); my $responseText = $enc->decode( $response->content ); print $responseText; if ( index( $responseText, "Oldal friss\u00edt\u00e9se..." ) == -1 ) { $responseText =~ m|<td align="left">(.+?)</td>|; die "Upload error: $1"; } print "Done.";
    Update: Fixed encoding error.
      my $enc = find_encoding("iso-8859-2")->name;

      Why ->name?  It would return a string "iso-8859-2", instead of the object needed to be able to call methods like ->decode() upon.

      use Encode; my $enc = find_encoding("iso-8859-2")->name; print $enc; # "iso-8859-2" $enc->decode("foo"); # Can't locate object method "decode" via packag +e "iso-8859-2"
        Good point. Thank you.