panchuloguay has asked for the wisdom of the Perl Monks concerning the following question:

I am using a Perl script to obtain some information from a web page. I use WWW::Mechanize to navigate the site and finally send the POST request that makes the server return the CSV file I want. The code for the POST request and for saving the file looks like this:

use HTTP::Request;

my $req = HTTP::Request->new(POST => $POST_URL2);
$req->content_type('application/x-www-form-urlencoded');
$req->content("t_date_from=01.12.2008&t_date_to=30.12.2008&c_date_create=on&d_prog_name=0&d_status=3001&d_accounting_currency=1&d_screen_file=1&b_save=Actualizar+estad%EDsticas");
$req->referer($referer2);
$mech->request($req);


open(OUTFILE, '>', $outfile) or die "Cannot open $outfile: $!"; # the output file is output.csv

print OUTFILE $mech->response->content();

close(OUTFILE);


This part works fine and I get the CSV file. The file contains a lot of information I don't need, so I parse it with another script that writes the final result to an Excel file (using Spreadsheet::WriteExcel).

My problem is that any text coming from the CSV file appears in the Excel sheet in the wrong format, not as plain text: there are extra spaces between the letters.


Is there any way to change the format and force WWW::Mechanize to save the content as plain text? Am I doing something wrong?

Any help would be much appreciated.

Thanks in advance,
MIK

Re: wrong encoding using WWW::Mechanize
by ikegami (Patriarch) on Jan 24, 2009 at 19:17 UTC

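    Sounds like you are receiving UTF-16 and treating each byte as a character: in UTF-16, each ASCII letter takes two bytes, one of them a NUL, and those NULs are what show up as the extra "spaces" between letters. Try changing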

    print OUTFILE $mech->response->content();

    to

    print OUTFILE $mech->response->decoded_content();

    If that doesn't work, you'll need to decode it yourself with the Encode module (probably with decode('UTF-16', ...), which relies on a byte-order mark; decode('UTF-16LE', ...) or decode('UTF-16BE', ...) may be necessary if the data has none).
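A minimal sketch of the manual route, using only the core Encode module. The sample text and the helper name decode_utf16 are hypothetical, and falling back to little-endian when there is no byte-order mark is an assumption, not something the thread confirms:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical stand-in for the downloaded body: ASCII text sent as UTF-16LE.
my $raw = encode('UTF-16LE', "Fecha;Importe\n");

# Dump the first six bytes: each letter is followed by a NUL (0x00).
print join(' ', map { sprintf '%02X', ord($_) } split //, substr($raw, 0, 6)), "\n";

# Decode UTF-16 bytes into a Perl character string.
# Encode's 'UTF-16' decoder needs a byte-order mark (BOM) to pick the byte order,
# so without one this hypothetical helper assumes little-endian.
sub decode_utf16 {
    my ($bytes) = @_;
    if ($bytes =~ /\A(?:\xFF\xFE|\xFE\xFF)/) {
        return decode('UTF-16', $bytes);    # BOM present: Encode reads and strips it
    }
    return decode('UTF-16LE', $bytes);      # no BOM: assumed little-endian
}

my $text = decode_utf16($raw);
printf "%d bytes -> %d characters\n", length($raw), length($text);
print $text;
```

The byte dump prints `46 00 65 00 63 00`, and the 28 bytes decode to 14 characters. If the guess about byte order is wrong, the decoded text will be garbage rather than spaced-out letters, which is a quick way to tell LE from BE.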

    If Spreadsheet::WriteExcel doesn't encode it for you, you'll need to encode the data you place in Excel. I don't know which encoding it expects.

      print OUTFILE $mech->response->content();

      to

      print OUTFILE $mech->response->content();

      (Aren't those two statements the same thing?)


        hehe oops! Fixed.
Re: wrong encoding using WWW::Mechanize
by Anonymous Monk on Jan 25, 2009 at 06:51 UTC
    Maybe also binmode OUTFILE, so that Perl writes the downloaded bytes to the file unchanged.
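binmode can also take a layer. A minimal sketch, assuming the content has already been decoded into characters (for example with decoded_content or Encode::decode): the :encoding(UTF-8) layer encodes each character as it is printed, whereas a plain binmode OUTFILE with no layer switches the handle to raw bytes. The sample string is borrowed from the form's b_save value, and the temporary file is a hypothetical stand-in for $outfile:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Already-decoded text (a Perl character string), taken from the form's button label.
my $text = "Actualizar estad\x{ED}sticas\n";

# Temporary file standing in for $outfile; removed automatically at exit.
my ($fh, $path) = tempfile(UNLINK => 1);

# With a layer, binmode makes Perl encode characters to UTF-8 as they are printed.
binmode $fh, ':encoding(UTF-8)';
print $fh $text;
close $fh or die "Cannot close $path: $!";

# Read the file back as raw bytes to see what actually reached the disk.
open my $in, '<:raw', $path or die "Cannot open $path: $!";
my $bytes = do { local $/; <$in> };
close $in;

# U+00ED takes two bytes in UTF-8, so there is one more byte than characters.
printf "%d characters -> %d bytes\n", length($text), length($bytes);
```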
      Hey Guys!!!

      Thanks a lot for all your help!!!

      It was indeed an encoding problem!!

      What really solved the issue was opening the file with a UTF-16 decoding layer:

      my $file = "./cgi-bin/output.txt";
      open RAW, '<:encoding(utf-16)', $file or die "Couldn't open $file: $!\n";
      This automatically decodes the UTF-16 file as it is read, so the script works with ordinary text that can then be written out as UTF-8.
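A self-contained sketch of the same approach, assuming (hypothetically) that the downloaded file is UTF-16LE with a byte-order mark; temporary files replace ./cgi-bin/output.txt and the output file, and their contents are made up. Each line read through the :encoding(UTF-16) layer is a character string, and an :encoding(UTF-8) layer on the output handle writes it back out as UTF-8:

```perl
use strict;
use warnings;
use Encode qw(encode);
use File::Temp qw(tempfile);

binmode STDOUT, ':encoding(UTF-8)';

# Stand-in for the downloaded file: a BOM followed by UTF-16LE text (made-up data).
my ($tmp, $in_path) = tempfile(UNLINK => 1);
binmode $tmp;    # write the bytes exactly as given
print $tmp "\xFF\xFE", encode('UTF-16LE', "Fecha;Estad\x{ED}stica\n01.12.2008;10\n");
close $tmp or die "Cannot close $in_path: $!";

# Read through a decoding layer; the BOM tells Encode which byte order to use.
open my $raw, '<:encoding(UTF-16)', $in_path or die "Couldn't open $in_path: $!\n";

# Write a UTF-8 copy through an encoding layer on the output handle.
my ($out, $out_path) = tempfile(UNLINK => 1);
binmode $out, ':encoding(UTF-8)';

my @lines;
while (my $line = <$raw>) {
    chomp $line;
    push @lines, $line;
    print $out "$line\n";
}
close $raw;
close $out or die "Cannot close $out_path: $!";

print "$_\n" for @lines;
```

If the real file has no BOM, the input layer would presumably need to name the byte order explicitly, e.g. :encoding(UTF-16LE).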

      Thanks everybody for the tips!!