Hello. I have two similar machines running the same OS (Debian Sarge), and both use Perl from the deb packages, so everything should work the same. However, I get different results regarding UTF-8 encoding. Running this script on the two machines does not give the same result, and I don't know why:
andre@blogs-dev:~$ perl -e 'use Encode; use LWP::Simple; use Data::Dumper; my $html = get("http://ljsapo.blogspot.com/feeds/posts/full"); use XML::Simple; my $parsed = XMLin($html); my $text = $parsed->{"entry"}->{"tag:blogger.com,1999:blog-20104370.post-115522283622549946"}->{"content"}->{"content"}; print Dumper(Encode::is_utf8($text));'
$VAR1 = '';
andre@blogs-dev:~$
andre.cruz@blogs1:~$ perl -e 'use Encode; use LWP::Simple; use Data::Dumper; my $html = get("http://ljsapo.blogspot.com/feeds/posts/full"); use XML::Simple; my $parsed = XMLin($html); my $text = $parsed->{"entry"}->{"tag:blogger.com,1999:blog-20104370.post-115522283622549946"}->{"content"}->{"content"}; print Dumper(Encode::is_utf8($text));'
$VAR1 = '1';
andre.cruz@blogs1:~$
They both use the same Perl package version:
andre@blogs-dev:~$ perl -v
This is perl, v5.8.4 built for i386-linux-thread-multi
Copyright 1987-2004, Larry Wall
Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.
Complete documentation for Perl, including FAQ lists, should be found on
this system using `man perl' or `perldoc perl'. If you have access to the
Internet, point your browser at http://www.perl.com/, the Perl Home Page.
andre@blogs-dev:~$
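A matching perl -v does not by itself show that the installed modules match. As a minimal sketch for comparing them, the script below prints the version of each module that might be involved; the module list is an assumption (XML::Simple can use either XML::SAX or XML::Parser underneath, and XML::LibXML may provide a SAX parser), and module_version is a hypothetical helper, not part of any library. Running it on both machines and diffing the output would show whether the module sets differ.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the installed version of a module,
# or 'not installed' if it cannot be loaded.
sub module_version {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # Foo::Bar -> Foo/Bar.pm
    return 'not installed' unless eval { require $file; 1 };
    my $version = $module->VERSION;
    return defined $version ? $version : 'unknown';
}

# Assumed list of modules that could influence the result.
my @modules = qw(Encode LWP XML::Simple XML::SAX XML::Parser XML::LibXML);

printf "%-12s %s\n", $_, module_version($_) for @modules;
```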
Also they have the same locale settings:
andre@blogs-dev:~$ locale
LANG=POSIX
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=
andre@blogs-dev:~$
This script just fetches a known feed URL and illustrates my problem on one of the XML nodes. The string the XML parser returns has the UTF-8 flag set on only one of the machines. Does anyone know what could differ between these two machines?
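For reference, here is a small sketch of what Encode::is_utf8 reports, independent of the feed and of XML::Simple. The sample string is made up for illustration. It shows the same text once as raw UTF-8 octets and once as a decoded character string, which is the distinction the one-liner above is exposing.

```perl
use strict;
use warnings;
use Encode;

# The same text in two internal representations (sample data, not from the feed).
my $octets = "caf\xc3\xa9";                       # raw UTF-8 bytes, flag off
my $chars  = Encode::decode('UTF-8', $octets);    # decoded characters, flag on

printf "octets: flag=%d length=%d\n",
    (Encode::is_utf8($octets) ? 1 : 0), length($octets);
printf "chars:  flag=%d length=%d\n",
    (Encode::is_utf8($chars) ? 1 : 0), length($chars);
```

If the two machines hand back the content in different representations like this, the flag would differ even though the underlying text is the same.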

In reply to Encoding differences by EDevil
