I am trying to migrate some scripts from one server to another, and the two environments differ in a couple of ways.
The old server has Perl 5.10.1 and the locale setting LC_CTYPE="C".
The new server has Perl 5.30.0 and the locale setting LC_CTYPE="C.UTF-8".
On the old server I was able to parse a UTF-8 text file by opening it and processing every line in a while loop like this:
    use Encode qw(decode encode);

    my $category;
    my $decoded_text = decode('UTF-8', $_);
    #my $latin1_html = encode('iso-latin-1', $decoded_text);
    my $latin1_html = encode('iso-8859-1', $decoded_text);
    if ($latin1_html =~ /behälter/i) {
        $category = 'Behälter';
    }
This no longer seems to work in the new environment.
I tried modifying a few things and ended up with:
    use open IN  => ':encoding(UTF-8)';
    use open OUT => ':encoding(iso-8859-1)';
before opening the file and
    my $category;
    if ($_ =~ /behälter/i) {
        $category = 'Behälter';
    }
This works at first sight, but $category causes trouble and seems to be in a different encoding than $_.
At the moment I presume $_ is ISO-8859-1 and $category is something else. For example, if I try to insert $category into a latin1 MySQL database, it throws the error "Incorrect string value: '\xE4".
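Rather than presuming, the actual contents of a string can be inspected. The following is a minimal sketch, not part of the original script: hex_dump is a hypothetical helper, and the sample strings are written with escapes so the result does not depend on how the script file itself is saved.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: show every element of a string as a hex ordinal.
sub hex_dump {
    my ($s) = @_;
    return join ' ', map { sprintf '%02X', ord($_) } split //, $s;
}

my $utf8_bytes = "Beh\xC3\xA4lter";             # "Behälter" as UTF-8 bytes
my $chars      = decode('UTF-8', $utf8_bytes);  # decoded character string
my $latin1     = encode('iso-8859-1', $chars);  # re-encoded as Latin-1 bytes

print hex_dump($utf8_bytes), "\n";   # 42 65 68 C3 A4 6C 74 65 72
print hex_dump($latin1),     "\n";   # 42 65 68 E4 6C 74 65 72
print join(' ', length($utf8_bytes), length($chars), length($latin1)), "\n";  # 9 8 8
```

A lone E4 in the dump means the string holds "ä" either as a Latin-1 byte or as a decoded character; the pair C3 A4 means it is still UTF-8 encoded bytes.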
The Perl script file itself is saved as ANSI (a single-byte encoding, not UTF-8).
If I run the script with a UTF-8 encoded file, I can do the following:
    my $category;
    if ($_ =~ /beh\xE4lter/i) {
        $category = 'Behälter';
    }

This works, but it seems to be more a patch for a symptom than a cure for the problem. I would really like to understand what mistake I am making, and what approach I could take to handle file parsing, string modification, and storing in files or databases the right way in the new environment. Thank you very much for your comments.
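For illustration, here is a minimal sketch of the commonly recommended pattern of decoding at input, working on characters, and encoding explicitly at output. It is a sketch under assumptions, not a verified fix for this setup: the in-memory input line stands in for the real file, the variable names are made up, and the database step is left out because how a driver treats the string depends on connection settings not shown in this post.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Stand-in for the real input file (assumption): one line of UTF-8 bytes.
my $utf8_input = "Gro\xC3\x9Fer Beh\xC3\xA4lter\n";

# Decode at the boundary: the layer turns UTF-8 bytes into characters.
open my $in, '<:encoding(UTF-8)', \$utf8_input
    or die "Cannot open in-memory file: $!";

my $category;
while (my $line = <$in>) {
    # \x{E4} is the character "ä" regardless of the script file's encoding.
    if ($line =~ /beh\x{E4}lter/i) {
        $category = "Beh\x{E4}lter";
    }
}
close $in;

# Encode at the boundary: produce Latin-1 bytes for a Latin-1 consumer.
my $latin1_category = encode('iso-8859-1', $category);
print join(' ', map { sprintf '%02X', ord($_) } split //, $latin1_category), "\n";
```

Writing the non-ASCII character as an escape keeps the match independent of whether the script is saved as ANSI or UTF-8. With a literal "ä" in the source, the result depends on the file encoding and on whether use utf8 is in effect.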


In reply to Perl encoding problem by derion
