Sorry to be responding so late on this. Maybe you've already worked out everything I was going to say, but I'll say it anyway.
I want to use Encode::from_to(...) to put everything into iso-8859-1 in (probable) good form.
No. If you're expecting to pull in data from various web sites that might use several different single-byte legacy encodings, most of them will not be directly mappable to iso-8859-1. The whole problem with the legacy single-byte encodings is that, to the extent they differ from one another, you cannot map from one to another without losing some characters.

Actually, to the extent that some 8-bit encodings cover fewer displayable characters than others (e.g. iso-8859-* never use 0x80-0x9f for displayable characters, whereas the Windows and Mac code pages do), loss of information might only happen in one direction. But if your "from" encoding happens to be 8859-2 and your "to" encoding happens to be 8859-1, the conversion simply cannot work for any character that 8859-2 has and 8859-1 lacks (e.g. "ł" or "ź").
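To make the loss concrete, here is a minimal sketch. The sample word and the variable names are mine, chosen for illustration. It pushes four iso-8859-2 octets through Encode::from_to into iso-8859-1 with the default CHECK behaviour, then does the same trip through UTF-8 instead.

```perl
use strict;
use warnings;
use Encode qw(decode encode from_to);

# Illustrative sample: the Polish word "łódź" as iso-8859-2 octets
# (ł = 0xB3, ó = 0xF3, d = 0x64, ź = 0xBC).
my $latin2 = "\xB3\xF3d\xBC";

# Direct legacy-to-legacy conversion. from_to() works in place, so use a copy.
# With the default CHECK, characters that iso-8859-1 cannot hold are replaced
# by a substitution character ('?').
my $lossy = $latin2;
from_to($lossy, 'iso-8859-2', 'iso-8859-1');

# Going through UTF-8 instead keeps every character, so the original octets
# can be recovered later.
my $utf8      = encode('UTF-8', decode('iso-8859-2', $latin2));
my $roundtrip = encode('iso-8859-2', decode('UTF-8', $utf8));

printf "lossy:     %s\n", join ' ', map { sprintf '%02X', ord } split //, $lossy;
printf "roundtrip: %s\n", ($roundtrip eq $latin2 ? 'identical' : 'changed');
```

Only "ó" and "d" survive the direct conversion, because they are the only two characters of the sample that exist in both code pages.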

So, always convert from a non-Unicode encoding to utf8, which has room for every character of every legacy code page. As for guessing correctly from among several 8-bit code pages that cover different latin-alphabet-based languages, the sad truth remains that Encode::Guess will have a hard time getting it right, because nearly any byte sequence is valid in all of them. You need a certain amount of language modeling data (validated by manual inspection and labeling as to language and character set) and some simple statistics on your unknown input data in order to make a proper guess.
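Here is a toy sketch of the "simple statistics" idea, just to show the shape of it. The %expected profiles and the guess_encoding function are hypothetical and hand-made for this post. A real version would need frequency data built from validated, labeled text, as described above, not a short list of letters.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical, hand-made profiles: non-ASCII letters assumed to be common
# in text that really is in each encoding. Not real language-model data.
my %expected = (
    # e-acute, e-grave, a-grave, c-cedilla, n-tilde, u/o/a-umlaut
    'iso-8859-1' => "\x{E9}\x{E8}\x{E0}\x{E7}\x{F1}\x{FC}\x{F6}\x{E4}",
    # Polish letters: l-stroke, a/e-ogonek, s/z-acute, z-dot, c/n-acute, o-acute
    'iso-8859-2' => "\x{142}\x{105}\x{119}\x{15B}\x{17A}\x{17C}\x{107}\x{144}\x{F3}",
);

# Decode the octets under each candidate encoding and count how many of the
# resulting characters appear in that candidate's profile. Highest count wins;
# ties go to the candidate that sorts first.
sub guess_encoding {
    my ($octets) = @_;
    my ($best, $best_score) = (undef, -1);
    for my $enc (sort keys %expected) {
        my $text  = decode($enc, $octets);
        my $score = 0;
        for my $ch (split //, $text) {
            $score++ if index($expected{$enc}, $ch) >= 0;
        }
        ($best, $best_score) = ($enc, $score) if $score > $best_score;
    }
    return $best;
}

# Polish sample stored as iso-8859-2 octets, French sample as iso-8859-1 octets.
my $polish = "za\xBF\xF3\xB3\xE6 g\xEA\xB6l\xB1 ja\xBC\xF1";
my $french = "d\xE9j\xE0 vu, gar\xE7on";

print guess_encoding($polish), "\n";
print guess_encoding($french), "\n";
```

Even this crude count separates the two samples, but it will be fooled by short inputs or by languages whose accented letters sit at the same code points in both pages, which is why the labeled training data matters.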


In reply to Re: What encoding am I (probably) using? by graff
in thread What encoding am I (probably) using? by tphyahoo
