Thanks, but the guessing method is not pertinent in this case, as I tested with a fixed set of charsets and the same content before posting. :(

I'm just stumped as to why encode(), which in this case turns the UTF8 flag off, makes any difference. I could understand it if both the regexp and the content needed to have the UTF8 flag on (or both off), but in this case it doesn't matter whether the UTF8 flag on the regexp is set or not. What matters is that the content has the UTF8 flag off. This is the part that I don't get. I'm not failing to convert the content into the correct encoding.
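
To make the flag behaviour concrete, here is a minimal sketch of what decode() and encode() normally do to the UTF8 flag. The sample string and the variable names are illustrative assumptions, not taken from the code in question, and this only shows the baseline behaviour rather than reproducing the oddity described above.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Octets for the UTF-8 encoding of U+65E5 U+672C, written as escapes
# so the example does not depend on the source file's encoding.
my $octets = "\xe6\x97\xa5\xe6\x9c\xac";

my $chars = decode('UTF-8', $octets);   # character string, UTF8 flag on
my $bytes = encode('UTF-8', $chars);    # octet string, UTF8 flag off

printf "decoded: flag=%d length=%d\n",
    (utf8::is_utf8($chars) ? 1 : 0), length $chars;
printf "encoded: flag=%d length=%d\n",
    (utf8::is_utf8($bytes) ? 1 : 0), length $bytes;

# A regexp written in terms of characters (code points).
my $re = qr/\x{65e5}/;
print "chars match: ", ($chars =~ $re ? "yes" : "no"), "\n";
print "bytes match: ", ($bytes =~ $re ? "yes" : "no"), "\n";
```

In this sketch the character-level regexp matches only the decoded string, because the encoded string is seen as six separate octets.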

<ASIDE>I only use Encode::Guess as a last resort because more often than not it tends... to utterly fail to work. It's understandable, though, because there are many who decide to gratuitously mix euc-jp and shift-jis in the same page, for example. Ugh.

Anyhow, guessing currently goes through about 20 heuristic steps. If all else fails, we try to decode using Encode::Guess just to see if we can do it, but we don't really rely on it.</ASIDE>
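
A last-resort step of that kind could look roughly like the sketch below. It is not the actual code: the helper name decode_by_guess and the suspect list are made up for illustration, and only guess_encoding() and the decode() method on the object it returns come from Encode::Guess.

```perl
use strict;
use warnings;
use Encode::Guess;   # exports guess_encoding()

# Hypothetical helper: returns the decoded string, or undef if
# Encode::Guess cannot settle on a single encoding.
sub decode_by_guess {
    my ($octets, @suspects) = @_;
    my $enc = guess_encoding($octets, @suspects);
    # On failure guess_encoding() returns an error message string,
    # not an encoding object, so check for a reference.
    return unless ref $enc;
    return $enc->decode($octets);
}

my $text = decode_by_guess("plain ascii", qw(euc-jp shiftjis));
print defined $text ? "decoded: $text\n" : "could not guess\n";
```

Returning undef on failure keeps the guess optional, so the caller can fall through to whatever default it prefers instead of trusting a doubtful result.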


In reply to Re^2: Matching UTF8 Regexps by lestrrat
in thread Matching UTF8 Regexps by lestrrat
