Thanks, but the guessing method is not pertinent in this case: I tested with a fixed set of charsets and the same content before posting. :(
I'm just stumped as to why encode(), which in this case sets the UTF8 flag on, makes any difference. I could understand it if both the regexp and the content needed to have the UTF8 flag on (or both off), but here it doesn't matter whether the UTF8 flag is set on the regexp. What matters is that the content has the UTF8 flag off. This is the part that I don't get. I'm not failing to convert the content into the correct encoding.
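A minimal sketch for checking which state each string is actually in before matching. It assumes UTF-8 input and uses a made-up sample character (U+3042), not the real content or regexp from this thread. Per the Encode documentation, decode() returns a character string (flag on for non-ASCII text) and encode() returns octets (flag off), so utf8::is_utf8() is a quick way to confirm what the regexp is really being matched against.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical sample data: the UTF-8 bytes for U+3042 (HIRAGANA LETTER A).
my $octets = "\xe3\x81\x82";
my $chars  = decode('UTF-8', $octets);   # character string
my $again  = encode('UTF-8', $chars);    # back to octets

# Hypothetical pattern: matches the single character U+3042.
my $re = qr/\x{3042}/;

for my $case ( [ octets => $octets ], [ chars => $chars ], [ again => $again ] ) {
    my ($name, $str) = @$case;
    printf "%-6s flag=%s length=%d match=%s\n",
        $name,
        ( utf8::is_utf8($str) ? 'on' : 'off' ),
        length($str),
        ( $str =~ $re ? 'yes' : 'no' );
}
```

Only the decoded string is one character long, so only it can match a pattern written in terms of characters. The two octet strings are three bytes each and would need a pattern written in terms of those bytes.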
<ASIDE>I only use Encode::Guess as a last resort because, more often than not, it tends to fail utterly. That's understandable, though, because plenty of people gratuitously mix euc-jp and shift-jis in the same page, for example. Ugh.
Anyhow, guessing currently goes through about 20 heuristic steps. If all else fails, we try to decode using Encode::Guess just to see if we can, but we don't really rely on it.</ASIDE>
In reply to Re^2: Matching UTF8 Regexps
by lestrrat
in thread Matching UTF8 Regexps
by lestrrat