
(or possibly 8 Unicode if they were automatically converted for the comparison), but I get 5 non-Unicode and 3 Unicode?? What gives?

No. As far as perl is concerned, you start with 4 Unicode strings and get 8 Unicode strings... in different storage formats. The utf8 flag says pretty much nothing about "Unicodeness".
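A minimal sketch of the "different storage formats" point, using only the core utf8:: functions. The variable names are just for illustration:

```perl
use strict;
use warnings;

# Two copies of the same one-character string, U+00E9.
my $downgraded = "\x{e9}";
my $upgraded   = "\x{e9}";

utf8::downgrade($downgraded);   # internal storage: one byte
utf8::upgrade($upgraded);       # internal storage: UTF-8, flag on

my $flag_a = utf8::is_utf8($downgraded) ? 1 : 0;
my $flag_b = utf8::is_utf8($upgraded)   ? 1 : 0;

# The flags differ, but the strings are the same string.
print "utf8 flag: $flag_a vs $flag_b\n";
print "equal: ", ($downgraded eq $upgraded ? "yes" : "no"), "\n";
print "lengths: ", length($downgraded), " and ", length($upgraded), "\n";
```

Both variables hold the same single character. Only the internal representation differs, and eq and length don't care about it.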

Is it a problem that encode('utf-8', $_) returns something indistinguishable from a "Unicode string" (as people usually understand it)? Yes, it's a problem in practice.
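To make that concrete, here is a small sketch with the core Encode module. $mojibake is a hypothetical name for a text string that happens to contain the same two values as the encoded octets:

```perl
use strict;
use warnings;
use Encode qw(encode);

my $text  = "\x{e9}";                 # one character, U+00E9
my $bytes = encode('UTF-8', $text);   # two octets: 0xC3 0xA9

# A different *text* string: the two characters U+00C3 U+00A9.
my $mojibake = "\x{c3}\x{a9}";

# Perl cannot tell these apart; only the programmer knows which is which.
print "length of encoded result: ", length($bytes), "\n";
print "encoded bytes eq two-character text: ",
      ($bytes eq $mojibake ? "yes" : "no"), "\n";
```

Nothing in $bytes records that it is "binary". Whether it is octets or text is knowledge that lives only in your head (or your variable naming).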

Think about it this way: "1" in perl is struct PV, 1 is struct IV, "1" + 1 is PVIV (if I remember correctly). Now, what would happen if, say, the string concatenation operator was '+' (plus)? How would you determine what $x + $y actually does? What if cmp did the same thing as <=>, and ge was just like >=? How would you sort numbers?
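A quick illustration of why that works for numbers today: the operator, not the value, picks the interpretation. The sample values are arbitrary:

```perl
use strict;
use warnings;

my @values = (10, 9, 2);

# Same data, two interpretations, chosen by the operator.
my @as_numbers = sort { $a <=> $b } @values;
my @as_strings = sort { $a cmp $b } @values;

my $sum    = "1" + 1;   # numeric addition
my $concat = "1" . 1;   # string concatenation

print "numeric sort: @as_numbers\n";
print "string sort:  @as_strings\n";
print "sum: $sum, concat: $concat\n";
```

There is no equivalent pair of operators for "text" versus "octets", which is why you have to keep track yourself.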

That's the situation with "Unicode" and "binary" strings in Perl, pretty much. As Ricardo Signes said:

Right now, you can write programs in Perl that handle all this correctly, using only one tool: extreme vigilance. Or, more likely, two tools: vigilance and a debugger.

I personally use Devel::Peek instead of a debugger :)

Oh, and here's an example of a non-Unicode string:

"\x{FFFF_FFFF_FFFF}"

(Unicode doesn't have such a big "codepoint")
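A smaller sketch of the same idea, assuming a reasonably recent perl. It uses chr(0x110000), the first value past Unicode's maximum of U+10FFFF, so it doesn't depend on 64-bit integers. The character itself is deliberately not printed, since outputting it may trigger a warning:

```perl
use strict;
use warnings;

my $max_unicode = 0x10FFFF;             # highest Unicode code point
my $beyond      = chr($max_unicode + 1);

my $cp = ord($beyond);

# Perl happily stores it as a single "character".
printf "length: %d, code point: 0x%X, above Unicode: %s\n",
       length($beyond), $cp, ($cp > $max_unicode ? "yes" : "no");
```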

In reply to Re^3: Mixed Unicode and ANSI string comparisons? by Anonymous Monk
in thread Mixed Unicode and ANSI string comparisons? by BrowserUk
