I really like this idea, but there is one minor niggle that could have a big impact on string comparison:
Two strings can easily be recognized as unequal in the 8-bit, non-Unicode, non-locale-contaminated world if their lengths differ. If string comparison is changed to always return the first location of divergence, the early-out test for string equality will no longer work, which would be a shame for the common case of testing whether strings are not equal. I'm not sure how Perl tests Unicode strings for equality, or whether there are Unicode code points that are considered equal even though they have different binary encodings, and the same goes for locale-infested strings. If Perl does a simple binary comparison of strings, the same argument holds there. If the early-out option doesn't hold for Unicode strings, then Perl would only incur the hit for plain 8-bit strings, but I guess those are still the majority of Perl's string comparisons.
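To make the cost argument concrete, here is a minimal sketch for the plain 8-bit case only. `first_divergence` and `strings_differ` are hypothetical helpers, not Perl builtins, and the sketch assumes positions are reported 1-based so that 0 can mean "equal" and still be false. A boolean-only test can stop as soon as the lengths differ, while a position-returning test has to walk the common prefix first.

```perl
use strict;
use warnings;

# Hypothetical helper: 1-based position of the first differing character,
# or 0 when the strings are equal. Not a real Perl builtin.
sub first_divergence {
    my ($x, $y) = @_;
    my $min = length($x) < length($y) ? length($x) : length($y);
    for my $i (0 .. $min - 1) {
        return $i + 1 if substr($x, $i, 1) ne substr($y, $i, 1);
    }
    # Common prefix exhausted: the strings differ only if their lengths do.
    return length($x) == length($y) ? 0 : $min + 1;
}

# Hypothetical boolean-only inequality: 1 if different, 0 if equal.
sub strings_differ {
    my ($x, $y) = @_;
    return 1 if length($x) != length($y);    # early out, no scan needed
    return $x ne $y ? 1 : 0;
}
```

For `'a' x 1000` versus `'a' x 1001`, `strings_differ` returns right after the length check, while `first_divergence` has to compare 1000 characters before it can report position 1001.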
A simple solution would be available if the string equality operators knew how their result is going to be used: in boolean context or in some other context. But I'm not sure whether there is such a thing as "boolean context" in Perl 5, and whether it gets propagated down to the operators at all.
# The result of the inequality check is used in "boolean context":
if ($foo ne $bar) { print "Mismatch!" }

# The result of the inequality operator is used in "other context":
my $diff_start = ($foo ne $bar);

# The result of the inequality operator is also used in "other context":
if (my $diff_start = ($foo ne $bar)) { print "Mismatch after $diff_start" }
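One thing that can be checked from Perl code itself is what `wantarray` reports. In the minimal sketch below, the hypothetical `probe` sub records the context it was called in: a call used as an `if` condition shows up as plain scalar context, indistinguishable from an assignment to a scalar. This only shows what user-level subs can see; it says nothing about what the interpreter may do internally for built-in operators.

```perl
use strict;
use warnings;

my @seen;

# Hypothetical probe: records the calling context as reported by wantarray.
sub probe {
    push @seen, wantarray ? 'list' : defined(wantarray) ? 'scalar' : 'void';
    return 1;
}

if (probe()) { }     # used as a condition ("boolean")
my $x = probe();     # assigned to a scalar
probe();             # result discarded
my @y = probe();     # assigned to an array

print "@seen\n";     # prints "scalar scalar void list"
```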
Update: An idea floating around the chatterbox (CB), floated by castaway and demerphq, is to store a "lazy" comparison value, which would defer the string scan until the (numerical) value is actually used. I really like this idea, but I fear that it will conflict with special string variables like lvalues and $1, as these get overwritten at the most unfortunate times. Adding an additional reference "just in case", and thereby preventing string reuse, seems like a bad trade in most cases, as it means that a new SV needs to be allocated where an old SV could have been reused...
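The lazy-value idea can be approximated in user space with `overload`. This is an assumption-laden illustration, not how perl's `ne` works or would work: the hypothetical `LazyDivergence` class and `lazy_ne` function answer the boolean question eagerly (taking the cheap length check where possible) and defer the position scan until the value is used as a number or string. Note that it has to copy both operands to be safe against later overwrites of `$1` and friends, which is exactly the extra allocation the update worries about.

```perl
use strict;
use warnings;

package LazyDivergence;    # hypothetical class, not part of Perl

use overload
    'bool'   => sub { $_[0]{differ} },      # cheap answer, computed eagerly
    '0+'     => sub { $_[0]->position },    # expensive answer, computed lazily
    '""'     => sub { $_[0]->position },
    fallback => 1;

sub new {
    my ($class, $x, $y) = @_;
    # Copies of the operands are kept so that later changes to $1 and
    # friends cannot alter the result; this is the extra allocation cost.
    return bless {
        x      => $x,
        y      => $y,
        differ => (length($x) != length($y) || $x ne $y) ? 1 : 0,
    }, $class;
}

# 1-based position of the first differing character, 0 if equal.
# The scan happens at most once; the copies are released afterwards.
sub position {
    my ($self) = @_;
    return $self->{pos} if exists $self->{pos};
    my $pos = 0;
    if ($self->{differ}) {
        my ($x, $y) = @{$self}{qw(x y)};
        my $min = length($x) < length($y) ? length($x) : length($y);
        $pos = $min + 1;    # unless a mismatch is found inside the prefix
        for my $i (0 .. $min - 1) {
            if (substr($x, $i, 1) ne substr($y, $i, 1)) { $pos = $i + 1; last }
        }
    }
    delete @{$self}{qw(x y)};
    return $self->{pos} = $pos;
}

package main;

# Hypothetical stand-in for a lazily evaluated "ne".
sub lazy_ne { return LazyDivergence->new(@_) }
```

With this, `if (my $d = lazy_ne($foo, $bar)) { print "Mismatch after $d" }` only pays for the position scan inside the block, when `$d` is interpolated.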
In reply to Re: Should string inequality operators return the point the of divergance?
by Corion
in thread Should string inequality operators return the point the of divergance?
by demerphq