I was fiddling today with something that involved comparing strings and I thought, "Isn't it a shame that the inequality operators don't return the point of divergence?" I mean, I think it would be nice to be able to say:

if (my $pos=$str1 ne $str2) {
    print "A:$str1\nB:$str2\n";
    print "A and B are different at the ${pos}'th char (index ".($pos-1).")\n";
}

Likewise for the other inequality operators.

Also, castaway pointed out that if '0E0' is used to represent the divergence at index 0, then the overall return need not have the additional +1. OTOH, it would mean the code would look like:

if (my $pos=$str1 gt $str2) {
    $pos+=0;
    print "A:$str1\nB:$str2\n";
    print "The ".(1+$pos)."'th (index $pos) char in A is 'greater' than that in B\n";
}

I'm not sure which exceptional behaviour I like more (adding one to the offset or using '0E0' as the return for index 0). But either way would be more useful in many situations than the current True/False.
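
The two conventions can be compared side by side with ordinary subs. This is only a sketch of the proposed semantics for ne, not a change to the operator itself; the names first_diff_index, ne_plus_one and ne_zero_e_zero are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: index of the first differing character,
# or undef if the strings are equal.
sub first_diff_index {
    my ($s, $t) = @_;
    return undef if $s eq $t;
    my $i = 0;
    # substr returns "" at the end of the shorter string, which ends the loop
    $i++ while substr($s, $i, 1) eq substr($t, $i, 1);
    return $i;
}

# Convention 1: offset + 1, so a difference at index 0 is still true.
sub ne_plus_one {
    my $i = first_diff_index(@_);
    return defined $i ? $i + 1 : '';
}

# Convention 2: the index itself, with '0E0' ("zero but true") for index 0.
sub ne_zero_e_zero {
    my $i = first_diff_index(@_);
    return '' unless defined $i;
    return $i == 0 ? '0E0' : $i;
}

for my $pair (['perl', 'pearl'], ['apple', 'maple'], ['same', 'same']) {
    my ($x, $y) = @$pair;
    if (my $pos = ne_plus_one($x, $y)) {
        print "$x / $y: differ at char $pos (index ", $pos - 1, ")\n";
    }
    if (my $pos = ne_zero_e_zero($x, $y)) {
        $pos += 0;    # numify '0E0' to a plain 0
        print "$x / $y: differ at index $pos\n";
    }
}
```

Either way the caller has to remember one adjustment: subtract one in the first case, or numify in the second.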

I wonder if this were to be added would it slow Perl down much? I would have thought that most of the extra stuff could be optimised away when the return won't be used... What do you think?

Minor Update: Presumably this behaviour could also extend to the numerical comparison operators by having them return the lower or greater value. So while != wouldn't be covered, the > < >= <= operators could return the relevant value, so my $is_smaller=$x < 7; would set $is_smaller to $x (or 0E0 if it was 0) for all $x < 7, and it would set it to undef for all $x>=7. In which case castaway's "0E0 for a 0" idea obviously makes more sense than the offset+1, as it works for both the string inequality operators and the numerical ones. Although I have a feeling that something here would break perl as it currently works, while I don't think that's true of the string comparisons.
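
For the numeric case, the proposed return values can be mimicked with a plain sub. This is a sketch only: lt_value is a hypothetical stand-in for a value-returning <, and it follows the rules in the update above (the operand itself, '0E0' for 0, undef otherwise).

```perl
use strict;
use warnings;

# Hypothetical stand-in for a value-returning "<": returns the left operand
# when it is smaller ('0E0' if that operand is 0), undef otherwise.
sub lt_value {
    my ($x, $limit) = @_;
    return undef unless $x < $limit;
    return $x == 0 ? '0E0' : $x;
}

for my $x (-3, 0, 5, 7, 9) {
    my $is_smaller = lt_value($x, 7);
    if ($is_smaller) {
        print "$x < 7, result is '$is_smaller'\n";
    }
    else {
        print "$x >= 7, result is undef\n";
    }
}
```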


---
demerphq

    First they ignore you, then they laugh at you, then they fight you, then you win.
    -- Gandhi

    Flux8


Replies are listed 'Best First'.
Re: Should string inequality operators return the point the of divergance?
by Corion (Patriarch) on Oct 15, 2004 at 09:34 UTC

    I really like this idea, but there is one minor niggle that would mean a big impact on string comparison:

    Two strings can easily be recognized as "non-equal" in the 8-bit non-unicode, non-locale-contaminated world if their lengths are different. If string comparison is upgraded to always return the first location of divergence, the early-out test for string equality will not work anymore, which would be a shame for the common case of testing for strings being not equal. I'm not sure how Perl tests for equality of unicode strings, or whether there are unicode code points that are considered equal even if they have differing binary encodings, and the same goes for locale-infested strings. If Perl does a simple binary comparison of strings, then the same argument holds true there; if the early-out option doesn't hold for unicode strings, then perl would only incur the hit for plain 8-bit strings - but I guess that's still the majority of Perl's string comparisons.
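
    To make the cost concrete, here is a rough model in plain Perl of the two behaviours for byte strings. It is a sketch of the early-out described above, not of perl's actual internals; ne_bool and ne_pos are hypothetical names, and ne_pos uses the offset+1 convention from the root post.

```perl
use strict;
use warnings;

# Boolean test: a length mismatch settles the question without looking
# at a single character.
sub ne_bool {
    my ($s, $t) = @_;
    return 1 if length($s) != length($t);    # early out, no scan
    return $s ne $t;
}

# Position-returning test: always has to walk the common prefix.
# Returns offset + 1 of the first difference, or 0 if the strings are equal.
sub ne_pos {
    my ($s, $t) = @_;
    my $max = length($s) < length($t) ? length($s) : length($t);
    for my $i (0 .. $max - 1) {
        return $i + 1 if substr($s, $i, 1) ne substr($t, $i, 1);
    }
    return length($s) == length($t) ? 0 : $max + 1;
}

my $short = 'a' x 1000;
my $long  = 'a' x 1001;
print "ne_bool: ", ne_bool($short, $long), "\n";    # decided by length alone
print "ne_pos:  ", ne_pos($short, $long), "\n";     # scanned 1000 characters first
```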

    The simple solution would be available if the string equality operators knew about how their result would be used - in boolean or other context. But I'm not sure if there is such a thing as "boolean context" in Perl5 and if it gets propagated down to the operators at all.

    # The result of the equality check is used in "boolean context"
    if ($foo ne $bar) { print "Mismatch!"; };

    # The result of the equality operator is used in "other context":
    my $diff_start = ($foo ne $bar);

    # The result of the equality operator is also used in "other context":
    if (my $diff_start = ($foo ne $bar)) { print "Mismatch after $diff_start"; };

    Update: An idea floating around the CB (floated by castaway and demerphq) is to store a "lazy" comparison value which would defer the string scan to when the (numerical) value is actually used. I really like this idea, but I fear that it will conflict with the special string variables like lvalues and $1, as these get overwritten at the most unfortunate times - and adding an additional reference "just in case" and thus preventing string reuse seems like a bad trade in most cases, as this will mean that a new SV needs to be allocated when an old SV could have been reused...
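
    The lazy idea can be approximated in pure Perl with the overload pragma. The sketch below is only an illustration: LazyDiff is a hypothetical class, and it copies both strings into the object, which is exactly the kind of extra allocation worried about above.

```perl
use strict;
use warnings;

package LazyDiff;    # hypothetical class, for illustration only

use overload
    'bool'   => sub { $_[0]{left} ne $_[0]{right} },    # cheap test, no scan
    '0+'     => sub { $_[0]->position },                # numeric use triggers the scan
    '""'     => sub { $_[0]->position },
    fallback => 1;

sub new {
    my ($class, $left, $right) = @_;
    # Copies are kept so that later changes to the originals cannot interfere.
    return bless { left => $left, right => $right }, $class;
}

# Offset + 1 of the first difference (0 if equal), computed once on demand.
sub position {
    my ($self) = @_;
    unless (exists $self->{pos}) {
        my ($s, $t) = @{$self}{qw(left right)};
        my $i = 0;
        if ($s ne $t) {
            $i++ while substr($s, $i, 1) eq substr($t, $i, 1);
            $i++;    # offset + 1, so a difference at index 0 is still true
        }
        $self->{pos} = $i;
    }
    return $self->{pos};
}

package main;

my $diff = LazyDiff->new('perl', 'pearl');
print "Mismatch\n" if $diff;                   # boolean use: no scan yet
print "First difference at char $diff\n";      # stringification does the scan
```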

      If you'll check wantarray, you'll find that there is such a thing as boolean context in Perl 5. I don't know whether operators see it though, but there is no reason that they couldn't.

      UPDATE D'oh. I meant Want, not wantarray. I shouldn't post before I wake up. :-(

Re: Should string inequality operators return the point the of divergance?
by TedYoung (Deacon) on Oct 15, 2004 at 12:49 UTC

    Hi,

    While this is a neat idea, I think Perl 6 has a better return value of ineq ops. They will return something that is a boolean, but also retains the value from one of the operands (I think it is the right). This will allow you to do wonderful things like:

    3 <= $x <= 5

    While reading this, however, I wonder what would be the best way to locate the divergence point in a pair of strings. Here is what I came up with off the top of my head. I guess it would be more efficient to use substr to iterate over and compare each character, but I am too lazy for that.

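    #!perl -l
    # XOR the two strings: matching bytes become "\0", so the first and last
    # non-"\0" bytes of the result mark the points of divergence.
    sub firstdivergence {
        my $diff = $_[0] ^ $_[1];
        $diff =~ /[^\x00]/g;
        pos($diff) - 1;    # parenthesized: "pos $diff - 1" parses as pos($diff - 1)
    }
    sub lastdivergence {
        my $diff = $_[0] ^ $_[1];
        $diff =~ /[^\x00]\x00*$/;
        $-[0];             # start of the match, so a shared tail is not counted
    }
    print firstdivergence 'axcd', 'abced'; # prints 1
    print lastdivergence 'axcd', 'abced'; # prints 4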

    Ted Young

    ($$<<$$=>$$<=>$$<=$$>>$$) always returns 1. :-)
Re: Should string inequality operators return the point the of divergance?
by hv (Prior) on Oct 15, 2004 at 12:47 UTC

    I doubt perl will be changed to support such behaviour, since it is very rarely required and the information can quite easily be found by other means.

    Here's one way to do it:

    ($s1 ^ $s2) =~ /[^\0]/ and return $-[0];

    Hugo

      Except, of course, for

      $s1 = "\0";
      $s2 = $s1 x 2;
      my $diff_pos = ($s1 ^ $s2) =~ /[^\0]/ ? $-[0] : undef;
      print defined $diff_pos
          ? "diff_pos says the strings differ at $diff_pos\n"
          : "diff_pos says the strings are equal.\n";
      print $s1 eq $s2
          ? "and Perl says they are equal\n"
          : "and Perl says they are not equal\n";
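
      One way to cover that case is to take the lengths into account as well, since the XOR pads the shorter string with "\0" bytes. A sketch, assuming plain byte strings; xor_diff_pos is a hypothetical name, and min comes from the core List::Util module.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Index of the first difference between two byte strings, or undef if equal.
sub xor_diff_pos {
    my ($s1, $s2) = @_;
    my $pos = ($s1 ^ $s2) =~ /[^\0]/ ? $-[0] : undef;
    if (length($s1) != length($s2)) {
        # The shorter string ends here, which is a difference even when the
        # longer one continues with "\0" bytes that the XOR cannot see.
        my $end = min(length($s1), length($s2));
        $pos = $end if !defined $pos || $end < $pos;
    }
    return $pos;
}

my $s1 = "\0";
my $s2 = $s1 x 2;
my $diff_pos = xor_diff_pos($s1, $s2);
print defined $diff_pos
    ? "xor_diff_pos says the strings differ at $diff_pos\n"
    : "xor_diff_pos says the strings are equal.\n";
```
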
Re: Should string inequality operators return the point the of divergance?
by castaway (Parson) on Oct 15, 2004 at 09:19 UTC
    Hmm, it just occurred to me. Why not, instead of returning the position, set the pos() value instead? (The thingy that's used by \G /g to figure out where a regex got to in the input) .. Or is that too much of an 'unexpected side effect'?
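
    Since pos() is assignable, the side effect can be tried out with a sub today. This is a sketch only: ne_setting_pos is a hypothetical name, and it has to be called with variables, because it sets pos() on its first argument through the @_ alias.

```perl
use strict;
use warnings;

# Hypothetical: behaves like a boolean "ne", and as a side effect records
# the index of the first difference in pos() of the first argument.
sub ne_setting_pos {
    if ($_[0] eq $_[1]) {
        pos($_[0]) = undef;    # equal: clear any earlier position
        return '';
    }
    my $i = 0;
    $i++ while substr($_[0], $i, 1) eq substr($_[1], $i, 1);
    pos($_[0]) = $i;           # $_[0] is an alias to the caller's variable
    return 1;
}

my $str1 = 'perl monks';
my $str2 = 'perl mongers';
if (ne_setting_pos($str1, $str2)) {
    print "A and B differ at index ", pos($str1), "\n";
    # \G picks up at the recorded position
    print "A continues with '$1'\n" if $str1 =~ /\G(.*)/g;
}
```

    The return value stays a plain boolean, so existing code that uses the numeric value of a comparison would not be affected by this variant.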

    C.

Re: Should string inequality operators return the point the of divergance?
by ambrus (Abbot) on Oct 15, 2004 at 10:11 UTC

    Two comments. First, this would break some existing scripts which use the numeric value of a comparison (for example, someone might count the negative numbers in a list like this: sub num_negatives { my $n = 0; $n += $_ < 0 for @_; $n; }). Secondly, this might slow down perl, as the C library does not have a string comparison function that finds the offset of the first difference, so perl would have to implement it alone.
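
    The first point is easy to demonstrate. In the sketch below, lt_value is a hypothetical stand-in for a < that returns the smaller operand as suggested in the root post's update, and num_negatives_broken shows what the same counting idiom would compute with it.

```perl
use strict;
use warnings;

# Works today because a true comparison yields 1 and a false one yields
# an empty string that counts as 0.
sub num_negatives { my $n = 0; $n += $_ < 0 for @_; $n; }

# Hypothetical value-returning "<": the left operand ('0E0' for 0) or undef.
sub lt_value {
    my ($x, $limit) = @_;
    return $x < $limit ? ($x == 0 ? '0E0' : $x) : undef;
}

# The same idiom on top of the value-returning version adds up the
# negative numbers themselves instead of counting them.
sub num_negatives_broken { my $n = 0; $n += (lt_value($_, 0) // 0) for @_; $n; }

my @list = (-1, 5, -7, 2);
print "count with today's <:       ", num_negatives(@list), "\n";
print "sum with value-returning <: ", num_negatives_broken(@list), "\n";
```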