Re: How do i do a direct comparison of an array of strings?

If the strings/strands being compared are all the same length, bitwise-xoring the strings effectively compares them. The tr operator can then be used to flag and count positions of equality. IIRC, GrandFather had a note on this a few months back.

>perl -wMstrict -le
"my ($x, $y) = qw{abXcdYef pqXrsYtu};
 my $diff = $x ^ $y;
 $diff =~ tr{\x00-\xff}{=!};
 print $diff;
 print 'number of differences: ', $diff =~ tr{!}{!};
 my @i_diffs = do {
   my $i = 0;
   map   $_->[1],
   grep  $_->[0] eq '!',
   map   [ $_, $i++ ],
   split '',
   $diff;
   };
 print qq{differing positions: @i_diffs};
"
!!=!!=!!
number of differences: 6
differing positions: 0 1 3 4 6 7
[download]

Updates:

Found GrandFather's reply: Re: match and mismatch.
Changed example to actually count and report positions of differing characters per OPer's request.

Comment on Re: How do i do a direct comparison of an array of strings? Download Code

Replies are listed 'Best First'.
Re^2: How do i do a direct comparison of an array of strings? by AnomalousMonk (Archbishop) on Feb 19, 2009 at 20:27 UTC
Actually, you don't even have to bother with the `tr` stuff if you only want the differing positions: `>perl -wMstrict -le "my ($x, $y) = qw{abXcdYef pqXrsYtu}; my @i_diffs = differing_positions($x, $y); print 'number of differences: ', scalar @i_diffs; print qq{differing positions: @i_diffs}; sub differing_positions { my $i = 0; map $_->[1], grep $_->[0] ne qq{\x00}, map [ $_, $i++ ], split '', $_[0] ^ $_[1] ; } " number of differences: 6 differing positions: 0 1 3 4 6 7` [download]	[reply] [d/l] [select]
Re^3: How do i do a direct comparison of an array of strings? by gone2015 (Deacon) on Feb 20, 2009 at 01:13 UTC
Or even: `sub differing_positions { my @c = unpack('C*', $_[0] ^ $_[1]) ; return grep $c[$_], (0..$#c) ; }` [download]	[reply] [d/l]
Re^4: How do i do a direct comparison of an array of strings? by AnomalousMonk (Archbishop) on Feb 20, 2009 at 11:36 UTC
The `unpack` solution is simpler, IMHO, than the regex solution (see Re^2: How do i do a direct comparison of an array of strings? below), and is also faster except for strings in which there are relatively few differences. Benchmarks: Read more... (2 kB)	[reply] [d/l] [select]
Re^2: How do i do a direct comparison of an array of strings? by AnomalousMonk (Archbishop) on Feb 20, 2009 at 01:47 UTC
A regex solution is significantly faster than a ST one. Some benchmarks: use warnings; use strict; use Benchmark qw(cmpthese); my $multiplier = 10_000; my $s__ = "abcdefghij" x $multiplier; # base string my $s10 = "Abcdefghij" x $multiplier; # 10% differ my $s50 = "ABCDEfghij" x $multiplier; # 50% differ my $s90 = "ABCDEFGHIj" x $multiplier; # 90% differ sub differ_at_st { my $i = 0; return [ map $_->[1], grep $_->[0] ne qq{\x00}, map [ $_, $i++ ], split '', $_[0] ^ $_[1] ]; } sub differ_at_rx { my $diff = $_[0] ^ $_[1]; my @diffs; push @diffs, $-[1] while $diff =~ m{ ([^\000]) }xmsg; return \@diffs; } print "multiplier == $multiplier \n"; cmpthese(-10, { st_10 => sub { differ_at_st($s__, $s10) }, st_50 => sub { differ_at_st($s__, $s50) }, st_90 => sub { differ_at_st($s__, $s90) }, rx_10 => sub { differ_at_rx($s__, $s10) }, rx_50 => sub { differ_at_rx($s__, $s50) }, rx_90 => sub { differ_at_rx($s__, $s90) }, }); [download] Output: C:\@Work\Perl\monks\745091>perl 745091_1.pl multiplier == 1000 Rate st_90 st_50 st_10 rx_90 rx_50 rx_10 st_90 10.7/s -- -0% -12% -49% -73% -94% st_50 10.7/s 0% -- -11% -49% -73% -94% st_10 12.1/s 13% 13% -- -42% -70% -93% rx_90 20.9/s 95% 95% 73% -- -48% -88% rx_50 40.0/s 274% 273% 230% 91% -- -77% rx_10 177/s 1562% 1555% 1367% 750% 344% -- C:\@Work\Perl\monks\745091>perl 745091_1.pl multiplier == 10000 Rate st_90 st_50 st_10 rx_90 rx_50 rx_10 st_90 1.03/s -- -2% -11% -50% -73% -94% st_50 1.05/s 2% -- -9% -49% -72% -94% st_10 1.15/s 13% 10% -- -44% -69% -93% rx_90 2.05/s 100% 96% 78% -- -45% -88% rx_50 3.76/s 267% 260% 226% 83% -- -77% rx_10 16.6/s 1513% 1483% 1334% 707% 340% -- [download]	[reply] [d/l] [select]