How do i do a direct comparison of an array of strings?

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: How do i do a direct comparison of an array of strings? by AnomalousMonk (Archbishop) on Feb 19, 2009 at 19:38 UTC
If the strings/strands being compared are all the same length, bitwise-xoring the strings effectively compares them. The tr operator can then be used to flag and count positions of equality. IIRC, GrandFather had a note on this a few months back. `>perl -wMstrict -le "my ($x, $y) = qw{abXcdYef pqXrsYtu}; my $diff = $x ^ $y; $diff =~ tr{\x00-\xff}{=!}; print $diff; print 'number of differences: ', $diff =~ tr{!}{!}; my @i_diffs = do { my $i = 0; map $_->[1], grep $_->[0] eq '!', map [ $_, $i++ ], split '', $diff; }; print qq{differing positions: @i_diffs}; " !!=!!=!! number of differences: 6 differing positions: 0 1 3 4 6 7` [download] Updates: Found GrandFather's reply: Re: match and mismatch. Changed example to actually count and report positions of differing characters per OPer's request.	[reply] [d/l]
Re^2: How do i do a direct comparison of an array of strings? by AnomalousMonk (Archbishop) on Feb 19, 2009 at 20:27 UTC
Actually, you don't even have to bother with the `tr` stuff if you only want the differing positions: `>perl -wMstrict -le "my ($x, $y) = qw{abXcdYef pqXrsYtu}; my @i_diffs = differing_positions($x, $y); print 'number of differences: ', scalar @i_diffs; print qq{differing positions: @i_diffs}; sub differing_positions { my $i = 0; map $_->[1], grep $_->[0] ne qq{\x00}, map [ $_, $i++ ], split '', $_[0] ^ $_[1] ; } " number of differences: 6 differing positions: 0 1 3 4 6 7` [download]	[reply] [d/l] [select]
Re^3: How do i do a direct comparison of an array of strings? by gone2015 (Deacon) on Feb 20, 2009 at 01:13 UTC
Or even: `sub differing_positions { my @c = unpack('C*', $_[0] ^ $_[1]) ; return grep $c[$_], (0..$#c) ; }` [download]	[reply] [d/l]
Re^4: How do i do a direct comparison of an array of strings? by AnomalousMonk (Archbishop) on Feb 20, 2009 at 11:36 UTC
Re^2: How do i do a direct comparison of an array of strings? by AnomalousMonk (Archbishop) on Feb 20, 2009 at 01:47 UTC
A regex solution is significantly faster than a ST one. Some benchmarks: use warnings; use strict; use Benchmark qw(cmpthese); my $multiplier = 10_000; my $s__ = "abcdefghij" x $multiplier; # base string my $s10 = "Abcdefghij" x $multiplier; # 10% differ my $s50 = "ABCDEfghij" x $multiplier; # 50% differ my $s90 = "ABCDEFGHIj" x $multiplier; # 90% differ sub differ_at_st { my $i = 0; return [ map $_->[1], grep $_->[0] ne qq{\x00}, map [ $_, $i++ ], split '', $_[0] ^ $_[1] ]; } sub differ_at_rx { my $diff = $_[0] ^ $_[1]; my @diffs; push @diffs, $-[1] while $diff =~ m{ ([^\000]) }xmsg; return \@diffs; } print "multiplier == $multiplier \n"; cmpthese(-10, { st_10 => sub { differ_at_st($s__, $s10) }, st_50 => sub { differ_at_st($s__, $s50) }, st_90 => sub { differ_at_st($s__, $s90) }, rx_10 => sub { differ_at_rx($s__, $s10) }, rx_50 => sub { differ_at_rx($s__, $s50) }, rx_90 => sub { differ_at_rx($s__, $s90) }, }); [download] Output: C:\@Work\Perl\monks\745091>perl 745091_1.pl multiplier == 1000 Rate st_90 st_50 st_10 rx_90 rx_50 rx_10 st_90 10.7/s -- -0% -12% -49% -73% -94% st_50 10.7/s 0% -- -11% -49% -73% -94% st_10 12.1/s 13% 13% -- -42% -70% -93% rx_90 20.9/s 95% 95% 73% -- -48% -88% rx_50 40.0/s 274% 273% 230% 91% -- -77% rx_10 177/s 1562% 1555% 1367% 750% 344% -- C:\@Work\Perl\monks\745091>perl 745091_1.pl multiplier == 10000 Rate st_90 st_50 st_10 rx_90 rx_50 rx_10 st_90 1.03/s -- -2% -11% -50% -73% -94% st_50 1.05/s 2% -- -9% -49% -72% -94% st_10 1.15/s 13% 10% -- -44% -69% -93% rx_90 2.05/s 100% 96% 78% -- -45% -88% rx_50 3.76/s 267% 260% 226% 83% -- -77% rx_10 16.6/s 1513% 1483% 1334% 707% 340% -- [download]	[reply] [d/l] [select]
Re: How do i do a direct comparison of an array of strings? by kennethk (Abbot) on Feb 19, 2009 at 15:33 UTC
You could use nested For Loops wrapped around a comparison of two substrings. What have you tried? Do you have any code?	[reply]
Re: How do i do a direct comparison of an array of strings? by dwm042 (Priest) on Feb 19, 2009 at 16:52 UTC
Let me point out that the result of a single comparison is going to be a list of mismatches. If your array of DNA samples is of size N, then the matrix of comparisons ends up being N x N in size (with N*(N-1) unique sets of data, as you can ignore the self comparisons). How you handle storing that is really up to you. A function for handling a single comparison and returning the results as a reference to an array is: `sub dna_mismatches { my $dna_standard = shift; my $dna_to_compare = shift; my @mismatches = ( 0 ); # # assume same length strands # my $last_std = length($dna_standard); for ( 0 .. $last_std - 1 ) { if ( substr( $dna_standard, $_, 1 ) ne substr ( $dna_to_compare, $_, 1 )) { $mismatches[0]++; push @mismatches, $_ + 1; } } return \@mismatches; }` [download]	[reply] [d/l]