The 'classic' Perlish approach to this type of problem uses bitwise string operations. XORing the two original sequence strings yields a string $diff that holds a NUL byte (\x00) wherever the strings agree and a non-NUL byte wherever they differ. $diff can then be turned into a mask that extracts the differing substrings from the original strings.
use warnings;
use strict;

my $s1 = 'ACTGGACGTATGCA';
my $s2 = 'AGTG-ACGC-CGCA';

my $diff = $s1 ^ $s2;

my @dpos;
push @dpos, [ $-[1], $+[1] - $-[1] ] while $diff =~ m{ ([^\x00]+) }xmsg;
print qq{diff at offset $_->[0], length $_->[1] \n} for @dpos;

(my $mask = $diff) =~ tr{\x00}{\xff}c;
$s1 &= $mask;
$s2 &= $mask;

my $differences = qr{ [^\x00]+ }xms;

@dpos = ();
while ($s1 =~ m{ ($differences) }xmsg) {
    # this code produces same result
    # my @diff_data = ($1);
    # $s2 =~ m{ ($differences) }xmsg;
    # push @diff_data, $1, $-[1];
    # push @dpos, \@diff_data;
    push @dpos, [ $1, do { $s2 =~ m{ ($differences) }xmsg && $1, $-[1] } ] ;
}
print qq{@$_ \n} for @dpos;
Output:
diff at offset 1, length 1
diff at offset 4, length 1
diff at offset 8, length 3
C G 1
G - 4
TAT C-C 8
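A shorter variation on the code above, offered only as a sketch: once the offsets and lengths of the non-NUL runs in the XOR string are known, substr can pull the differing pieces straight from the untouched originals, so the masking step is not strictly needed. The sub name diff_spans and its return layout are my own inventions for illustration, and the sketch assumes both strings have the same length.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original post). Returns one
# [ offset, length, piece-of-$s1, piece-of-$s2 ] record per differing run.
# Assumes $s1 and $s2 are the same length.
sub diff_spans {
    my ($s1, $s2) = @_;

    # String XOR: "\x00" wherever the two strings agree.
    my $diff = $s1 ^ $s2;

    my @spans;
    while ($diff =~ m{ [^\x00]+ }xmsg) {
        # $-[0] and $+[0] are the start and end offsets of the whole match.
        my ($off, $len) = ($-[0], $+[0] - $-[0]);
        push @spans,
            [ $off, $len, substr($s1, $off, $len), substr($s2, $off, $len) ];
    }
    return @spans;
}

printf "offset %d, length %d: %s vs %s\n", @$_
    for diff_spans('ACTGGACGTATGCA', 'AGTG-ACGC-CGCA');
```

This leaves $s1 and $s2 unmodified, which may matter if the original sequences are needed afterwards. The masking version is still the one to use when the goal is a pair of strings with the matching positions blanked out.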
See @- and @+ in perlvar, also Bitwise Or and Exclusive Or and Bitwise And in perlop.
BrowserUk is very good on this general topic.
Update: Added better code example, doc links. And thanks to ELISHEVA.
Update: Fixed @- link above. What was I thinking?
In reply to Re: Fast Identification Of String Difference
by AnomalousMonk
in thread Fast Identification Of String Difference
by neversaint