Compare two strings of same length, character-by-character

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question: ⭐ (strings)

Replies are listed 'Best First'.
Re: Compare two strings of same length, character-by-character by jdporter (Paladin) on Nov 28, 2023 at 22:17 UTC
It's a little crude, but: `my $x = $str1 ^ $str2; $x =~ s/[^\0]+//g; print length($x);` [download]	[reply] [d/l]
Re^2: Compare two strings of same length, character-by-character (feature 'bitwise') by LanX (Saint) on Nov 28, 2023 at 22:38 UTC
See also Bitwise-String-Operators for more background And please note the effects of `use feature 'bitwise'` and especially `use v5.28` (I seemed to remember that `bitwise` deprecated this behavior, but it's rather adding an extra operator `^.` to avoid ambiguities.) Cheers Rolf _{(addicted to the Perl Programming Language :) see Wikisyntax for the Monastery}	[reply] [d/l] [select]
Re^2: Compare two strings of same length, character-by-character by cavac (Prior) on Nov 30, 2023 at 15:04 UTC
Beware that binary comparison of strings does not work on Unicode. Even length() doesn't work. Well, it depends what type of comparison you want. If you want to compare if the strings are exactly the same bit-by-bit, your stuff works. If you need to compare if the characters are functionally equivalent, that's a different matter. As discussed before, many Unicode characters can be encoded in more than one way, see Re^2: incorrect length of strings with diphthongs Further, even if you have one character that's exactly the same Unicode character in both strings, they still might be displayed completely different. Aside from the Umlaut stuff, you have other modifiers as well, like skin tone for Emoji, right-to-left stuff. And also there some scripts like Arab where the display of a character changes depending on the characters around it. And what's extra nice about about modifier characters is that it may or may not modify your debug prints: `my $modifier = chr(0x200F); print "The character $modifier is only in string 1\n";` [download] PerlMonks XP is useless? Not anymore: XPD - Do more with your PerlMonks XP	[reply] [d/l]
Re: Compare two strings of same length, character-by-character⭐ by tybalt89 (Monsignor) on Nov 28, 2023 at 23:22 UTC
`#!/usr/bin/perl use strict; # https://www.perlmonks.org/?node_id=11155923 use warnings; use feature 'bitwise'; my $str1='LFGSLSIIVAHHM'; my $str2='LFGSLSIIVSHHM'; print +($str1 ^. $str2) =~ tr/\0//, " characters are the same\n";` [download] Outputs: `12 characters are the same` [download]	[reply] [d/l] [select]
Re: Compare two strings of same length, character-by-character by kikuchiyo (Hermit) on Nov 29, 2023 at 10:40 UTC
If all you want in the end is a percentage, you may want to look up established string similarity algorithms. See for example the perl module String::Similarity. If you are willing to look outside perl, see e.g. https://github.com/adrg/strutil and references therein.	[reply]
Re: Compare two strings of same length, character-by-character by davido (Cardinal) on Nov 29, 2023 at 23:15 UTC
Another option, that is mostly straightforward... `#!/usr/bin/env perl use strict; use warnings; use List::Util qw(zip); my $str1='LFGSLSIIVAHHM'; my $str2='LFGSLSIIVSHHM'; my ($s1len, $s2len) = map {length} ($str1, $str2); die "Length mismatch: $str1 ($s1len) != $str2 ($s2len).\n" if $s1len != $s2len; my $match = grep {$_->[0] eq $_->[1]} zip [split //, $str1], [split //, $str2]; my $mismatch = $s1len-$match; print "Matches: $match\nMismatches: $mismatch\nPercent match: ", sprintf("%.01f", ($match / $s1len) * 100), "\n";` [download] The output is: `Matches: 12 Mismatches: 1 Percent match: 92.3` [download] This uses List::Util, which is a core Perl module, and aside from the slightly mysterious 'zip', reads like how a person would think through the problem; compare the first letter, compare the second, compare the third, and so on. No real magic that would require a #comment Dave	[reply] [d/l] [select]
Re: Compare two strings of same length, character-by-character by Polyglot (Chaplain) on Nov 29, 2023 at 01:59 UTC
I have done something similar where I have marked the differences in two strings, adding HTML markup to make the differences obvious. You might remove the HTML or do whatever suits your purposes: use Algorithm::Diff qw(traverse_sequences); #--------------------------------------------------------------------- +------------- # C O M P A R A T O R #--------------------------------------------------------------------- +------------- sub comparator { my $str1 = shift @_; my $str2 = shift @_; my $original = ''; my $revised = ''; my @from = split(/([ ,.:;"?!-])/, $str1); my @to = split(/([ ,.:;"?!-])/, $str2); my $OS = q\|<span class="m">\|; my $OE = q\|</span>\|; my $RS = q\|<span class="m">\|; my $RE = q\|</span>\|; traverse_sequences( \@from, \@to, { MATCH => sub { my $oldtext = $from[shift()]; if ($oldtext =~ /<P>/) { $original .= "</p>\n<p>"; $revised .= "</p>\n<p>" } else { $original .= "$oldtext"; $revised .= "$oldtext" } }, DISCARD_A => sub { my $oldtext = $from[shift()]; if ($oldtext =~ /<P>/) { $original .= "</p>\n<p>" } elsif ($original =~ m!</span>\s$!) { if ($oldtext =~ m/(?:[,.:;"?!-])/) { $original =~ s/(?:<\/span>)\s+$/$oldtext$OE/; + } else { $original =~ s/<\/span>\s$/ $oldtext$OE/; } } else { $original .= $OS.$oldtext.$OE } }, DISCARD_B => sub { my $newtext = $to[pop()]; if ($newtext =~ /<P>/) { $revised .= "</p>\n<p>" } elsif ($revised =~ m!</span>\s$!) { if ($newtext =~ m/(?:[,.:;"?!-])/) { $revised =~ s/(?:<\/span>)\s+$/$newtext$RE/; } else { $revised =~ s/<\/span>\s$/ $newtext$RE/; } } else { $revised .= $RS.$newtext.$RE } }, } ); $original =~ s~^(</p>)(.+)$~$2$1~; $revised =~ s~^(</p>)(.+)$~$2$1~; return ($original, $revised); } #END SUB comparator [download] Blessings, ~Polyglot~	[reply] [d/l]
Re^2: Compare two strings of same length, character-by-character by tybalt89 (Monsignor) on Nov 29, 2023 at 21:25 UTC
If all that is needed is a count of how many characters are the same, this can be reduced to: `#!/usr/bin/perl use strict; # https://www.perlmonks.org/?node_id=11155923 use warnings; my $str1='LFGSLSIIVAHHM'; my $str2='LFGSLSIIVSHHM'; use Algorithm::Diff qw(traverse_sequences); my $same = 0; traverse_sequences( [ split //, $str1 ], [ split //, $str2 ], { MATCH => sub { $same++ }, } ); print "$same characters are the same\n";` [download] which outputs: `12 characters are the same` [download]	[reply] [d/l] [select]
Re^2: Compare two strings of same length, character-by-character by Polyglot (Chaplain) on Nov 29, 2023 at 10:06 UTC
I should have added that this is comparing by words based on the split rules given for @from and @to. It could be divided in other ways, I presume, such as by character. As it stands now, without word spaces, the script would highlight the entire string as different if even one character differed in it. Blessings, ~Polyglot~	[reply]
Re^3: Compare two strings of same length, character-by-character by karlgoethebier (Abbot) on Nov 29, 2023 at 11:35 UTC
Why don't you actually outsource your callbacks - as described in the Algorithm::Diff manual? The readability of your code would be significantly improved: `traverse_sequences( \@seq1, \@seq2, { MATCH => $callback_1, DISCARD_A => $callback_2, DISCARD_B => $callback_3, }, undef, # default key-gen $myArgument1, $myArgument2, $myArgument3, ); $callback_1 = sub {�}; $callback_2 = sub {�}; $callback_3 = sub {�};` [download] Instead of: `my $str1 = shift @_; my $str2 = shift @_;` [download] �write: `my ($str1,$str2) = @_;` The HTML thing is completely incomprehensible to me. Leave it out. �The Crux of the Biscuit is the Apostrophe�	[reply] [d/l] [select]