Hello TravelAddict,
I’ve been thinking on and off about this problem, and it occurred to me that you can leverage a module like String::Diff or Algorithm::Diff. The trick is to map each distinct word in your sentences onto a single-character token, and then apply one of the standard string diff methods to the resulting token strings. Here is a proof of concept:
#! perl
use strict;
use warnings;
use Data::Dump;
use String::Diff 'diff';
my @pairs =
(
    [
        'This apple is colored red.',
        'This apple is coloured red.',
    ],
    [
        'I need to buy a round trip ticket.',
        'I need to buy a return ticket.',
    ],
    [
        'Jack rode the elevator to the top floor.',
        'Jack took the lift to the top floor.',
    ],
);

my %diffs;

for my $pair (@pairs)
{
    my (@sent, %ids, @seq);
    my $id = 33;                                  # First token is chr(33), i.e. '!'

    for my $i (0, 1)
    {
        $sent[$i] = $pair->[$i] =~ s/'s?\b//gr;   # Remove possessives
        $sent[$i] =~ s/[-[\].,;:'"(){}<>]//g;     # Remove all other punctuation

        for (split /\s+/, $sent[$i])
        {
            $ids{$_} //= $id++;                   # One id per distinct word
            $seq[$i] .= chr($ids{$_});            # One character per word
        }
    }

    my %lookup = reverse %ids;                    # Map ids back to words
    my ($old, $new) = diff($seq[0], $seq[1]);
    my @old = $old =~ /\[(.+?)\]/g;               # Removed runs are marked [...]
    my @new = $new =~ /\{(.+?)\}/g;               # Added runs are marked {...}

    # Pair each removed run with the corresponding added run
    while (@old && @new)
    {
        my $o = join(' ', map { $lookup{ord $_} } split(//, shift @old));
        my $n = join(' ', map { $lookup{ord $_} } split(//, shift @new));
        $diffs{$o} = $n;
    }
}

dd \%diffs;
Output:
1:42 >perl 1269_SoPW.pl
{
"colored" => "coloured",
"elevator" => "lift",
"rode" => "took",
"round trip" => "return",
}
1:42 >
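To make the word-to-token mapping easier to see in isolation, here is a minimal sketch that uses only core Perl and stops before the diff step. The helper name encode_sentences is hypothetical (it is not part of the script above), and the input sentences are assumed to have had their punctuation stripped already:

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only: encode each sentence as a
# string holding one character per word, using a single shared id table
# so that the same word gets the same character in every sentence.
sub encode_sentences
{
    my @sentences = @_;
    my (%ids, @seqs);
    my $id = 33;                        # chr(33) is '!'

    for my $sentence (@sentences)
    {
        my $seq = '';
        for my $word (split ' ', $sentence)
        {
            $ids{$word} //= $id++;      # Assign an id on first sight only
            $seq .= chr $ids{$word};
        }
        push @seqs, $seq;
    }

    return (\%ids, @seqs);
}

my ($ids, @seqs) = encode_sentences(
    'I need to buy a round trip ticket',
    'I need to buy a return ticket',
);

print "$_\n" for @seqs;
# Prints:
# !"#$%&'(
# !"#$%)(
```

The two token strings differ only where "round trip" (&') was replaced by "return" ( ) ), which is exactly what the character-level diff then reports. One caveat, as far as I can tell: because the ids start at 33 and the script parses the diff output by looking for [...] and {...}, a sentence pair with more than 58 distinct words would start producing '[' (chr 91) and later ']', '{' and '}' as tokens, so for longer texts you would probably want to start the ids at a higher code point.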
Hope that helps,