Hello TravelAddict,
I’ve been thinking on and off about this problem, and it occurred to me that you can leverage a module like String::Diff or Algorithm::Diff if you map the words in your sentences onto character tokens and then apply one of the standard string diff methods to the resulting token strings. Here is a proof of concept:
    #! perl
    use strict;
    use warnings;
    use Data::Dump;
    use String::Diff 'diff';

    my @pairs =
    (
        [
            'This apple is colored red.',
            'This apple is coloured red.',
        ],
        [
            'I need to buy a round trip ticket.',
            'I need to buy a return ticket.',
        ],
        [
            'Jack rode the elevator to the top floor.',
            'Jack took the lift to the top floor.',
        ],
    );

    my %diffs;

    for my $pair (@pairs)
    {
        my (@sent, %ids, @seq);
        my  $id = 33;
        my ($seq0, $seq1);

        for my $i (0, 1)
        {
            $sent[$i] = $pair->[$i] =~ s/'s?\b//gr;     # Remove possessives
            $sent[$i] =~ s/[-[\].,;:'"(){}<>]//g;       # Remove all other punctuation

            # Map each distinct word onto a single character token
            for (split /\s+/, $sent[$i])
            {
                $ids{$_} //= $id++;
                $seq[$i]  .= chr($ids{$_});
            }
        }

        my %lookup      = reverse %ids;
        my ($old, $new) = diff($seq[0], $seq[1]);
        my @old         = $old =~ /\[(.+?)\]/g;
        my @new         = $new =~ /\{(.+?)\}/g;

        # Translate each run of differing tokens back into words
        while (@old && @new)
        {
            my $o = join(' ', map { $lookup{ord $_} } split(//, shift @old));
            my $n = join(' ', map { $lookup{ord $_} } split(//, shift @new));
            $diffs{$o} = $n;
        }
    }

    dd \%diffs;
Output:
    1:42 >perl 1269_SoPW.pl
    {
      "colored"    => "coloured",
      "elevator"   => "lift",
      "rode"       => "took",
      "round trip" => "return",
    }

    1:42 >
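For comparison, here is a rough sketch of the same idea using Algorithm::Diff, which works on lists and so lets you skip the word-to-character mapping. This is not part of the proof of concept above: the helper names words() and word_diffs() are made up for illustration, and I am assuming sdiff's documented return format of [op, old_item, new_item] hunks, where op is 'u' for unchanged, 'c' for changed, '-' for removed and '+' for added.

```perl
#! perl
use strict;
use warnings;
use Algorithm::Diff 'sdiff';

# Hypothetical helper: strip punctuation as in the script above,
# then split the sentence into a list of words.
sub words
{
    my ($sentence) = @_;
    (my $s = $sentence) =~ s/'s?\b//g;      # Remove possessives
    $s =~ s/[-[\].,;:'"(){}<>]//g;          # Remove all other punctuation
    return grep { length } split /\s+/, $s;
}

# Hypothetical helper: return a hash ref mapping each run of differing
# words in $old to the corresponding run of words in $new.
sub word_diffs
{
    my ($old, $new) = @_;
    my @a = words($old);
    my @b = words($new);
    my (%diffs, @o, @n);

    # The trailing dummy 'u' hunk is a sentinel that flushes the final run.
    for my $hunk (sdiff(\@a, \@b), [ 'u', '', '' ])
    {
        my ($op, $x, $y) = @$hunk;

        if ($op eq 'u')                     # Unchanged word: close the current run
        {
            $diffs{"@o"} = "@n" if @o && @n;
            @o = @n = ();
            next;
        }

        push @o, $x if $op ne '+';          # 'c' or '-': word present in old
        push @n, $y if $op ne '-';          # 'c' or '+': word present in new
    }

    return \%diffs;
}
```

Call it as word_diffs('I need to buy a round trip ticket.', 'I need to buy a return ticket.') and dump the result with dd. Like the script above, it only records a run when both sides are non-empty, so pure insertions and deletions are skipped.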
Hope that helps,
Athanasius <°(((>< contra mundum
Iustus alius egestas vitae, eros Piratica,
In reply to Re: Comparing strings to extract the different words/expressions
by Athanasius
in thread Comparing strings to extract the different words/expressions
by TravelAddict