Hi,
I need to compare a long list of US English sentences with the same list of sentences in UK English. The goal is to build a dictionary of the words and expressions that differ, for example "color" vs. "colour".
The difficulty is that there is not always a one-to-one correspondence: an expression in one variant may have a different number of words in the other, for example "round trip (ticket)" in US English vs. "return (ticket)" in UK English.
I feel that I'm reinventing the wheel, especially since I've seen some diff modules available. However, they compare and flag whole lines rather than just the parts of a line that differ.
Is there a module or an easy way to compare the strings and extract only what's different?
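To make the question concrete, here is a rough sketch of the kind of result I'm after. It assumes Python and its standard difflib module, and compares the two sentences word by word instead of line by line. The function name word_diffs and the naive whitespace tokenization are only for illustration; punctuation and capitalization would need extra handling.

```python
from difflib import SequenceMatcher


def word_diffs(us_sentence, uk_sentence):
    """Return (us_fragment, uk_fragment) pairs where the sentences differ.

    Hypothetical helper: splits on whitespace only, so punctuation stays
    attached to the neighbouring word.
    """
    us_words = us_sentence.split()
    uk_words = uk_sentence.split()

    # Compare the two word lists rather than the raw character strings.
    matcher = SequenceMatcher(None, us_words, uk_words, autojunk=False)

    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # 'equal' runs are identical in both sentences; keep everything else
        # ('replace', 'delete', 'insert'). One side may be an empty string.
        if tag != "equal":
            pairs.append((" ".join(us_words[i1:i2]), " ".join(uk_words[j1:j2])))
    return pairs


print(word_diffs("What color is it", "What colour is it"))
# [('color', 'colour')]
print(word_diffs("I bought a round trip ticket", "I bought a return ticket"))
# [('round trip', 'return')]
```

Because the comparison works on lists of words, a differing run can be a different length on each side, which is what the "round trip" vs. "return" case needs.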
Thanks!