Interesting problem; I had fun playing with this one. Thanks for posting it. I'll say up front that my results do not match the specified results 100%: my code marks the words "the", "best", and "perl" as moved, since their absolute positions in the two strings are different. If you could elaborate on why those words should not be marked as moved, I can adjust the algorithm when I get home this evening.
use strict;
use warnings;
use Data::Dumper;

my $str1 = 'Perlmonks is the best perl community';
my $str2 = 'Perlmonks is one of the best community of perl users';

if ($str1 eq $str2) {
    print $str1;
    exit;
}

my @wl1 = split /\s+/, $str1;
my @wl2 = split /\s+/, $str2;

# word => list of positions at which that word occurs
my $wp1 = build_word_hash(\@wl1);
my $wp2 = build_word_hash(\@wl2);

my $diff_str1 = '';
my $diff_str2 = '';

while (@wl1 || @wl2) {
    my $word1 = shift @wl1;
    my $word2 = shift @wl2;

    # use defined() so that a word such as "0" is not treated as missing
    if (defined $word1 && defined $word2 && $word1 eq $word2) {
        $diff_str1 .= $word1 . ' ';  # pairing the word from the original string with its output
        $diff_str2 .= $word2 . ' ';  # lets us do things like case-insensitive but case-preserving matching later
        shift @{$wp1->{$word1}};     # eat this word
        shift @{$wp2->{$word2}};     # eat this word
        next;
    }

    # process word1 first, for fun
    if (defined $word1) {
        if ($wp2->{$word1} && @{$wp2->{$word1}}
            && ! grep { $_ == $wp2->{$word1}->[0] } @{$wp1->{$word1}}) {
            # word moved.
            # the grep checks that the next occurrence of the word in string 2
            # ($wp2->{$word1}->[0]) does not also have an occurrence of the word
            # in string 1. if it does not, it means that this is a move of the word.
            $diff_str1 .= "[$word1] ";
            shift @{$wp2->{$word1}}; # eat this word
        } else {
            # Easy case, word in string 1 but not string 2
            $diff_str1 .= "<$word1> ";
        }
    }

    # same check in the other direction for word2
    if (defined $word2) {
        if ($wp1->{$word2} && @{$wp1->{$word2}}
            && ! grep { $_ == $wp1->{$word2}->[0] } @{$wp2->{$word2}}) {
            $diff_str2 .= "[$word2] ";
            shift @{$wp1->{$word2}}; # eat this word
        } else {
            $diff_str2 .= "<$word2> ";
        }
    }
}

print "$diff_str1\n$diff_str2\n";

# Build a hash mapping each word to the list of its positions in the word list.
sub build_word_hash {
    my $wl  = shift;
    my $res = {};
    my $i   = 0;
    foreach my $word (@$wl) {
        push @{$res->{$word}}, $i++;
    }
    return $res;
}
Results:

Perlmonks is [the] [best] [perl] [community]
Perlmonks is <one> <of> [the] [best] [community] <of> [perl] <users>
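To make the "absolute position" point concrete, here is a small stand-alone sketch of the position index the script relies on. The helper name word_positions is made up for this illustration; it is only assumed to behave like build_word_hash above, mapping each word to the list of indexes where it occurs.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration only: maps each word to the list of
# positions at which it occurs in the word list.
sub word_positions {
    my ($words) = @_;
    my %pos;
    my $i = 0;
    push @{ $pos{$_} }, $i++ for @$words;
    return \%pos;
}

my @words = split /\s+/, 'Perlmonks is one of the best community of perl users';
my $pos   = word_positions(\@words);

# Print each word followed by its positions, e.g. a repeated word gets
# more than one index.
for my $word (sort keys %$pos) {
    printf "%-10s %s\n", $word, join(',', @{ $pos->{$word} });
}
```

In the second string "of" ends up at positions 3 and 7 and "the" at position 4, whereas "the" sits at position 2 in the first string. That index mismatch is what makes the script flag "the" as moved.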
Sorry for the sloppy code; I was writing this on my lunch break.

In reply to Re: diff of two strings by chaos_cat
in thread diff of two strings by flaviusm
