I appreciate all of the answers. I was able to modify this one in particular, and it now yields most of what I want. I still run a few substitutions on the subroutine's output to clear up some text formatting issues, but the following subroutine does the bulk of what needed to be done.
use Algorithm::Diff qw(traverse_sequences);

sub comparator {
    my ($str1, $str2) = @_;
    my $original = '';
    my $revised  = '';

    # Tokenize into runs of HTML tags, runs of whitespace, words,
    # single non-word characters, and digits. The capturing group
    # makes split() return the matched tokens themselves.
    my $token_re = qr/((?:<[^>]+>)+|(?:\s)+|(?:\w[A-Za-z'-]*\w*)+|(?:\W|\P{IsWord})|(?:\p{IsDigit}))/;
    my @from = split($token_re, $str1);
    my @to   = split($token_re, $str2);

    # Markup for removed (original) and added (revised) tokens
    my $OS = qq|<span class="m">|;
    my $OE = qq|</span> |;
    my $RS = qq|<span class="hl">|;
    my $RE = qq|</span> |;

    # Each callback receives ($index_in_from, $index_in_to)
    traverse_sequences(
        \@from, \@to,
        {
            MATCH => sub {
                my $oldtext = $from[shift()];
                $original .= $oldtext;
                $revised  .= $oldtext;
            },
            DISCARD_A => sub {
                my $oldtext = $from[shift()];
                # Leave punctuation and whitespace unmarked
                if ($oldtext =~ m/(?:\p{IsPunct})|(?:\s)/) { $original .= $oldtext }
                else { $original .= $OS . $oldtext . $OE }
            },
            DISCARD_B => sub {
                my $newtext = $to[pop()];    # pop() takes the last argument, the index into @to
                if ($newtext =~ m/(?:\p{IsPunct})|(?:\s)/) { $revised .= $newtext }
                else { $revised .= $RS . $newtext . $RE }
            },
        }
    );

    return ($original, $revised);
} #END SUB comparator
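To see what the split in the subroutine hands to traverse_sequences, here is a minimal sketch that runs the same pattern over a made-up sample string. The helper name tokenize and the sample text are only for illustration and are not part of the subroutine above. Because the pattern sits inside capturing parentheses, split returns the matched pieces as well, along with empty strings between adjacent matches, which this sketch filters out (the subroutine itself leaves them in).

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original post): applies the same
# tokenizing pattern as comparator and drops the empty fields that
# split() produces between adjacent matches.
sub tokenize {
    my ($text) = @_;
    my $token_re = qr/((?:<[^>]+>)+|(?:\s)+|(?:\w[A-Za-z'-]*\w*)+|(?:\W|\P{IsWord})|(?:\p{IsDigit}))/;
    return grep { length } split($token_re, $text);
}

# Sample input is an assumption chosen to show tags, words,
# whitespace, and punctuation each becoming a separate token.
my @tokens = tokenize('The <b>quick</b> fox.');
print join('|', @tokens), "\n";    # prints: The| |<b>|quick|</b>| |fox|.
```

Keeping tags and whitespace as their own tokens is what lets the diff callbacks pass them through untouched while only wrapping changed words in span markup.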
I have never found the output of a standard diff to be very enlightening. I'm sure it works well for changing files, patch-style, but it isn't very readable for someone who simply wants to see what happened to the text in a side-by-side format. This procedure makes visual inspection much easier, with the help of some HTML markup.
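For anyone wanting to try the side-by-side view, the two strings returned by comparator could be dropped into a two-column table along these lines. This is only a sketch: the helper name side_by_side_html, the colors, and the table layout are my assumptions, and only the class names "m" and "hl" come from the subroutine. It also assumes the strings are already HTML, so nothing is escaped.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original post): wraps the two
# marked-up strings in a two-column HTML table. The styles for the
# "m" and "hl" classes are assumed; adjust to taste.
sub side_by_side_html {
    my ($original, $revised) = @_;
    return <<"HTML";
<style>
  span.m  { background: #fdd; text-decoration: line-through; }
  span.hl { background: #dfd; }
  td      { vertical-align: top; width: 50%; }
</style>
<table>
  <tr><th>Original</th><th>Revised</th></tr>
  <tr><td>$original</td><td>$revised</td></tr>
</table>
HTML
}

# Stand-in strings shaped like comparator's return values
my $original = 'The <span class="m">quick</span> fox.';
my $revised  = 'The <span class="hl">slow</span> fox.';
print side_by_side_html($original, $revised);
```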
Thank you!
Blessings,
~Polyglot~
In reply to Re^2: Comparing two text files and marking differences
by Polyglot
in thread Comparing two text files and marking differences
by Polyglot