in reply to Re: Comparing two text files and marking differences
in thread Comparing two text files and marking differences

I appreciate all of the answers. I was able to modify this one in particular, and the result seems to yield most of what I want. I'm still doing a few post-subroutine substitutions to clean up some text formatting issues, but the following subroutine does the bulk of the work.

# Assumes the caller has loaded: use Algorithm::Diff qw(traverse_sequences);
sub comparator {
    my $str1 = shift @_;
    my $str2 = shift @_;
    my $original = '';
    my $revised = '';
    # The capturing group makes split return the matched tokens themselves
    my @from = split(/((?:<[^>]+>)+|(?:\s)+|(?:\w[A-Za-z'-]*\w*)+|(?:\W|\P{IsWord})|(?:\p{IsDigit}))/, $str1);
    my @to = split(/((?:<[^>]+>)+|(?:\s)+|(?:\w[A-Za-z'-]*\w*)+|(?:\W|\P{IsWord})|(?:\p{IsDigit}))/, $str2);
    my $OS = qq|<span class="m">|;
    my $OE = qq|</span> |;
    my $RS = qq|<span class="hl">|;
    my $RE = qq|</span> |;
    traverse_sequences(
        \@from, \@to,
        {
            # Token present in both versions: copy it to both outputs
            MATCH => sub {
                my $oldtext = $from[shift()];
                $original .= $oldtext;
                $revised .= $oldtext
            },
            # Token only in the original: mark it unless it is punctuation or whitespace
            DISCARD_A => sub {
                my $oldtext = $from[shift()];
                if ($oldtext =~ m/(?:\p{IsPunct})|(?:\s)/) { $original .= $oldtext }
                else { $original .= $OS.$oldtext.$OE }
            },
            # Token only in the revision: the second (last) argument is its index
            DISCARD_B => sub {
                my $newtext = $to[pop()];
                if ($newtext =~ m/(?:\p{IsPunct})|(?:\s)/) { $revised .= $newtext }
                else { $revised .= $RS.$newtext.$RE }
            },
        }
    );
    return ($original, $revised);
} #END SUB comparator
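To see what the split is feeding into the comparison, here is a small sketch that runs the same pattern on its own. The helper name tokenize and the sample string are only illustrations and are not part of my actual script; the empty fields are filtered out here just for display, whereas the subroutine above keeps them.

```perl
use strict;
use warnings;

# Same pattern as in comparator(), compiled once so both inputs can share it.
my $TOKEN = qr/((?:<[^>]+>)+|(?:\s)+|(?:\w[A-Za-z'-]*\w*)+|(?:\W|\P{IsWord})|(?:\p{IsDigit}))/;

# Hypothetical helper (not in the original script): return the non-empty pieces.
# Because of the capturing group, split returns the matched "separators" too.
# Every character matches one of the alternatives, so the fields between the
# matches are always empty and the captured matches are the real tokens.
sub tokenize {
    my ($text) = @_;
    return grep { length } split($TOKEN, $text);
}

# Print the tokens separated by '|' to make the boundaries visible.
print join('|', tokenize(q{Nada <b>na-da</b> nada.})), "\n";
```

HTML tags, runs of whitespace, and words (including hyphenated ones) each come out as a single token, while punctuation marks come out one character at a time.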

I have never found the output of a standard diff to be very enlightening. I'm sure it works well to change files, patch-style, but it isn't very readable for someone simply wanting to see what happened to the text in a side-by-side format. This procedure is making a visual inspection much easier, with the help of some HTML markup.
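For anyone who wants to try the idea without installing anything, the following is a minimal sketch of the same two-output marking. It is not my subroutine: it swaps Algorithm::Diff for a plain longest-common-subsequence table so that it runs on core Perl only, it splits on whitespace alone, and the names words, wrap, and mark_changes are made up for the illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: split into words, keeping whitespace runs as tokens.
sub words { my ($text) = @_; return grep { length } split(/(\s+)/, $text) }

# Hypothetical helper: wrap a token in a span, leaving pure whitespace alone.
sub wrap {
    my ($class, $tok) = @_;
    return $tok =~ /^\s+$/ ? $tok : qq|<span class="$class">$tok</span>|;
}

# Returns ($original, $revised) with the differing words marked up.
sub mark_changes {
    my ($old, $new) = @_;
    my @a = words($old);
    my @b = words($new);

    # $lcs[$i][$j] = length of the longest common subsequence of
    # @a[$i..$#a] and @b[$j..$#b]; filled from the bottom right.
    my @lcs;
    $lcs[$_] = [ (0) x (@b + 1) ] for 0 .. scalar(@a);
    for my $i (reverse 0 .. $#a) {
        for my $j (reverse 0 .. $#b) {
            if ($a[$i] eq $b[$j]) {
                $lcs[$i][$j] = $lcs[$i + 1][$j + 1] + 1;
            }
            else {
                my ($down, $right) = ($lcs[$i + 1][$j], $lcs[$i][$j + 1]);
                $lcs[$i][$j] = $down >= $right ? $down : $right;
            }
        }
    }

    # Walk both token lists; the three branches play the roles of the
    # MATCH, DISCARD_A and DISCARD_B callbacks.
    my ($original, $revised) = ('', '');
    my ($i, $j) = (0, 0);
    while ($i < @a || $j < @b) {
        if ($i < @a && $j < @b && $a[$i] eq $b[$j]) {
            $original .= $a[$i];
            $revised  .= $b[$j];
            $i++;
            $j++;
        }
        elsif ($i < @a && ($j == @b || $lcs[$i + 1][$j] >= $lcs[$i][$j + 1])) {
            $original .= wrap('m', $a[$i++]);    # only in the original
        }
        else {
            $revised .= wrap('hl', $b[$j++]);    # only in the revision
        }
    }
    return ($original, $revised);
}

my ($original, $revised) = mark_changes('Bla bla. Foo', 'Bla bar. Foo');
print "$original\n$revised\n";
```

The two returned strings can then be placed in adjacent table cells or columns, with the m and hl classes styled however you like.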

Thank you!

Blessings,

~Polyglot~

Re^3: Comparing two text files and marking differences
by afoken (Chancellor) on Jan 31, 2021 at 14:35 UTC
    I have never found the output of a standard diff to be very enlightening. I'm sure it works well to change files, patch-style, but it isn't very readable for someone simply wanting to see what happened to the text in a side-by-side format.

    Plain old diff (in the GNU version) has at least four output formats:

    • Normal (the default, ed-like; diff -e emits an actual ed script):
      /tmp>diff foo bar
      1,2c1,2
      < Bla bla. Foo bar baz.
      < Nada nada nada. Nada?
      ---
      > Bla bar. Foo bar baz.
      > Nada na-da nada. Nada?
      4c4
      < bar. Bla. Bar bla.
      ---
      > bar. Bla bar bla.
    • Unified:
      /tmp>diff -u foo bar
      --- foo 2021-01-31 15:13:16.892239748 +0100
      +++ bar 2021-01-31 15:13:43.403869518 +0100
      @@ -1,6 +1,6 @@
      -Bla bla. Foo bar baz.
      -Nada nada nada. Nada?
      +Bla bar. Foo bar baz.
      +Nada na-da nada. Nada?
       Foo foo foo! Bar. Foo
      -bar. Bla. Bar bla.
      +bar. Bla bar bla.
       Foo bla bla nada bar.
    • Side by side (also available via sdiff)
      /tmp>diff -y foo bar
      Bla bla. Foo bar baz.      | Bla bar. Foo bar baz.
      Nada nada nada. Nada?      | Nada na-da nada. Nada?
      Foo foo foo! Bar. Foo        Foo foo foo! Bar. Foo
      bar. Bla. Bar bla.         | bar. Bla bar bla.
      Foo bla bla nada bar.        Foo bla bla nada bar.
    • rcs
      /tmp>diff -n foo bar
      d1 2
      a2 2
      Bla bar. Foo bar baz.
      Nada na-da nada. Nada?
      d4 1
      a4 1
      bar. Bla bar bla.

    TortoiseSVN comes with a diff and merge tool called TortoiseMerge that can show changes side by side, highlighting not only changed lines, but also changes within the lines.


    Side note:

    sub comparator {
    my $str1 = shift @_;
    #...
    my $RE = qq|</span> |;
    traverse_sequences(
    \@from, \@to,
    {
    # ...
    }
    );
    return ($original, $revised);
    } #END SUB comparator

    Proper indenting would make the "#END SUB comparator" redundant:

    sub comparator {
        my $str1 = shift @_;
        #...
        my $RE = qq|</span> |;
        traverse_sequences(
            \@from, \@to,
            {
                # ...
            }
        );
        return ($original, $revised);
    }

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

      Regarding the proper indenting making my comment "redundant":

      I use indenting as well, but I tend to have subroutines that extend well beyond one screen's worth of code. I like having that note at the bottom just to help guide me in locating my position within the file as I'm scanning. I've developed a habit for doing it this way, and it's not going to change!

      Remember, TMTOWTDI. This is my way.

      Blessings,

      ~Polyglot~