http://qs1969.pair.com?node_id=1220898

Tux has asked for the wisdom of the Perl Monks concerning the following question:

At TPC in Glasgow I released App::ccdiff, which, in short, shows horizontal diffs as well as vertical diffs more clearly.

With all verbosity on, the output might look like this:

```
$ ccdiff -u0m --ascii termc*
5,5c5,5
-
+
+ ^
40,41c40,41
-       :Va=\E[0m:Vc=\E[0;33m:Ve=\E[0;4m:Vg=\E[0;4;36m:\
-                          ^
-       :Vi=\E[0;37;41m:Vk=\E[0;1;33;41m:Vo=\E[0;1;36;41;4m:cQ=\E?25I:
-                    ^             ^  ^                ^
+       :Va=\E[0m:Vc=\E[0;36m:Ve=\E[0;4m:Vg=\E[0;4;36m:\
+                          ^
+       :Vi=\E[0;37;44m:Vk=\E[0;1;37;44m:Vo=\E[0;1;36;44;4m:cQ=\E?25I:
+                    ^             ^  ^                ^
```

This works fine for the purpose it was written for: finding tiny changes more easily.

However, it makes no sense when a chunk changes, say, 4 lines into 24 lines with completely different content. In that case you just want to see the chunk as lines deleted plus lines added, without markers on the changed characters, because almost every character would be marked.

As I currently see it, there are multiple ways to decide when to fall back from the current behavior to a normal diff report:

• If the number of lines differs too much
If the removed chunk has n lines and the added chunk has n ± x lines (that is, the line counts differ by at most x), where the user can define x, the horizontal diff is invoked; otherwise it falls back to normal diff-like behavior. A default of 2 seems reasonable.
• If the percentage of changed characters is over x%, where the user can specify x
If the percentage of changed characters in a chunk (all characters marked as removed or added compared to those that did not change) is over x%, fallback to normal diff-like behavior. A default of 40% seems reasonable.

It is possible to implement both and allow both at the same time.
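To make the two proposed checks concrete, here is a minimal sketch in Python (not ccdiff's actual Perl code). The function names, the parameters `max_line_delta` and `max_changed_pct`, and the use of `difflib.SequenceMatcher` as the character matcher are all assumptions made for illustration. It also assumes "percentage of changed characters" means removed plus added characters relative to all characters on both sides of the chunk.

```python
from difflib import SequenceMatcher


def line_count_ok(removed, added, max_line_delta=2):
    """Check 1: the chunk's line counts differ by at most max_line_delta."""
    return abs(len(removed) - len(added)) <= max_line_delta


def changed_percentage(removed, added):
    """Share of characters (both sides together) that did not match.

    Assumption: changed = removed + added characters, measured against
    the total number of characters in the old and new chunk.
    """
    old = "\n".join(removed)
    new = "\n".join(added)
    total = len(old) + len(new)
    if total == 0:
        return 0.0
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    unchanged = sum(block.size for block in matcher.get_matching_blocks())
    # Every matched character is counted once on each side.
    return 100.0 * (total - 2 * unchanged) / total


def use_horizontal_diff(removed, added, max_line_delta=2, max_changed_pct=40.0):
    """Both checks combined: mark characters only if neither limit is exceeded."""
    return (line_count_ok(removed, added, max_line_delta)
            and changed_percentage(removed, added) <= max_changed_pct)


if __name__ == "__main__":
    # A tiny change: worth marking character by character.
    print(use_horizontal_diff([r":Vc=\E[0;33m:"], [r":Vc=\E[0;36m:"]))
    # 4 lines replaced by 24 unrelated lines: fall back to a plain diff.
    print(use_horizontal_diff(["a"] * 4, ["b"] * 24))
```

Combining the checks with `and` means either limit alone can trigger the fallback, which is one way to read "allow both at the same time".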

1. Did I state the problem well enough?
2. Do these options make sense?
3. Do the defaults make sense?
4. Do you envision other options (that you would use)?

Before I start coding/changing, I'd like opinions on how you would use it and/or expect it to behave, in order to improve its DWIM behavior.

Enjoy, Have FUN! H.Merijn

Replies are listed 'Best First'.
Re: When not to use subdiff
by TheloniusMonk (Sexton) on Aug 23, 2018 at 07:48 UTC
IMO you need to give priority to the niche you are in rather than worry too much about whether the input falls in your niche. The user can always run a separate old-school diff. But your own output should be rigorously predictable in format, so that people can write code to process it. The only way I see to do that is to have a rigid default behaviour first and offer such options as extras. If %change is important in your problem, I would be inclined to have a switch that replaces the functionality with only a statistical analysis, which the user can then consider before choosing the next step. Each such option, not just the default, should also stick to the rule of rigorous predictability, in the interests of those who will process the output.
Re: When not to use subdiff
by dsheroh (Monsignor) on Aug 25, 2018 at 08:42 UTC
I ran into some surprising output one of the first times I used ccdiff after getting back to work this week:
```
-         <ds:KeyName>foo.work.se</ds:KeyName>
-                        ^^^^^^^ ^^^ ^^^^^^^^^
-         <ds:KeyName>splat.work.se</ds:KeyName>
- ^^^^^^^^^^^^^^^^^^^^^^^^
+         <ds:KeyName>foo-test.work.se</ds:KeyName>
+                        ^^
```
Took me a minute to figure out that it was seeing the diff as "up to 'foo' from the first line, then insert '-t', grab an 'e' and an 's' from later in the first line, and finally take everything starting with 't' on the second line" rather than "insert '-test' on the first line and drop the second line entirely".

Probably another good case for a "percentage of changed characters is over x%" check.
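As a rough illustration of why a percentage check fits this case, the sketch below measures the chunk above with Python's `difflib.SequenceMatcher` as a stand-in matcher. This is an assumption: ccdiff uses its own diff engine, so its numbers may differ. The helper name `changed_share` and the definition of the percentage (removed plus added characters relative to all characters on both sides) are hypothetical as well.

```python
from difflib import SequenceMatcher

# The chunk from the report above (leading whitespace as in the -h20 output).
removed = [
    "        <ds:KeyName>foo.work.se</ds:KeyName>",
    "        <ds:KeyName>splat.work.se</ds:KeyName>",
]
added = [
    "        <ds:KeyName>foo-test.work.se</ds:KeyName>",
]


def changed_share(removed, added):
    """Percentage of characters on both sides that the matcher could not pair up."""
    old, new = "\n".join(removed), "\n".join(added)
    total = len(old) + len(new)
    if total == 0:
        return 0.0
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    unchanged = sum(block.size for block in matcher.get_matching_blocks())
    return 100.0 * (total - 2 * unchanged) / total


line_delta = abs(len(removed) - len(added))
share = changed_share(removed, added)
print(f"line delta: {line_delta}, changed characters: {share:.1f}%")
```

The line counts differ by only 1, so a line-count limit of 2 would not catch this chunk on its own. Whether a percentage limit catches it depends on the matcher and on the threshold chosen.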

Agree. It stands out way better with -r and colors, but still. I've added the files to my sandbox.

Note that this is still beyond the scope of what I created it for, but I will not ignore this feedback.

Enjoy, Have FUN! H.Merijn

Could you pull from the git repo and try again with -h20? Once you have found your intuitive limit, you can put heuristics : 20 in ~/.config/ccdiff.

As I got no other suggestions in this thread, I implemented both suggestions.

Enjoy, Have FUN! H.Merijn
With -h20 I get:
```
-        <ds:KeyName>foo.work.se</ds:KeyName>
-        <ds:KeyName>splat.work.se</ds:KeyName>
+        <ds:KeyName>foo-test.work.se</ds:KeyName>
```
```
-        <ds:KeyName>foo.work.se</ds:KeyName>
```