
Re: When not to use subdiff

by dsheroh (Monsignor)
on Aug 25, 2018 at 08:42 UTC (#1221078)

in reply to When not to use subdiff

I ran into some surprising output one of the first times I used ccdiff after getting back to work this week:
- <ds:KeyName></ds:KeyName>
- ^^^^^^^ ^^^ ^^^^^^^^^
- <ds:KeyName></ds:KeyName>
- ^^^^^^^^^^^^^^^^^^^^^^^^
+ <ds:KeyName></ds:KeyName>
+ ^^
Took me a minute to figure out that it was seeing the diff as "up to 'foo' from the first line, then insert '-t', grab an 'e' and an 's' from later in the first line, and finally take everything starting with 't' on the second line" rather than "insert '-test' on the first line and drop the second line entirely".

Probably another good case for a "percentage of changed characters is over x%" check.
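
A minimal sketch of what such a check could look like. This is not ccdiff's actual implementation (ccdiff is written in Perl); it uses Python's difflib, and the function names and the 20% default are assumptions made up for illustration. The idea is to measure how much of the two lines fails to line up, and to fall back to plain line-level output when that share is too high.

```python
from difflib import SequenceMatcher


def changed_percentage(old: str, new: str) -> float:
    """Percentage of characters, counted over both lines, that are not in a matching block."""
    total = len(old) + len(new)
    if total == 0:
        return 0.0
    # autojunk=False keeps the matcher from ignoring frequent characters on long lines.
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    # Every matched character is present once in each line, hence the factor 2.
    return 100.0 * (total - 2 * matched) / total


def use_char_markers(old: str, new: str, threshold: float = 20.0) -> bool:
    """Hypothetical rule: show character-level markers only for small changes."""
    return changed_percentage(old, new) <= threshold


if __name__ == "__main__":
    print(changed_percentage("abcd", "abXd"))
    print(use_char_markers("abcd", "abXd"))
```

With a rule like this, a pair of lines that share only scattered single characters would be shown as an ordinary removed/added pair instead of getting a confusing trail of markers.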

Re^2: When not to use subdiff
by Tux (Canon) on Aug 25, 2018 at 09:10 UTC

    Agree. It stands out way better with -r and colors, but still. I've added the files to my sandbox.

    Note that this is still beyond the scope of what I created it for, but I will not ignore this feedback.

    Enjoy, Have FUN! H.Merijn
Re^2: When not to use subdiff
by Tux (Canon) on Aug 25, 2018 at 12:00 UTC

    Could you pull from the git repo and try again with -h20? You can then find your intuitive limit and put heuristics : 20 in ~/.config/ccdiff.

    As I got no other suggestions in this thread, I implemented both suggestions.

    Enjoy, Have FUN! H.Merijn
      With -h20 I get:
      - <ds:KeyName></ds:KeyName>
      - <ds:KeyName></ds:KeyName>
      + <ds:KeyName></ds:KeyName>
      So I experimented a bit with other heuristic values, trying to find a setting which would give me
      - <ds:KeyName></ds:KeyName>
      + <ds:KeyName></ds:KeyName>
      + ^^^^^
      - <ds:KeyName></ds:KeyName>
      and found that I get the "classic" diff output for values in the range 2-49, with heuristic values of 1 or 50+ reverting to the original output. Since ccdiff -h describes -h n as "Horizontal char diff treshold" [1], I'm guessing that's because the smallest chunks taken in the original output are 1 character, while the complete line (with the real hostname) is 50 characters. Is that a correct description of how the heuristic works or is it just a coincidence?

      [1] When I pasted that, my spellcheck caught a typo in "treshold" - it's missing an "h".

        • Typo fixed (thanks)
        • -h1 was an off-by-one error. Also fixed.
        • Pushed
        • Thanks for the feedback

        Enjoy, Have FUN! H.Merijn
