mhd has asked for the wisdom of the Perl Monks concerning the following question:

Hi perl monks,
I need to build a tool/script that applies a diff, but with some rules or customization. I can't just generate a diff/patch file and apply it to the source. For now we do this manually (human intervention). My task is to make it automatic, and at the end still open a visual diff tool for human review. There are 7 C source files (3 *.c and 4 *.h) to be patched. Not many, but it's still hard.

Here's an example: if a keyword such as #ifdef __FOO__ (paired with #endif) exists in ./old/file1 but not in ./new/file1, then apply it to file1 in ./merged/file1. Likewise, an #include<XXX> in ./old/file1 must also be applied, and so on.
Another example: one file contains a ridiculous amount of hex bytecodes that differ here and there between ./new/file1 and ./old/file1. The rule for this is: don't apply any bytecode changes; IOW, keep it as it is. One other rule: if new/file1 contains new function prototypes that old/file1 doesn't, keep them. I hope you guys know what I mean.
There are more rules, though not that many. Some are based on an obvious keyword, some are based on a pattern, and one special case requires inserting a keyword in the middle of a line. Example:

one line in ./new/file1: abcdefghjiklm 0x1356
one line in ./old/file1: abcdefghijklm BULL((0x1150)
one line in new copy of ./merged/file1 must be: abcdefghijklm BULL(0x1356)

Now imagine the above example occurring hundreds of times, sparsely. I think, maybe, I know the pattern of the BULL appearances in old/file1, because the same remark/comment appears in file1 for each section, which I guess is very useful for automating this. I may be wrong, though. This latter example is tedious to do manually. But thank God it only happens in one file.

Anyway, I think that's all for my explanation. I hope it is clear to the monks. This task is challenging enough for me that I have now run out of ideas. If Perl can't do it, I don't know any tool that can... well, except the human brain.

My current half-idea is to put the keyword/pattern rules in an array or maybe a hash. And then what?
My previous idea was to generate a unified diff patch file, edit it, write the result to a second, customized patch file, and apply that second file. In short, I have had trouble editing the diff file, not to mention 'patch' complaining about my edited patch file. Another problem is when a single chunk of the diff contains both parts that need to be applied and parts that must not be.
BTW, my current tools are ActivePerl, the Text::Diff module, and GnuWin32 patch. I hope 'patch' is the only external utility, to minimize dependencies/app requirements.

So, do you guys have any applicable ideas? Or is this attempt-to-automatize thing impossible to be done by machine? Thanks very much in advance.

Re: Applying diff partially using perl
by roboticus (Chancellor) on Sep 04, 2008 at 12:13 UTC
    mhd:

    I think that if I had to automate that task, I'd generate the diffs automatically, then write a chain of small perl scripts to implement your rules by editing the diff patches. So rather than trying to build a big complicated program, generate your diff patches. Then, as you suggest in your idea: Figure out how to edit the diffs to accomplish your rules.

    While Text::Diff appears to be a nice package, you might want to look at GNU diff, as it has a nice set of options. You can fine-tune its output, and even perform some filtering up-front, such as:

    -I RE --ignore-matching-lines=RE Ignore changes whose lines all match RE.

    For the rule to not edit the bytecodes: I'd scan the file containing the bytecodes for the beginning and ending lines of the bytecode array, and delete diff edits between those lines. (If you have control of the input file, you might add a BEGIN and END type of comment to simplify finding the code blocks you don't want patch to touch, otherwise, you may have to create a set of regexes.)
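
    A minimal sketch of that bytecode rule, assuming unified diff input and hypothetical marker comments (BYTECODE BEGIN / BYTECODE END) in the old file. The helper names find_protected_range and drop_hunks_in_range are made up for illustration. It drops whole hunks that overlap the protected lines, so a hunk that mixes bytecode and other edits would still need splitting. Later hunks keep their original line numbers; patch normally finds them by old-file position and context, but verify that with your build.

```perl
use strict;
use warnings;

# Return (first, last) 1-based line numbers of the protected region,
# or an empty list if either marker is missing.
sub find_protected_range {
    my ($lines, $begin_re, $end_re) = @_;
    my ($first, $last);
    for my $i (0 .. $#$lines) {
        if (!defined $first) {
            $first = $i + 1 if $lines->[$i] =~ $begin_re;
        }
        elsif ($lines->[$i] =~ $end_re) {
            $last = $i + 1;
            last;
        }
    }
    return defined $last ? ($first, $last) : ();
}

# Drop every hunk whose old-file range overlaps [$first, $last].
# File headers and all other hunks pass through unchanged.
sub drop_hunks_in_range {
    my ($diff, $first, $last) = @_;
    my ($out, $keep, $old_left, $new_left) = ('', 1, 0, 0);
    for my $line (split /^/m, $diff) {
        if ($old_left <= 0 && $new_left <= 0
            && $line =~ /^@@ -(\d+)(?:,(\d+))? \+\d+(?:,(\d+))? @@/) {
            my $start = $1;
            $old_left = defined $2 ? $2 : 1;
            $new_left = defined $3 ? $3 : 1;
            my $stop  = $start + ($old_left ? $old_left - 1 : 0);
            $keep = ($stop < $first || $start > $last);
        }
        elsif ($old_left > 0 || $new_left > 0) {
            # Inside a hunk: count lines so headers are never misread.
            my $tag = substr $line, 0, 1;
            $old_left-- if $tag eq ' ' || $tag eq '-';
            $new_left-- if $tag eq ' ' || $tag eq '+';
        }
        elsif ($line !~ /^\\/) {
            $keep = 1;    # outside any hunk: file header lines
        }
        $out .= $line if $keep;
    }
    return $out;
}
```

    You would read ./old/file1 into an array, call find_protected_range with two regexes for your markers, and pass the resulting line numbers plus the diff text to drop_hunks_in_range before handing the result to patch.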

    To add new prototypes from one file without deleting ones that are removed, simply remove any delete edits in the diff for those particular header files. So for this, you might have a list of files for which you remove all deletes.
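
    One way to sketch "remove the deletes", assuming a unified diff for a single file. Simply deleting the '-' lines would leave the hunk counts and context wrong, so this version (my own variation, with a made-up name keep_additions_only) turns each '-' line into a context line, recounts the hunk headers, and drops hunks that no longer add anything.

```perl
use strict;
use warnings;

# Rewrite a single-file unified diff so that it only adds lines.
sub keep_additions_only {
    my ($diff) = @_;
    my (@head, @hunks);
    for my $line (split /^/m, $diff) {
        if ($line =~ /^@@ -(\d+)(?:,\d+)? \+/) {
            push @hunks, { start => $1, body => [] };
        }
        elsif (@hunks) { push @{ $hunks[-1]{body} }, $line }
        else           { push @head, $line }
    }

    my $out   = join '', @head;
    my $delta = 0;    # net lines added by the hunks emitted so far
    for my $h (@hunks) {
        my ($old_len, $new_len, $added) = (0, 0, 0);
        my @body;
        for my $line (@{ $h->{body} }) {
            my $tag = substr $line, 0, 1;
            if ($tag eq '-') {    # a delete becomes plain context
                $line = ' ' . substr($line, 1);
                $tag  = ' ';
            }
            if    ($tag eq ' ') { $old_len++; $new_len++ }
            elsif ($tag eq '+') { $new_len++; $added++ }
            push @body, $line;
        }
        next unless $added;       # nothing left to apply in this hunk
        # With zero old lines the old start names the line before the insert.
        my $new_start = $h->{start} + $delta + ($old_len ? 0 : 1);
        $out .= sprintf "@@ -%d,%d +%d,%d @@\n",
            $h->{start}, $old_len, $new_start, $new_len;
        $out .= join '', @body;
        $delta += $new_len - $old_len;
    }
    return $out;
}
```

    Note that a changed prototype shows up in a diff as a delete plus an add, so after this filter both the old and the new version of that line end up in the result. That is probably what you want to see in the visual review anyway.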

    Finally, regarding your question "Or is this attempt-to-automatize thing impossible to be done by machine?": Many times, text is just too free-form to automate everything. But getting a 90% solution is usually quick and easy. Then you need only raise an alert when the program doesn't know what to do for a specific case. I tend to do jobs like this iteratively. As I do a job, I try to find a way to automate either (a) the most annoying case, or (b) the simplest win. Then on the next iteration, I again find something annoying to automate...and so on. In just a few iterations, you'll probably get enough cases covered that you'll find weeks (months...) between alerts.
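
    The "raise an alert" idea can be sketched as an ordered rule table where anything no rule claims is flagged for manual review. The two rules and their names (keep_ifdef, skip_bytes) are hypothetical stand-ins for your real rules, and the regexes are only guesses at what your files look like.

```perl
use strict;
use warnings;

# Ordered rule table: the first rule whose test accepts the hunk wins.
# Both entries are illustrative assumptions, not the real project rules.
my @rules = (
    # A removed preprocessor line (#ifdef / #endif / #include).
    [ keep_ifdef => sub { $_[0] =~ /^-\s*#\s*(?:ifdef|endif|include)\b/m } ],
    # A changed line made up only of hex bytes such as "0x12, 0x34,".
    [ skip_bytes => sub { $_[0] =~ /^[-+]\s*(?:0x[0-9A-Fa-f]{2},\s*)+$/m } ],
);

# Return the name of the first matching rule, or 'REVIEW' when no rule
# knows what to do with the hunk text.
sub classify_hunk {
    my ($hunk) = @_;
    for my $rule (@rules) {
        my ($name, $test) = @$rule;
        return $name if $test->($hunk);
    }
    return 'REVIEW';
}
```

    Each iteration of the job then consists of looking at what landed in REVIEW and adding one more entry to the table for the most annoying case.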

    Be sure to write your code and/or documentation very clearly! Projects like this are the kind that are a "pick it up and put it down" sort. With infrequent edits to the program, you need to make sure you don't break any rules. I find unit-tests (in the mode of Test Driven Development) to be very helpful here. That way, you can make sure you don't break edits you've already handled.

    I hope this helps...

    ...roboticus
      Roboticus, thanks for your reply...

      I thought GNU diff would be my saviour for 60-70% of the solution. Boy, was I wrong... the -I feature is almost useless. CMIIW, but I think GNU diff uses damn primitive POSIX basic regexes (BRE). Maybe those GNU programmers thought "ah, no one needs this, we'll just add this option as a nice-to-have-but-crippled option". Well, guess what? Maybe 99.5% don't need it. But I'm part of the 0.5%.
      I can't get the regex pattern right for some quite simple text, let alone complex text.

      Here is my first pattern attempt.

      Full line:
      {(CONST method_info*)0x2f05/*comment*/,0x0}

      True line that I want to match:
      {(CONST method_info*)0x2f05

      The best I can do using POSIX BRE:

      -I "^[:space:]*{(CONST method_info\*)0x[:xdigit:]\{1,8\}"
      My RE above still doesn't match; the text is still not ignored in the diff.

      Any suggestion for improving my RE will be highly, highly, highly appreciated. And please, if someone could explain this in plain English: English isn't my native language, and my brain is too slow to comprehend the documentation's meaning.