in reply to Re^2: 'bld' project - signature(SHA1) based replacement for 'make'
in thread 'bld' project - signature(SHA1) based replacement for 'make'

I'll reply in detail, however, I want to formulate a considered reply.

Sounds good. Meanwhile, let me "think loud" when and why make can go wrong and how to fix that.

Make (all variants, in general) uses timestamps provided by the operating system and the file system to decide whether a rule has to run or not. Rules run only once for each target during a single run of make, so make keeps internal state, too. This state is obviously initially built from the timestamps.
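The comparison at the heart of make can be sketched in a few lines. This is my own illustration, not code from any make implementation; the function name is mine:

```perl
use strict;
use warnings;

# Decide whether a target must be rebuilt, the way make does:
# rebuild if the target is missing, or if any source has a
# modification time strictly newer than the target's.
sub needs_rebuild {
    my ($target, @sources) = @_;
    return 1 unless -e $target;
    my $target_mtime = (stat $target)[9];
    for my $src (@sources) {
        my $src_mtime = (stat $src)[9] // 0;
        return 1 if $src_mtime > $target_mtime;
    }
    return 0;
}
```

Note that everything hinges on those two mtime values being comparable, which is exactly what the scenarios below break.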

Timestamps in the future may be caused by system clock manipulation. This happens, for example, when you damage your Slackware system, boot from the install CD/DVD, and chroot to your real system to rebuild the kernel using make. The Slackware installer manipulates the system clock (but not the hardware clock) to work around some problems with time zones, so you virtually travel in time. The same problem may happen when root changes the system time back manually while make runs. GNU make detects both and warns ("File '%s' has modification time %d s in the future"). It could, perhaps should, be more paranoid and abort instead, because it is likely that your build is incomplete.

Clock skew may happen when you use network filesystems (especially NFS) without tightly synchronising system clocks of client and server. The server sets the timestamps, using its system clock, but make on the client compares the timestamps using the client's system clock. That clock may have a very different idea of the current time, it may even jitter around the server's clock, so the freshly generated target may have a timestamp earlier than its source files. Again, it may also happen when root messes with the system clock. GNU make detects this and warns ("Clock skew detected"). Again, it could and perhaps should be more paranoid and abort, because again it is likely that your build is incomplete.
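Both warnings boil down to one observable condition: a file's modification time lying ahead of the local clock. A sketch of my own (not GNU make's actual check; the slack parameter is my addition, to allow for coarse filesystem timestamp resolution):

```perl
use strict;
use warnings;

# Report files whose modification time is in the future relative to
# the local system clock -- the condition behind GNU make's
# "modification time ... in the future" and "Clock skew detected"
# warnings. $slack tolerates coarse timestamp resolution.
sub future_timestamps {
    my ($slack, @files) = @_;
    my $now = time;
    my @suspect;
    for my $f (@files) {
        my $mtime = (stat $f)[9];
        next unless defined $mtime;
        push @suspect, $f if $mtime > $now + $slack;
    }
    return @suspect;
}
```

On an NFS client this check runs against the client's clock while the server wrote the mtime, which is exactly why unsynchronised clocks trigger it.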

These two problems are the most common problems with make using timestamps, but there are other ways to create wrong timestamps.

FAT filesystems allow only a two-second resolution for timestamps (you get only even values for seconds), so your target may end up with the same timestamp as its source. That alone should be no problem: you can get essentially the same effect anyway, because stat returns timestamps with only one-second resolution, and make rebuilds only when the target is strictly older, i.e. the timestamp of the target is less than the timestamp of the source. But FAT stores local time, not universal time, so when you change the time zone, the FAT timestamps move back or forward in time.

Not a problem: the ntp daemon manipulates the system clock so it agrees with the reference clocks, perhaps even while you run make. If ntpd were implemented stupidly, the system clock would wildly jump around, especially if it was off by several seconds or even minutes or hours. But ntpd is smart: it slows down or accelerates the system clock just a little bit at a time, so it smoothly approaches the reference clocks' time. Generally, systems allow ntpd a single big adjustment of the system time during system boot to compensate for cheap battery-buffered real-time clocks that tend to run too slow or too fast.

Imagine a system without a battery-buffered real-time clock, like the old home computers or embedded systems. You boot the system, the system clock starts at some arbitrary point in time (often with timer count = 0 or the build date) and starts counting up, completely independent of any reference clock. No problem until you reboot. "Groundhog Day". Instant "timestamp in the future" problems. If the system has network access, start ntpd (or ntpdate) during boot. If the system is not networked (or just has no access to a reference clock), just make sure the system remembers the last count of its system clock across a reboot. This may be implemented as simply as touching a file once a second (or once a minute) as long as the system runs, and adjusting the system clock to the timestamp of that file during boot. Or, equally, by storing the timestamp in some kind of persistent storage (EEPROM, flash, battery-buffered RAM, ...) every minute or second, or at least in the shutdown script, and reading that value back during boot.
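The stamp-file trick above takes only a few lines of bookkeeping. Actually setting the system clock needs root and a platform-specific call (date(1) or similar), so this sketch of mine only shows the two halves around it; the function names are hypothetical:

```perl
use strict;
use warnings;

# While the system runs: refresh the stamp file's mtime periodically,
# the equivalent of touch(1) once a second or once a minute.
sub refresh_stamp {
    my ($stamp) = @_;
    open my $fh, '>>', $stamp or die "cannot touch $stamp: $!";
    close $fh;
    my $now = time;
    utime $now, $now, $stamp;
    return $now;
}

# During boot: if the system clock is behind the stamp, the last
# known good time wins; a real init script would feed this value
# to date(1) to set the clock before anything timestamps files.
sub recovered_time {
    my ($stamp) = @_;
    my $last = (stat $stamp)[9] // 0;
    return time > $last ? time : $last;
}
```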

In summary, make sure that the system clock is synchronised with the reference clocks, and keeps counting upwards with no jumps. This will not only help make, but all other programs that rely on timestamps. Most times, the easiest solution is to start ntpd during boot, allowing a single big adjustment during startup of ntpd.

If you run on an isolated network, pick one arbitrary machine and declare it the holder of the reference clock for that network. Serve time via NTP from that machine. Don't mess with its clock or timezone at all. If you have the resources, add a clock signal receiver to that machine (GPS or a local radio time signal like WWV or DCF77).

Alexander

--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

Re^4: 'bld' project - signature(SHA1) based replacement for 'make'
by rahogaboom (Novice) on Nov 15, 2014 at 19:24 UTC
    Motivation:
    Why did I start this project? Over many years I have used make to build many projects. My experiences setting up makefile
    systems, and my observations of already-constructed ones, led me to realize that far more effort was being spent setting up
    and maintaining these build systems than was necessary. I felt that with one of the modern scripting languages - Perl, Python,
    Ruby - a simplified design, and built-in automatic header file dependency checking, the whole process could be improved. I knew
    and liked Perl, so I used it. Perl is pretty much everywhere; you don't even need it installed in the system directories -
    perlbrew can install it to a local user directory. The major design goal was simplicity; complexity is the enemy. The existence
    of many other variations of make indicates to me that others were also unsatisfied with many aspects of make and wanted to
    provide solutions. These all seemed, however, like band-aids on top of band-aids, never getting at the core of the problem.
    The current state of the project can handle C/C++/Objective C/Objective C++/ASM. I haven't tried to reproduce everything that
    make does in every instance. Simplicity is the goal: the ability to easily build Bld files for a single target, or several
    Bld files for complex multi-target projects. Bld files have a very simple syntax: a comment section, an EVAL section that
    requires 6 perl variables to be defined, and a DIRS section with 'R dir:regex:{cmds}' or '{cmds}' specifications. I have
    succeeded in fully rebuilding the git, svn and systemd projects with this design without modifying the directory structure of
    these projects in any way. At present I have not attempted to incorporate Java or any other languages.

    Signatures:
    Signatures, unlike dates, are an inherent property of the file. They provide a simple way of dissociating the criteria for
    rebuilding a file from any outside interference: command line programs that might modify dates, other parties involved in the
    build arbitrarily modifying the dates of files but not their content, or clock changes and synchronizations that might cause a
    rebuild without any file change. All of these go away with signatures; a signature change means a file change. The suggestion
    that signatures may collide is a non-starter. Modern signature algorithms are designed to produce effectively random digests
    for even the smallest file changes. With a signature length of 160 bits a collision is unlikely in the extreme.
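    Such a content signature can be computed with the standard Digest::SHA module mentioned below; a minimal sketch (the function
    name is mine, not bld's):

```perl
use strict;
use warnings;
use Digest::SHA;

# Return the 40-hex-character SHA1 signature of a file's contents --
# the same kind of 160-bit digest bld stores in Bld.sig.
sub file_signature {
    my ($path) = @_;
    my $sha = Digest::SHA->new('sha1');
    $sha->addfile($path);
    return $sha->hexdigest;
}
```

    Unlike a timestamp, the signature is identical on every machine and filesystem, and changes only when the bytes change.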

    The perl standard module Digest::SHA provides sha1_hex() which is fast enough to fully rebuild complex projects like git, svn and
    systemd in reasonable times - execution times are dominated by the recompilation of source, not by signature calculations. The
    problem with make is people time, engineer time. The supposed 'fast' use of dates is overwhelmed by the complexity of any but the
    simplest of Makefiles.

    Signatures are portable; dates are not.

    One of my goals in the use of signatures was security. At each step and for all file types a signature is calculated. Any
    attacker that managed to insinuate a modified source, source build cmd, object, executable or library while leaving the date
    unmodified would fail; a recompilation would result. Protecting the integrity of the build signature file (Bld.sig, which holds
    the signatures of the source, objects, executables, build command lines and libraries) is equivalent to protecting the
    integrity of the build. If you modified two files and three were recompiled, then you know that something in the extra file
    changed, or its object changed, or its rebuild command changed. If a project rebuild warned or fataled on a sudden unexpected
    library change, you would have the opportunity to investigate.

    A reply to the numbered items:

    1. Why use signatures is explained above.

    2. The assertion that make is simple to use is astonishing. Look at the GNU tools for automatically generating Makefiles. Why
    do this if Makefiles are so simple? I refer you to the thousands of online articles dealing with the obscurity, versioning,
    portability, performance and bug issues related to make. I also refer you to:

    http://www.scons.org/wiki/FromMakeToScons (Adrian Neagu)
    ----a detailed critique of make and some alternatives
    ftp://ftp.gnu.org/old-gnu/Manuals/autoconf/html_mono/autoconf.html#SEC3
    ----a brief critique of make and how GNU automake from the GNU Build System contributes
    http://www.scons.org/architecture/
    ----a description of the scons architecture and in particular the reasons for the use of signatures instead of dates
    http://aegis.sourceforge.net/auug97.pdf
    ----an article "Recursive Make Considered Harmful" by Peter Miller from the Australian UNIX Users Group
    http://www.conifersystems.com/whitepapers/gnu-make/
    ----an in depth critique of make

    I've seen many a Makefile so complex as to be unintelligible, that when modified has broken the build and required a detailed
    reading of the documentation for some obscure rule that almost no one knows. The purpose of the GNU tools is to abstract away
    this complexity and yet still have make underneath. Look at the size of the GNU make doc. Point at a random section and ask
    any experienced software engineer about its details; I don't think you'll get far. 'bld' is designed to be simple. The
    learning curve is minimal:
    a. execute the "Hello, world!" program and read its Bld file (it has many stub do-nothing routines to illustrate the
    construction of Bld files)
    b. understand the EVAL and DIRS section requirements - EVAL has defined perl variables (6 are required) and DIRS has the
    'R dir:regex:{cmds}' specifications
    c. read bld.README for useful start-up material and an intro to bld'ing complex multi-target projects (git, svn and systemd)
    d. do perldoc bld for the full man page.
    That's it.

    To quote Adrian Neagu (see link above):

    "Difficult debugging

    The inference process of Make may be elegant but its trace and debug features are dating back to the Stone Age. Most clones improved on that.
    Nevertheless, when the Make tool decides to rebuild something contrary to the user expectation, most users will find the time needed to
    understand that behavior not worth the effort (unless it yields an immediate fatal error when running the result of the build, of course).
    From my experience, I noticed that Make-file authors tend to forget the following:

    How rules are preferred when more than one rule can build the same target.
    How to inhibit and when to inhibit the default built-in rules.
    How the scope rules for macro work in general and in their own build set up.

    While not completely impossible, Make-based builds are tedious to track and debug. By way of consequence, Make-file authors will continue to
    spend too much time fixing their mistakes or, under high time pressure, they will just ignore all behavior that they don't understand."

    3. Yes, being non-recursive is an advantage. See the article "Recursive Make Considered Harmful" by Peter Miller
    (millerp@canb.auug.org.au, http://aegis.sourceforge.net/auug97.pdf) - Australian UNIX Users Group (AUUG).

    bld handles directories with the Bld file 'R dir:regex:{cmds}' specification. Use any number of these specifications. The R indicates
    to apply the 'regex:{cmds}' recursively to sub-directories.

    4. I designed bld to take signatures of source, objects and build cmds all the time. You assert that make can also do this by
    adding dependencies to a target in a Makefile. With bld the programmer does not need to do (and remember to do) any extra steps.

    5. See above about signatures. With a 160 bit signature, collisions are unlikely in the extreme. Also, duplicate files are not
    necessarily a problem; bld warns about them. This gives you more information. If two identical files have the same name, then
    maybe you want a link from one to the other. If two identical files have different names, then maybe you want to rename one of
    them. In most cases I've run, this type of warning shows a file with several links to it. More information is better.
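    The duplicate warning amounts to grouping files by digest; a sketch of my own (not bld's actual code; the function name is
    hypothetical):

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1_hex);

# Group files by content signature and return only the groups with
# more than one member -- candidates for a duplicate-file warning.
sub duplicate_groups {
    my (@files) = @_;
    my %by_sig;
    for my $f (@files) {
        open my $fh, '<', $f or next;
        my $data = do { local $/; <$fh> } // '';   # slurp whole file
        close $fh;
        push @{ $by_sig{ sha1_hex($data) } }, $f;
    }
    return grep { @$_ > 1 } values %by_sig;
}
```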

    6. Again, by just setting the $opt_lib Bld file variable to "warnlibcheck" you get full library file signature checking. bld
    does not require a dependency entry for each library. And the same argument as for source files applies: meta-data dates
    versus inherent file properties like signatures. Lots of things can mess with meta-data.

    7. Security is one of the bld goals. I want to take and store (in Bld.sig) the signature of anything that may be tampered
    with. The executable or library might be modified while its date is left unmodified. Creating a changed target file with the
    same signature as the unmodified target is difficult in the extreme. You mention stripped executables causing a rebuild: just
    copy the executable elsewhere and then strip it; not difficult.

    8. No, you don't need all those files just to compile 'a few files'. If you want to build a single target, you only need the
    Bld file. All the other files and the directory structure are for building a complex multi-target project that may involve a
    few to hundreds to thousands of files. There are a few provided perl programs to aid you in doing so.

    The Notes comments:

    1. I have no comments on this one. It's not a critique of anything.

    2. The bld restriction that multi-target projects have uniquely named targets, all deposited into a single directory, is no
    real difficulty at all. I designed project builds this way in order to put the targets, the target build adjunct files -
    bld.info, bld.warn and bld.fatal - and the build scripts together, so as to see at a glance the status of a build. It works;
    see the bld versions of git, svn and systemd. Systemd actually required relocating object files to the directory of the source
    and in a few cases renaming targets to unique names. When the install script is executed, multiple targets may then be renamed
    to the same name in different locations if necessary. The fatal files are all listed together, and a glance will immediately
    determine whether any target, and which targets, failed to build. Likewise the warn and info files are listed together. I
    found this flat storage of bld results shows the project status simply and immediately, without a lot of cd'ing. I wouldn't
    normally name different project targets with the same name; that seems counterintuitive.

    3. I think you are confused about the difference between the project source directories and the project results directory. For all three
    projects that I re-built, git, svn and systemd, the downloaded source directory structure remains entirely the same. It's only the results
    files of the target builds that go into a single directory - the targets, the info, warn and fatal files, the project construction
    scripts, the README file, the list of project targets file and specialized scripts required by any of the target builds. There are naming
    conventions for everything. There is no need to restructure the source code directory in any way. Please examine the provided source
    for the git, svn and systemd projects rebuilt with bld. They remain entirely unchanged. This is required since running ./configure is
    necessary to generate source based on the system configuration and this source can be deposited anywhere in the source tree.

    4. First, I have only used/tested bld with C/C++/Objective C/Objective C++. I have never tried Java, and Perl is not compiled
    anyway. Second, if any bld step generates multiple output files, these are moved to the directory of the source file that was
    matched for building; nothing is lost. Additionally, the execution of '{commands}' in the Bld file DIRS section can then move
    these generated files wherever needed. Java might be a future project action.

    Some other problems:

    a. The ability of bld to save the signatures of all source, objects, build cmds, targets and dynamically linked libraries is
    all that is necessary to manage software construction of an active, ongoing project. Make most certainly does not do this
    without substantial difficulty. The web is littered with thousands of articles and make how-tos on avoiding and fixing make's
    many problems.

    b. The use of the perl system() call is in the bld code. There are two common ways to execute external cmds in perl: ``
    (backticks) and system(). I chose system(); are you suggesting some other way? I have to execute, for each Bld file {}
    specification, the enclosed cmds. You don't have to do anything except write a Bld file to build projects. If your objection
    is to some aspect of the bld code that's one thing - I'd be OK to listen - but the comment has nothing to do with using bld.
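    For reference, running an external build command via system() and checking its exit status looks like this; $cmd and the
    function name are placeholders of mine, not lines from bld:

```perl
use strict;
use warnings;

# Run an external build command the way a {cmds} block would be
# executed, and report its exit code. Unlike backticks, system()
# streams the command's output straight through, which suits a
# build tool.
sub run_cmd {
    my ($cmd) = @_;
    my $rc = system($cmd);   # a single string runs via the shell
    die "failed to execute '$cmd': $!\n" if $rc == -1;
    return $rc >> 8;         # the command's own exit code
}
```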

    c. There are other tools that do not use built in rules - see PBS on cpan.org. The bld Bld file DIRS section 'R dir:regex:{cmds}' construct
    defines where a source is to be found, which source to use and how to manipulate that source to generate the desired output files. The example
    git, svn and systemd projects illustrate how complex projects are built with relatively simple Bld(and Bld.gv) files. Writing:

    R bld.example/example/C/y : ^.*\.c$ : {ls;$CC -c $INCLUDE $s;ls;} # Note: the {} may hold any number of ';' separated cmds

    doesn't seem to me an excessive burden to compile all *.c files recursively from the bld.example/example/C/y directory.
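    The effect of such an 'R dir:regex:{cmds}' line can be approximated with the standard File::Find module; a sketch of my own,
    not bld's actual implementation (a code ref stands in for the shell {cmds} block):

```perl
use strict;
use warnings;
use File::Find;

# Approximate 'R dir:regex:{cmds}': walk $dir recursively, collect
# every file whose basename matches $regex, and hand each full path
# to the $cmd callback.
sub apply_recursive {
    my ($dir, $regex, $cmd) = @_;
    my @matched;
    find(sub {
        return unless -f $_ && /$regex/;
        push @matched, $File::Find::name;
        $cmd->($File::Find::name);
    }, $dir);
    return sort @matched;
}
```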

    d. The whole purpose of using Perl was to take advantage of the power of one of the modern scripting languages. Perl is
    everywhere. Also, all of the warnings and fatals in bld give a line number in bld, so the user can examine the context in the
    bld code. Although the source for make is available, I have never heard of any user actually delving into the code for any
    reason.

    Lastly:
    1. Make and its difficulties: The entire history of make is one of add-ons, band-aids, hacked versions and attempts to graft
    onto an inadequate design some feature or other to 'fix' some difficulty. The auto-generation of make files is a good example
    of attempts to circumvent the entire issue by moving the engineer's effort upstream - toward an entirely new format
    specification file - while preserving the downstream Makefile. Anyone can read the provided critiques of make, and I am sure
    can find additional criticisms of make online, or read the thousands of articles and how-tos on dealing with make's many
    problems. Signatures are clearly the way to go. They are an inherent property of the file: cheap, portable and easy to use
    and compare. There are no clock synchronization issues. The entire history of make is one of complexity and the efforts to
    deal with it.

    2. Try it!: I suspect that, in fact, you have not actually tried bld. Download bld, install the experimental.pm module, cd to
    the bld directory and execute ./bld. That's it! You should now be able to examine the output from the "Hello, World!" program.
    Look at the bld.info, bld.warn and bld.fatal (should be empty) files. This will give you an idea of the output from bld'ing
    any target - executable or library. The "Hello, World!" program has several stub routines that do nothing; they are there to
    show how a Bld file is constructed. The "Hello, World!" Bld file is well commented. Then cd to Bld.example/example and execute
    './bld.example --all'. This will build 13 example targets, the source for which is in the bld.example/example directory.
    Download a release file for git, svn or systemd and install it in the bld directory. Do the same, e.g. cd to
    Bld.git/git-1.9.rc0 and execute './bld.git --all'. Examine all the built targets and their associated bld files.

    3. Security: The only way to maintain security for the entire bld process is to use signatures for every source, intermediate file, target and
    library. Comparison of saved Bld.sig file signatures against the build source tree will show anything that has been modified. If something
    unexpected was changed the question is why. A TODO item on my list is to write code to do Bld.sig file comparisons with the source tree and to
    write code to do this comparison for multi-target builds.
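    That comparison might be sketched as follows; the flat path-to-digest table is my assumption for illustration, not bld's
    actual Bld.sig layout:

```perl
use strict;
use warnings;
use Digest::SHA;

# Compare a saved signature table (path => expected sha1 hex) against
# the files on disk; return the paths that changed or disappeared.
sub changed_files {
    my ($saved) = @_;
    my @changed;
    for my $path (sort keys %$saved) {
        if (!-f $path) {
            push @changed, $path;   # recorded file is gone
            next;
        }
        my $sha = Digest::SHA->new('sha1');
        $sha->addfile($path);
        push @changed, $path if $sha->hexdigest ne $saved->{$path};
    }
    return @changed;
}
```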

    4. The Linux kernel: I'd like to re-bld the Linux kernel. I used git, svn and systemd first because these were complex
    projects with many targets; I needed to establish a standard directory structure and standard naming conventions, and to
    write bld adjunct scripts to manage these complex projects. The kernel is a single target, but a really complex one. I wrote
    extensions.pl (in the aux dir) to list all the various file extensions underneath a directory; when run in the kernel main
    directory it gives an idea of the distribution of file types in the kernel. I haven't done much else on kernel work, however.
    When done, the Bld.sig file for the kernel could then protect the integrity of its construction; a useful addition.

      rahogaboom:

      Update: Thanks, that's *much* easier to read!

      I've read the dialog between you and afoken and found it interesting. I haven't downloaded the code and given it a try yet, but I'll try to do so soon. I tend to agree with afoken in that make isn't really that difficult to use (though I tend to disable all the built-in rules and just build what I want by hand). But I like the idea of using hashes to detect changes, too, so it might be interesting.

      I can't promise to try it any time soon (as I'm swamped at $work right now), but if I do give it a go, I'll try to update my post to let you know what I think.


      I never thought I'd ask someone to remove code tags from a post, but there's a first time for everything, I guess....

      ...roboticus

      When your only tool is a hammer, all problems look like your thumb.

      bld-1.0.6.tar.gz - changes related to: a. fixes for two gcc warnings in the example code (rdx and daa) b. use 'print STDERR' for all prints - more immediate output c. doc updates