in reply to Re^2: Non-deterministic behaviour with simple array initialization
in thread Non-deterministic behaviour with simple array initialization

It's odd that ld.so.cache would differ between identical systems. It's supposed to be an ordered list of the libraries found in the directories listed in the ld.so.conf file, so additional libraries installed on one system but not the other would explain the difference. Extra libraries don't by themselves explain the behavior at hand, though, if they aren't actually used in the test.
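One way to check whether the cache differences touch anything the test actually loads is to intersect them with the libraries the interpreter is linked against. The following is only a sketch in Python, assuming you have saved the text output of `ldconfig -p` from each VM and of `ldd` run against the perl binary; the function names are made up for illustration.

```python
def parse_ldconfig_names(text):
    """Return the set of library names found in saved `ldconfig -p` output."""
    names = set()
    for line in text.splitlines():
        # Entry lines look like:
        #   "\tlibz.so.1 (libc6,x86-64) => /lib64/libz.so.1"
        if "=>" not in line:
            continue  # skips the "N libs found in cache ..." header line
        names.add(line.split()[0])
    return names


def parse_ldd_names(text):
    """Return the set of library names found in saved `ldd` output."""
    names = set()
    for line in text.splitlines():
        fields = line.split()
        if fields:
            # ldd prints either "name => path (addr)" or "path (addr)";
            # keep only the file name in both cases.
            names.add(fields[0].rsplit("/", 1)[-1])
    return names


def relevant_cache_differences(cache_a, cache_b, used):
    """Libraries present in only one cache that the test binary also uses."""
    only_one_side = parse_ldconfig_names(cache_a) ^ parse_ldconfig_names(cache_b)
    return sorted(only_one_side & parse_ldd_names(used))
```

An empty result means none of the differing cache entries are libraries the binary links against. The sketch compares names only, so a library present on both VMs under the same name but in a different version or path would not show up here, and neither would anything loaded at run time (XS modules, for example) that `ldd` on the perl binary doesn't list.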

The initrd file shouldn't be of any consequence once the system is booted and running, since it only provides a temporary root file system before the actual root is ready. Again, I'm not sure why identical images would have different versions of this file, but it shouldn't matter. The file could differ because the images were made separately on hardware with different capabilities. It is possible to leave things from the initrd running after the system is loaded, but I doubt SuSE does that on a typical installation.

As for other files, I'm at a loss for the moment. I'll let you know if I think of anything, though.

Re^4: Non-deterministic behaviour with simple array initialization
by thkarcher (Novice) on Sep 26, 2008 at 10:38 UTC

    Perhaps I wasn't precise enough: the two VMs are roughly the same, meaning they're running the same kernel on SuSE 10.3 with the distribution's base packages. Each VM also has some additional application-specific packages installed, some of which are not found on the other VM. 'ldconfig' walks across all the lib directories, building the cache from the libs it finds there, and because not all libs are installed on both VMs, the resulting caches differ. But I don't think this leads to the problem, do you?

    Some kernel modules differ as well - do you think this could be an issue?

    Thanks,
    Thomas

      As I said before, any differences in libs that aren't actually in use for the test don't matter.

      The kernel modules could theoretically matter, depending on what exactly they do. It's not very likely, though, unless it's a module the code in question actually passes through. Beyond that, a slightly corrupted disk or memory subsystem can cause all kinds of random errors, some of which may be as subtle as this.

      It's easy to assume that things just fail when they are not fully intact, but that's not always the case. I once diagnosed a PATA drive cable as bad over the shoulder of a hardware technician by pointing out to him that the same bits were dropping off of every nth character in the output of a DOS dir command. Other than that sort of issue, simple programs seemed to run just fine on that system. Software corruption can be even more subtle since the problem can be in a rarely used code path.
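
      The kind of pattern I spotted by eye in that dir listing can also be checked mechanically. This is just a rough sketch in Python, assuming you have a known-good copy of some data and the suspect copy as byte strings of the same length; the function name is made up for illustration.

```python
from functools import reduce
from math import gcd


def find_periodic_bit_drops(expected, observed):
    """Return (period, dropped_bits_mask) if the mismatches between two
    byte strings look like bits dropped at a regular stride, else None."""
    bad = [i for i, (e, o) in enumerate(zip(expected, observed)) if e != o]
    if len(bad) < 3:
        return None  # too few mismatches to call it a pattern
    # A dropped bit can only clear bits; any newly set bit rules this out.
    if any(observed[i] & ~expected[i] for i in bad):
        return None
    # A dropped bit is only visible where the original byte had it set,
    # so use the gcd of the gaps rather than requiring equal gaps.
    period = reduce(gcd, (b - a for a, b in zip(bad, bad[1:])))
    if period < 2:
        return None  # no stride larger than one byte
    mask = reduce(lambda m, i: m | (expected[i] & ~observed[i]), bad, 0)
    return period, mask
```

      A result such as a period of 8 with a single-bit mask points at the hardware path rather than at the software, which is what gave the cable away in my case. Random-looking mismatches with no common stride come back as None and need a different kind of digging.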