in reply to Re^2: HoA create from array using map, more efficient?
in thread HoA create from array using map, more efficient?

(disk corruption, interrupted network connection, ...).

Sorry, but that really is paranoia.

Firstly, if your network protocol and handling don't detect interrupted network connections long before you start running a regex against the corrupted or truncated data that comes from them, then you are either using the wrong protocol or skipping good practice in your handling.

As for disk corruption -- which does happen -- if your data is important enough then you'll be using RAIDed disks that will detect and correct errors (and flag the corrupted volume early and loudly).

Attempting to program every statement to detect the possibility of hardware failure is a futile exercise that at best costs dear for no benefit, and at worst can be the cause of project cancellation.

Proof by reductio ad absurdum: If you are going down that route, then you would also have to check for the possibility of memory failure -- I had a 2GB RAM module fail only a couple of weeks ago.

So what could you do? How can you be sure that when you read a value back from a variable that you get the same value that you stored? Perhaps you store every value in two different variables and then read them both and compare them. But what do you do if they are different? Is it the original value that was corrupted? Or the backup?

No way to tell, so now you have to store everything thrice and do a 3-way compare each time you use a variable's value, and go with the consensus. But what if it isn't the memory holding one of your three copies of the variable that gets corrupted, but the RAM that holds the result of the comparison?

So now you need to have two separate routines that each do the 3-way compare to ensure that they use different memory locations for the result. But when one of the results is corrupted, you don't know which one is the good one, so now you need three routines doing the 3-way compare and then compare the three results. And you need to do this for every variable and every access to every variable.
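
The three-copy scheme above amounts to majority voting over redundant copies. A minimal Python sketch, purely for illustration (the `vote3` helper is hypothetical and not from this thread; it assumes the values are hashable):

```python
from collections import Counter


def vote3(a, b, c):
    """Return the value that at least two of the three copies agree on.

    Raises ValueError when all three copies differ, because there is
    then no consensus to fall back on.
    """
    value, count = Counter([a, b, c]).most_common(1)[0]
    if count < 2:
        raise ValueError("no two copies agree")
    return value


# One copy corrupted: the other two outvote it.
print(vote3(42, 42, 7))
```

The voter's own result lives in the same fallible memory as the copies it compares, which is the regress the next paragraph runs into.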

Ah! But then the memory that holds the results of the comparisons of the results could be the RAM location that has a dropout ...

Or, you could just use ECC RAM chips!

There is an appropriate place and mechanism for detecting hardware corruption and failure. And "defensive programming" of every line of code is not that place or mechanism.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Re^4: HoA create from array using map, more efficient?
by hsinclai (Deacon) on Jun 18, 2011 at 20:13 UTC
    >>> (disk corruption, interrupted network connection, ...).
    >> Sorry, but that really is paranoia.

    Not completely, I'm afraid, BrowserUK. At least, not with respect to Amazon: where were you during the EBS crash on the US east coast that lasted for days, destroyed data, terminated people's web platforms, interrupted business, and made all major news headlines?

    In fact, that April crash is the reason I'm writing up some automation of snapshot/volume backup rotation (back to my OP topic), to get safe copies of data into alternate Amazon zones. Not every environment, as you may suggest in the other postings, is fully locked down and controllable. The dev/ops team has several members who have commit access to the script repo and have the right and the need to run scripts across that platform to fill certain needs. Yes, it would be "error" if they made an error, but the possibility of them doing something wrong is very real, and so is the possibility of EC2 (not a fully controlled environment unless you believe the marketing) blowing up!!

    >>And "defensive programming" of every line of code is not that place or mechanism.

    I take your point and you are correct in saying that there have to be rules around use of programs in cases like these, which I will stress fully!

    -Harold

      Not completely, I'm afraid, BrowserUK. At least, not with respect to Amazon:

      Did you read the rest of the post? I did acknowledge that corruption does occur. My point was not that it doesn't, but that trying to program defensively against the possibility is futile.

      For example, in each of your 10-digit timestamps, there are 576,650,390,625 (15^10) combinations of bit failures that would not cause your regex to fail to match, yet would corrupt the timestamp such that you would either discard a backup early or retain a backup that should have been discarded. That's (at least) five hundred and seventy billion failure modes that your regex would not and could not detect!

      Let me explain. For any decimal digit there are 15 single bit failures that could morph one digit into another valid decimal digit. E.g. 0x31 ascii('1') with bit-2 corrupted on becomes 0x33 ascii('3'). Or with bit-3 corrupted on: 0x35 ascii('5'); and with bit-4: 0x39 ascii('9'). Do that for all digits and all bits, then combine them (15 per digit across 10 digits), and you get 576 billion possible combinations that the regex will not detect but that could cause extensive problems through your discarding your latest snapshots or retaining old ones when you shouldn't. And that's just the single bit per digit combinations.

      And the possibility of single bit corruptions occurring is far more likely than the disappearance of all 10 digits that it would require before your regex would fail.
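
      A minimal Python sketch of the point above. The ten-digit pattern is a hypothetical stand-in for the OP's regex (which is not shown in this thread), and the timestamp is an illustrative value; bits are numbered from 0 at the least significant end, so bit 2 here is what the text above calls bit-3:

```python
import re

# Hypothetical stand-in for the OP's check: exactly ten ASCII decimal digits.
TIMESTAMP_RE = re.compile(r"\d{10}", re.ASCII)


def flip_bit(text, index, bit):
    """Return text with one bit of the character at index inverted."""
    corrupted_char = chr(ord(text[index]) ^ (1 << bit))
    return text[:index] + corrupted_char + text[index + 1:]


original = "1308427980"               # illustrative 10-digit epoch value
corrupted = flip_bit(original, 1, 2)  # '3' (0x33) becomes '7' (0x37)

# Both values pass the format check, so the regex cannot tell them apart.
print(original, bool(TIMESTAMP_RE.fullmatch(original)))
print(corrupted, bool(TIMESTAMP_RE.fullmatch(corrupted)))

# How far off the corrupted timestamp is, in seconds.
print(int(corrupted) - int(original))

# The figure quoted above: 15 alternatives per digit, ten digits.
print(15 ** 10)
```

      In this sketch the corrupted value still passes the ten-digit check while being 400,000,000 seconds (over twelve years) away from the original.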

      You cannot hope to detect, much less deal with, these kinds of failures through defensive programming at the statement level in all your scripts. It can only be done through a combination of error-detecting hardware and file systems. Programming to detect just some of the possible errors, especially when that subset is the least likely to occur, is pointless and naive.

      Money down the drain for the sake of a false sense of security. Extra complexity and extra runtime (more costs) for no possible benefit.

        I want to make sure I understand what you're saying here:
        ... For any decimal digit there are 15 single bit failures that could morph one digit into another valid decimal digit... And the possibility of single bit corruptions occurring is far more likely than the disappearance of all 10 digits that it would require before your regex would fail.

        I don't think it's just a matter of "the disappearance of all 10 digits" -- it's any disruption of the expected 10-digit string (in combination with a preceding name string), such as might happen if, say, one or more processes are writing both stdout and stderr to a single file handle, asynchronously, so that warning messages might interrupt in the middle of normal report lines, rather than being neatly interleaved. (I've seen that happen -- it's ugly.)

        Comparing the variety of corruptions of that sort to the limited set of bit inversions you speak of, that turn, e.g., a "5" (0x35) into a "7" or "4" or "1" (but excluding what should be equally likely corruptions that turn "5" into 'NAK' (0x15) or '%' (0x25) or "u" (0x75) or any of the non-ASCII outcomes with high bit set), how do you figure that those particular bit inversions are "far more likely"?
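
        The outcomes listed for "5" can be enumerated directly. A minimal Python sketch (helper names are hypothetical), assuming exactly one inverted bit per 8-bit character:

```python
def single_bit_flips(char):
    """Return the eight byte values reachable by inverting one bit of char."""
    code = ord(char)
    return [code ^ (1 << bit) for bit in range(8)]


def is_ascii_digit(code):
    """True when code is the ASCII value of '0' through '9'."""
    return 0x30 <= code <= 0x39


flips = single_bit_flips("5")
for bit, code in enumerate(flips):
    kind = "digit" if is_ascii_digit(code) else "not a digit"
    print(f"bit {bit}: 0x{code:02X} {kind}")

# The flips of '5' that a digits-only regex would still accept.
digit_outcomes = [chr(code) for code in flips if is_ascii_digit(code)]
print(digit_outcomes)

# The same count taken over all ten digits, out of 80 possible flips.
total_digit_outcomes = sum(
    is_ascii_digit(code)
    for digit in "0123456789"
    for code in single_bit_flips(digit)
)
print(total_digit_outcomes)
```

        Under that assumption, three of the eight single-bit flips of "5" land on another digit, and across all ten digits it is 30 of the 80 possible flips.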

        Having just read your other reply to me below, I gather that my "twisted stderr/stdout" example is not part of the OP's scenario -- unless the poster is doing something different from what you assumed... Anyway, thanks for that -- your assessment is helpful, as usual.

Re^4: HoA create from array using map, more efficient?
by graff (Chancellor) on Jun 19, 2011 at 23:12 UTC
    I agree, one can always reach a point at which "sanity checking" exceeds "due diligence" and leads to insanity. But to get back to the case at hand, a relevant question for this thread is whether my addition to your map block (with the extra "nfg()" function call) shows a degree of error-checking zeal that costs more than it's worth.

    I believe that depends on what is generating/transporting the input to the script, and how much the application cares about out-of-band behavior in the input. I've certainly seen situations where the extra work is needed, and others where it isn't. I tend to err more often on the side of applying some extra error checking and logging when it isn't needed, and less often the other way, because the latter is more troublesome when it happens.

      I tend to err more often on the side of applying some extra error checking and logging when it isn't needed,

      This is going to sound supercilious, and there is nothing I can do about it.

      Don't err .. find out.

      The source of the OP's data is most likely this API. And the source of that API is whatever filesystem + metadata underlies the Amazon EBS filesystem. Which, AFAIK, means that unless you are party to some inside information, your (the OP's) only choice is to trust that the results the API returns are uncorrupted, because you (he) simply do not have enough information to determine otherwise.

      So in this case, erring on the side of caution isn't "a sensible precaution", nor "being proactive", nor "good methodology"; it is a complete waste of either your employer's or client's time, resources and money. And yours. Performing a test because it is easy to do, despite the fact that you know it will only detect some minuscule percentage of the possible failures, all 3-sigma outliers to boot, serves only to slow things down for no statistically likely benefit.

      It is supplementing a belt made of kevlar, with braces made of overcooked spaghetti. Or betting an accumulator on two successive lotteries.
