PerlMonks  

Re^4: HoA create from array using map, more efficient?

by hsinclai (Deacon)
on Jun 18, 2011 at 20:13 UTC ( [id://910355]=note )


in reply to Re^3: HoA create from array using map, more efficient?
in thread HoA create from array using map, more efficient?

>>> (disk corruption, interrupted network connection, ...).
>> Sorry, but that really is paranoia.

Not completely, I'm afraid, BrowserUK. At least, not with respect to Amazon: where were you during the EBS crash on the US east coast that lasted for days, destroyed data, terminated people's web platforms, interrupted business, and made all major news headlines?

In fact, that April crash is the reason I'm writing up some automation of snapshot/volume backup rotation (back to my OP topic), to get safe copies of data into alternate Amazon zones. Not every environment, as you seem to suggest in the other postings, is fully locked down and controllable. Our dev/ops team has several members who have commit access to the script repo and have the right and the need to run scripts across that platform. Yes, it would be an "error" if they made a mistake, but the possibility of them doing something wrong is very real, and so is the possibility of EC2 (not a fully controlled environment, unless you believe the marketing) blowing up!!

>>And "defensive programming" of every line of code is not that place or mechanism.

I take your point, and you are correct that there have to be rules around the use of programs in cases like these, which I will stress fully!

-Harold

Replies are listed 'Best First'.
Re^5: HoA create from array using map, more efficient?
by BrowserUk (Patriarch) on Jun 18, 2011 at 21:13 UTC
    Not completely, I'm afraid, BrowserUK. At least, not with respect to Amazon:

    Did you read the rest of the post? I did acknowledge that corruption does occur. My point was not that it doesn't, but that trying to program defensively against the possibility is futile.

    For example, in each of your 10-digit timestamps, there are 576,650,390,625 combinations of bit failures that would not cause your regex to fail to match, yet would corrupt the timestamp such that you would either discard a backup early or retain a backup that should have been discarded. That's (at least) five hundred and seventy billion failure modes that your regex would not and could not detect!

    Let me explain. For any decimal digit there are 15 single bit failures that could morph one digit into another valid decimal digit. E.g., 0x31 ascii('1') with bit-2 corrupted on becomes 0x33 ascii('3'); with bit-3 corrupted on, 0x35 ascii('5'); and with bit-4, 0x39 ascii('9'). Do that for all digits and all bits, then multiply the combinations out, and you get 576 billion possible combinations that the regex will not detect, but that could cause extensive problems by making you discard your latest snapshots or retain old ones when you shouldn't. And that's just the single-bit-per-digit combinations.
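
The examples in the paragraph above can be checked mechanically. What follows is a minimal sketch, not taken from the original post; the helper name `digit_neighbours` is made up for illustration. It lists the digits reachable from a given ASCII digit by flipping exactly one bit, and evaluates 15**10, which matches the quoted total. It makes no attempt to verify the per-digit figure of 15.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return the digits reachable from
# $digit by flipping exactly one of the eight bits of its ASCII code.
sub digit_neighbours {
    my ($digit) = @_;
    my $code = ord $digit;
    my @hits;
    for my $bit ( 0 .. 7 ) {
        my $flipped = chr( $code ^ ( 1 << $bit ) );
        push @hits, $flipped if $flipped =~ /^[0-9]$/;
    }
    return sort @hits;
}

print "'1' can become: ", join( ', ', digit_neighbours('1') ), "\n";
print "15**10 = ", 15**10, "\n";
```

For '1' this reports the three outcomes named in the post ('3', '5', '9') plus '0', which results from clearing the lowest bit.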

    And single-bit corruptions are far more likely than the disappearance of all 10 digits, which is what it would take before your regex would fail.
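
To make the point concrete, here is a minimal sketch. The thread does not show the regex under discussion, so the pattern `/^(\w+)-(\d{10})$/` and the snapshot name `snap-1308427980` are assumptions invented for this illustration, as is the helper `corrupt_first_digit`. It flips a single bit in one digit of the timestamp and shows that the corrupted name still passes the pattern.

```perl
use strict;
use warnings;

# Assumed pattern and name, invented for this illustration; the actual regex
# from the thread is not shown here.
my $pattern = qr/^(\w+)-(\d{10})$/;
my $name    = 'snap-1308427980';

# Flip one bit (value 2) in the first timestamp digit: '1' (0x31) -> '3' (0x33).
sub corrupt_first_digit {
    my ($s) = @_;
    substr( $s, 5, 1 ) = chr( ord( substr $s, 5, 1 ) ^ 0x02 );
    return $s;
}

my $bad = corrupt_first_digit($name);

for my $candidate ( $name, $bad ) {
    if ( my ( $label, $epoch ) = $candidate =~ $pattern ) {
        print "$candidate matches, timestamp $epoch\n";
    }
    else {
        print "$candidate rejected\n";
    }
}
```

Both names match, so under these assumptions a format check alone cannot tell the good timestamp from the corrupted one.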

    You cannot hope to detect, much less deal with, these kinds of failures through defensive programming at the statement level in all your scripts. It can only be done through a combination of error-detecting hardware and file-systems. Programming to detect just some of the possible errors, especially when that subset is the least likely to occur, is pointless and naive.

    Money down the drain for the sake of a false sense of security. Extra complexity and extra runtime (more costs) for no possible benefit.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      I want to make sure I understand what you're saying here:
      ... For any decimal digit there are 15 single bit failures that could morph one digit into another valid decimal digit... And the possibility of single bit corruptions occurring is far more likely than the disappearance of all 10 digits that it would require before your regex would fail.

      I don't think it's just a matter of "the disappearance of all 10 digits" -- it's any disruption of the expected 10-digit string (in combination with a preceding name string), such as might happen if, say, one or more processes are writing both stdout and stderr to a single file handle, asynchronously, so that warning messages might interrupt in the middle of normal report lines, rather than being neatly interleaved. (I've seen that happen -- it's ugly.)

      Comparing the variety of corruptions of that sort to the limited set of bit inversions you speak of, that turn, e.g., a "5" (0x35) into a "7" or "4" or "1" (but excluding what should be equally likely corruptions that turn "5" into 'NAK' (0x15) or '%' (0x25) or "u" (0x75) or any of the non-ASCII outcomes with high bit set), how do you figure that those particular bit inversions are "far more likely"?
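
The comparison can be made concrete by enumerating every single-bit flip of an ASCII digit. The sketch below is an illustration added here, using a hypothetical helper `classify_flips`; it counts how many of the eight possible flips leave a digit and how many produce something else. It says nothing about how likely any given flip is in practice.

```perl
use strict;
use warnings;

# Hypothetical helper: for one ASCII digit, split the eight single-bit flips
# into those that yield another digit and those that do not.
sub classify_flips {
    my ($digit) = @_;
    my ( @digits, @others );
    for my $bit ( 0 .. 7 ) {
        my $code = ord($digit) ^ ( 1 << $bit );
        if ( $code >= 0x30 && $code <= 0x39 ) { push @digits, $code }
        else                                  { push @others, $code }
    }
    return ( \@digits, \@others );
}

my ( $still_digit, $not_digit ) = classify_flips('5');
printf "'5' -> digits: %s\n", join ' ', map { chr($_) } @$still_digit;
printf "'5' -> other:  %s\n", join ' ', map { sprintf( '0x%02X', $_ ) } @$not_digit;

# Tally the digit-to-digit flips over all ten digits.
my $total = 0;
$total += scalar @{ ( classify_flips($_) )[0] } for 0 .. 9;
print "digit-to-digit flips: $total of 80\n";
```

For '5', three of the eight flips give another digit ('4', '7', '1') and five give a non-digit byte, including the 0x15, 0x25 and 0x75 cases mentioned above.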

      Having just read your other reply to me below, I gather that my "twisted stderr/stdout" example is not part of the OP's scenario -- unless the poster is doing something different from what you assumed... Anyway, thanks for that -- your assessment is helpful, as usual.
