Re^2: HoA create from array using map, more efficient?
by graff (Chancellor) on Jun 18, 2011 at 15:53 UTC
Computers don't generally do things "by chance"...
... so long as other things don't impede normal operation (disk corruption, interrupted network connection, ...). Naturally, things can happen "by chance" that do cause perturbation, so some amount of "paranoia" about expected data formats getting bollixed is always justified, I think.
But in this case, your proposed solution (%hash = map {/(.)(.)/} @array;) has the nice property that there will only be hash elements created for records that have the expected (key, value) content. In terms of error checking, it might suffice just to know how many input records didn't match:
if ( my $lost = @snapshot_listing - keys %snapshot_roster ) {
warn sprintf( "There were %d bad entries in %d input records\n",
$lost, scalar @snapshot_listing );
}
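To make that filtering property concrete, here is a minimal runnable sketch; the sample records and the `snap-...-digits` layout are assumptions for illustration, not the OP's actual data:

```perl
# Hypothetical sample input: two well-formed records, one malformed.
my @snapshot_listing = (
    'snap-0a1b vol-alpha backup-20110618',
    'snap-0c2d vol-beta backup-20110617',
    'this record has no snapshot id at all',
);

# A failed match returns the empty list in list context, so only
# well-formed records contribute a (key, value) pair to the hash.
my %snapshot_roster = map { /(snap-\S+).*-(\d+)$/ } @snapshot_listing;

# Records in, minus keys out, equals records that failed to match.
if ( my $lost = @snapshot_listing - keys %snapshot_roster ) {
    warn sprintf( "There were %d bad entries in %d input records\n",
                  $lost, scalar @snapshot_listing );
}
```

Running this warns about 1 bad entry in 3 input records, and the hash holds only the two good pairs.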
If it's important to know what was in the entries that failed, the map block would need some elaboration -- and I think a subroutine is needed (but at least it only gets called when a record fails to match):
my @bad_entries;
sub nfg { push @bad_entries, $_; return }
my %snapshot_roster = map {
m[(snap-\S+).*-(\d+)$] ? ($1,$2) : nfg()
} @snapshot_listing;
if ( @bad_entries ) { warn "oh well... win some, lose some.\n" }
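Here is that elaborated version run against hypothetical data (the record format is again an assumption for illustration); the malformed record lands in @bad_entries instead of silently vanishing:

```perl
# Hypothetical input; the second record deliberately lacks trailing digits.
my @snapshot_listing = (
    'snap-1111 backup-20110618',
    'snap-2222 backup-pending',
);

my @bad_entries;
# Record the failure and return the empty list, so map adds nothing.
sub nfg { push @bad_entries, $_; return }

my %snapshot_roster = map {
    m[(snap-\S+).*-(\d+)$] ? ( $1, $2 ) : nfg()
} @snapshot_listing;

if ( @bad_entries ) {
    warn "failed to parse:\n  ", join( "\n  ", @bad_entries ), "\n";
}
```

This works because map aliases $_ to the current element, and that same $_ is visible inside nfg() when it is called from the block.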
>> (disk corruption, interrupted network connection, ...).
Sorry, but that really is paranoia.
Firstly, if your network protocol & handling doesn't detect interrupted network connections long before you start running regex against the corrupted or truncated data that comes from it, then you are either using the wrong protocol, or skipping good practice on your handling.
As for disk corruption, -- which does happen -- if your data is important enough then you'll be using raided disks that will detect and correct errors (and flag the corrupted volume early and loudly).
Attempting to program every statement to try and detect the possibility of hardware failure is a futile exercise that at best costs dearly for no benefit, and at worst can be the cause of project cancellation.
Proof by reductio ad absurdum: If you are going down that route, then you would also have to check for the possibility of memory failure -- I had a 2GB ram module fail only a couple of weeks ago.
So what could you do? How can you be sure that when you read a value back from a variable that you get the same value that you stored? Perhaps you store every value in two different variables and then read them both and compare them. But what do you do if they are different? Is it the original value that was corrupted? Or the backup?
No way to tell, so now you have to store everything thrice and do a 3-way compare each time you use a variable's value and go for the consensus. But what if it isn't the memory holding one of your three copies of the variable that gets corrupted, but the ram that holds the result of the comparison?
So now you need to have two separate routines that each do the 3-way compare to ensure that they use different memory locations for the result. But when one of the results is corrupted, you don't know which one is the good one, so now you need three routines doing the 3-way compare and then compare the three results. And you need to do this for every variable and every access to every variable.
Ah! But then the memory that holds the results of the comparisons of the results could be the ram location that has a drop out ...
Or, you could just use ECC RAM chips!
There is an appropriate place and mechanism for detecting hardware corruption and failure. And "defensive programming" of every line of code is not that place or mechanism.
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
>>>(disk corruption, interrupted network connection, ...).
>>Sorry, but that really is paranoia.
Not completely, I'm afraid, BrowserUK. At least, not with respect to Amazon: where were you during the EBS crash on the US east coast that lasted for days, destroyed data, terminated people's web platforms, interrupted business, and made all major news headlines?
In fact, that April crash is the reason I'm writing up some automation of snap/volume backup rotation (back to my OP topic), to get safe copies of data into alternate AMZ zones. Not every environment is fully locked down/controllable, as you may suggest in the other postings: the dev/ops team has several members who have commit access to the script repo and the right/need to run scripts across that platform to fill certain needs. Yes, it would be an "error" if they made a mistake, but the possibility of them doing something wrong is very real -- and so is the possibility of ec2 (not a fully controlled environment, unless you believe the marketing) blowing up!
>>And "defensive programming" of every line of code is not that place or mechanism.
I take your point and you are correct in saying that there have to be rules around use of programs in cases like these, which I will stress fully!
-Harold
I agree, one can always reach a point at which "sanity checking" exceeds "due diligence" and leads to insanity. But to get back to the case at hand, a relevant question for this thread is whether my addition to your map block (with the extra "nfg()" function call) shows a degree of error-checking zeal that costs more than it's worth.
I believe that depends on what is generating/transporting the input to the script, and how much the application cares about out-of-band behavior in the input. I've certainly seen situations where the extra work is needed, and others where it isn't. I tend to err more often on the side of applying some extra error checking and logging when it isn't needed, and less often the other way, because the latter is more troublesome when it happens.
Re^2: HoA create from array using map, more efficient?
by hsinclai (Deacon) on Jun 18, 2011 at 15:11 UTC
Thanks BrowserUK, will try the alternate construction method...
>> Is that really a possibility?
Yes, sure, if other team members on the platform create snapshots and volumes during operations by some other method (not my scripting), and use different/partial tags or none at all.
If you're calling me paranoid, I'll take that as a compliment :):)
-Harold
>> If you're calling me paranoid
I'd place your knowledge of the possibility that "Yes, sure, if other team members ... create snapshots by some other method" under the same category as "actual experience, or documentation", so no, you're not paranoid.
Though I'd have to suggest that it might be better to ensure (mandate, by providing a library to generate the tags) that they do not use some other method, rather than to look for work-arounds for the possibility that they do. If they do not include a timestamp, how are you going to "decide which objects are old enough to delete"?
Using defensive programming to handle the possibility in production -- and paying the inevitable costs for doing so, which on a platform where you pay for cpu usage directly affects your bottom line -- is a poor substitute for weeding out such errors(*) during pre-production testing.
(*if you mandate the tag format, non-compliance becomes an error.)
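A sketch of what such a mandated tag-generating library might look like; the sub name make_snapshot_tag and the "snap-<id> backup-<timestamp>" layout are hypothetical, chosen only so the result stays matchable by a m[(snap-\S+).*-(\d+)$]-style pattern:

```perl
use POSIX qw( strftime );

# Hypothetical helper: one blessed place to build tags, so a bad
# snapshot id becomes a loud error at creation time instead of a
# silent mismatch in some later parsing pass.
sub make_snapshot_tag {
    my ( $snap_id, $time ) = @_;
    $time = time unless defined $time;
    die "snapshot id must look like snap-xxxx, got '$snap_id'\n"
        unless $snap_id =~ /^snap-\S+$/;
    # Timestamp goes last, so consumers can find it with -(\d+)$
    # and decide which objects are old enough to delete.
    return sprintf '%s backup-%s',
        $snap_id, strftime( '%Y%m%d%H%M%S', localtime $time );
}
```

With the format enforced at the producing end, the parsing end can treat any non-matching tag as a bug to fix in testing rather than a case to handle in production.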