in reply to Re^2: Can't use an undefined value as a HASH reference
in thread Can't use an undefined value as a HASH reference

In that case, you could make either of two changes:

  1. Modify the sanity test to die if the input data is malformed.
  2. Modify the failing code to test for undef before attempting to dereference the hashref.

But there is a school of thought, termed the robustness principle, which goes: 'Be conservative in what you send, be liberal in what you accept.'

I would do both: make the sanity test issue a warning (with the file and line number) and then return undef; and add a test for undef before dereferencing.
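A minimal sketch of that combined approach (untested; get_next_record() and the 'name' field are just stand-ins for whatever your real code uses):

    use strict;
    use warnings;

    # 1. Sanity test: warn -- Perl appends "at FILE line LINE" when the
    #    message has no trailing newline -- and return undef on bad input.
    sub sanity_check {
        my( $record ) = @_;
        unless( defined $record and ref( $record ) eq 'HASH' ) {
            warn "malformed record: expected a hash reference";
            return undef;
        }
        return $record;
    }

    # 2. Defensive caller: test for undef before dereferencing.
    my $rec = sanity_check( get_next_record() );    # get_next_record() is hypothetical
    if( defined $rec ) {
        print $rec->{name}, "\n";
    }
    else {
        # skip it, log it, or substitute a default -- whatever suits the application
    }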


Re^4: Can't use an undefined value as a HASH reference
by Monk::Thomas (Friar) on Aug 26, 2015 at 10:50 UTC

    But there is a school of thought, termed the robustness principle, which goes: 'Be conservative in what you send, be liberal in what you accept.'

    I definitely prefer 'Be conservative in what you send, be conservative in what you accept.' I hold the opinion that trying to be liberal with input (especially: trying to fix malformed input) is pretty much guaranteed to lead to a bug somewhere later on.

    @OP: Both of the suggestions BrowserUK gave are good. But trying to apply the robustness principle (-> be liberal in what you accept) and automagically initialize a hash reference may sound like a good idea at first but can turn out to be a bad one.
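    For instance (contrived sketch), silently conjuring up an empty hash reference makes the "undefined value" error go away, but it also hides the fact that the record was missing in the first place:

        use strict;
        use warnings;

        my $record;                   # imagine this came back undef from a lookup
        $record //= {};               # the 'liberal' fix: conjure up an empty hash ref
        my $name = $record->{name};   # no crash any more -- but $name is now
                                      # silently undef, and the missing record
                                      # only shows up much later (or never),
                                      # far away from its real cause.
        print defined $name ? $name : "(missing)", "\n";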

      Both of the suggestions BrowserUK gave are good. But trying to apply the robustness principle (-> be liberal in what you accept) and automagically initialize a hash reference may sound like a good idea at first but can turn out to be a bad one.

      Firstly, nowhere in my post did I suggest "automagically initialize a hash reference", so that's a pure strawman.

      I definitely prefer 'Be conservative in what you send, be conservative in what you accept.'

      And on that I absolutely and fundamentally disagree. And I'll go further and say: that attitude (not yours personally, but in general in the industry) has set the industry back by two decades.

      Specifically, the phrase I highlight in your next sentence:

      I hold the opinion that trying to be liberal with input (especially: trying to fix malformed input) is pretty much guaranteed to lead to a bug somewhere later on.

      That term "malformed input" had almost never appeared in the literature prior to the appearance of the XML standardisation process.

      At last, after nearly 20 years, the industry is finally beginning to realise the damage that ill-conceived and misbegotten monstrosity has done, and it is now being widely reviled and rejected in favour of simpler, cleaner, more flexible data formats.

      Imagine you've downloaded (or retrieved from a corrupted disk or tape) several gigabytes of (XML) data, but it's suffered some corruption. Your (the XML) approach would simply reject the whole lot as malformed. End of.

      But that corruption may only affect part of the data that you don't need, or a few hundred records out of millions. A robust and flexible approach will allow your program to get at and process most of the data, and report the data which it can't process. The results will often be enough for the purpose; but if they aren't, you at least have a clear record of what is missing and where you need to expend special effort.
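      In code, that robust approach can be as simple as wrapping the per-record parse in an eval and reporting the failures (sketch only; $file, parse_record() and process() stand in for whatever your real code does):

          # salvage whatever parses; report whatever doesn't
          open my $fh, '<', $file or die "can't open $file: $!";

          my( $ok, $bad ) = ( 0, 0 );
          while( my $raw = <$fh> ) {
              my $record = eval { parse_record( $raw ) };
              if( $@ ) {
                  warn "skipping record at input line $.: $@";
                  ++$bad;
                  next;
              }
              process( $record );
              ++$ok;
          }
          print "processed $ok records; skipped $bad corrupt ones\n";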

      Every now and again, even the best websites construct pages that aren't well-formed. If browsers followed your/the XML approach and simply threw their hands up -- malformed input -- we'd get to see nothing; which would be silly, because most of the time, even if the page is slightly or even heavily corrupt in terms of its presentation, the information it contains is still readable and usable.

      The Robustness Principle is well named, and it is not one of those purely theoretical, high-brow principles, but rather the product of practical experience and pragmatism. And it works! TCP/IP would not work without it.

      You reject the pragmatism of your predecessors at your own peril.



        Firstly, nowhere in my post did I suggest "automagically initialize a hash reference", so that's a pure strawman.

        Sorry, I did not want to imply you were suggesting this. It is simply an example of how one could apply the robustness principle to this particular case.

        Regarding the rest of your posting:

        First I'd like to define my understanding of 'liberal to accept': it means trying as hard as possible to make some sense out of the provided data. In order to do so you may need to apply some guesswork/heuristics or ignore contradictions. (I think of it as a blacklist approach - some kinds of data are definitely wrong and will be refused; everything else is going to yield some kind of result.) On the other hand, being 'conservative to accept' requires establishing specific rules about what is acceptable and then refusing everything else (a whitelist approach). These rules can be quite extensive - conservative does not imply a narrow focus - it just means the rules are enforced.
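        To illustrate the difference (contrived sketch; the rules themselves are made up):

            use strict;
            use warnings;

            # whitelist ('conservative to accept'): state exactly what is
            # allowed and refuse everything else
            sub accept_conservative {
                my( $id ) = @_;
                return $id =~ /\A[0-9]{1,8}\z/ ? $id : undef;
            }

            # blacklist ('liberal to accept'): refuse only what is known to be
            # bad, then guess at the meaning of whatever remains
            sub accept_liberal {
                my( $id ) = @_;
                return undef if $id =~ /[;'"\\]/;   # known-bad characters
                $id =~ s/[^0-9]//g;                 # heuristic: keep only the digits
                return length( $id ) ? $id : undef;
            }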

        I can argue exactly to the contrary by saying this whole 'be liberal in what you accept' is throwing us back 20 years. Two examples: never accepting invalid input would have prevented Stagefright (if a movie file contains strange data then simply refuse to play it; don't attempt to automatically fix the file and generate a heap overflow) and an Android-packaging trojan horse (I forget the name; basically you pack two different archives into the same package, Google's package check only looks at the first one and deems it safe, while the second contains the malware and is the one that's actually installed on the phone), because the tools would not need to bother about 'properly' handling invalid data and trying to guess what the intention was. (As can be seen from the second example: the problem can be as simple as guessing differently.)

        I don't really care whether you want to call it 'malformed', 'invalid', 'corrupt' or whatever. It's not an XML problem. It's a networking/environmental problem.

        'Be liberal in what you accept' is quite nice if you're working in a rather isolated place where you can control where stuff is coming from and what kind of stuff it may be - or where everyone is nice and friendly and just tries to get along.

        But today's internet isn't such a nice and friendly place. It's openly hostile. As soon as you connect something interesting to the net, there's a pretty good chance somebody is interested enough to take a very close look and try to figure out how to exploit it. And that's why I also conclude it's throwing us back 20 years - back into a time when we were less connected and less easy to exploit. (As long as I don't insert any media I am perfectly safe. USB was some newfangled toy which probably wouldn't last long anyway.) Networks were mostly isolated. Yes, BBSes and the Internet did exist back then, but for most people they were not relevant at all. That's the reason I picked the two Android exploits - they are relevant to roughly a billion devices (~700 million people?).

        'Malformed', 'invalid', 'corrupt' data is not something that happens by accident any more. Someone will manipulate the data to make sure it triggers exactly the right spots and delivers its payload. Someone wanted to overload antivirus scanners and carefully crafted 42.zip - 42 kilobytes which extract into multiple petabytes. That was more of a roundhouse kick, but even if it's a very specific exploit that triggers only under very specific conditions, with today's billions of connected devices you are pretty much guaranteed to find exploitable targets.

        I don't know what a suitable approach for your XML example is. In a different situation it may be a problem to discard the malformed data. If you have someone who is responsible for the data quality then it could be a good time to fetch the cluebat and apply some data-quality improvement lessons. If you have no one who is accountable then the 'ignore crap' approach seems reasonable. Maybe your example is actually a good example of why 'be liberal' is a bad thing. Maybe the one who's generating your XML is relying on you to be liberal, and is therefore able to get away with producing garbage.

        P.S.: Automated cars are going to be very liberal in what they need to accept as suitable road conditions, and it scares the hell out of me. Especially since car makers don't even understand the concept of proper separation between different security zones yet.