You asked about a measure. That measure is how you would distinguish between the first and the second cases. Otherwise, you have only your gut to go on. Guts are very poor measures of anything.
My criteria for good software:
Does it work?
Can someone else come in, make a change, and be reasonably certain no bugs were introduced?