in reply to Re^2: Getting the right count
in thread Getting the right count

mbethke:

The only decimal cases I've found that affect the output are
1) a record which consists solely of a number (clearly, a cause for concern) or
2) a number using a thousands separator. (Admittedly, I followed the sample data, but, as you can see from the regex, numbers with 4 or more digits satisfy the test so long as any thousands separator is omitted. If one is present, the regex needs a minor modification, followed by a function to remove the offending punctuation.)
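
For the second case, here is a minimal sketch of the "modify the regex, then strip the punctuation" idea. The regex from the earlier node isn't reproduced here, so this stands alone: extract_number is a hypothetical helper name, and a comma as the separator is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): find the first number in a
# record, with or without comma thousands separators, and return it with
# the commas removed. Returns undef (in scalar context) if nothing matches.
sub extract_number {
    my ($record) = @_;

    # First alternative: comma-grouped digits such as 1,234 or 1,234,567.
    # Second alternative: a plain run of digits.
    # The lookarounds stop the match from starting or ending inside a
    # longer run of digits and commas.
    if ($record =~ /(?<![\d,])(\d{1,3}(?:,\d{3})+|\d+)(?![\d,])/) {
        (my $number = $1) =~ tr/,//d;    # delete the separators
        return $number;
    }
    return;
}
```

After that, "1,234" and "1234" both come back as 1234, so whatever numeric test follows doesn't need to know about separators.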

If the first is a case to worry about, spit out the entire record whenever a number satisfies the test. If the latter, the data is sufficiently suspect that its content should be validated... which is a different kettle of fish. So too is the case in which the numbers are binary or octal or hex or lakhs or ....

But, again, illustrating all that seemed OT to me; TIMTOWTDI is the point.

Re^4: Getting the right count
by mbethke (Hermit) on Jan 02, 2012 at 18:34 UTC

    Sorry for the late reply, I was away a couple of days! Happy new year BTW :)

    Yeah, I tried inserting both a blank line (counted as zero) and one containing "foo800bar" (counted as 800). Sure, that's just as easily fixed by tweaking the regexp as the file slurping is, but I'm not sure it would be quite as easy for the OP, especially as he'd have to spot these things first. If the original solution both works for files of arbitrary size and does some simple format validation without any extra effort, IMHO that's the way to go.
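
To make the two test lines above concrete, here is a rough sketch of reading line by line with an anchored regexp. It is not the original solution from the thread: the record format ("label, whitespace, integer") and the name sum_valid_records are assumptions made up for illustration.

```perl
use strict;
use warnings;

# Assumed record format (hypothetical): "label <whitespace> integer".
# Reads one line at a time, so the file never has to fit in memory.
sub sum_valid_records {
    my ($fh) = @_;
    my $total = 0;
    my @rejected;                        # line numbers that failed the check

    while (my $line = <$fh>) {
        chomp $line;
        # Anchored at both ends: a blank line or "foo800bar" cannot match.
        if ($line =~ /^\S+\s+(\d+)\s*$/) {
            $total += $1;
        }
        else {
            push @rejected, $.;          # $. is the current input line number
        }
    }
    return ($total, \@rejected);
}

# In-memory filehandle holding two good records plus the two bad lines.
open my $fh, '<', \"apples 300\n\nfoo800bar\npears 500\n"
    or die "cannot open in-memory file: $!";
my ($total, $rejected) = sum_valid_records($fh);
close $fh;
print "total=$total, rejected lines: @$rejected\n";
```

The blank line and "foo800bar" land in the rejected list instead of being silently counted as 0 and 800.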