mbethke:
- The first (++) is indeed a "potential" disadvantage, but it is easily remedied if the need arises. In any case, slurping the file is not the point; using a regex to ID files is.
- The second, IMO, is pretty much specious. Did you try inserting a "format violation" (or several)?
The only decimal cases I've discovered that affect the output are
1) a record which consists solely of a number (clearly a cause for concern) or
2) a number using a thousands separator. (Admittedly, I followed the sample data, but, as you can see from the regex, numbers with 4 or more digits satisfy the test so long as any thousands separator is omitted. If a separator is present, the regex needs a minor modification, followed by a function to remove the offending punctuation.)
If the first is a case to worry about, spit out the entire record whenever a number satisfies the test. If the latter, the data is sufficiently suspect that its content should be validated... which is a different kettle of fish. So too is the case in which the numbers are binary or octal or hex or lakhs or ....
But, again, illustrating all that seemed OT to me; TIMTOWTDI is the point.
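For what it's worth, here is a minimal sketch of the thousands-separator tweak described above. It is not the regex from the earlier post: the comma as separator, the sample records, and the helper name `extract_number` are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): pull the first number
# out of a record, accepting plain digits or comma-grouped thousands,
# and return it with the separators removed.
sub extract_number {
    my ($record) = @_;
    # Try the grouped form (12,345) first, then plain digits (12345).
    # The lookarounds keep us from matching part of a malformed number.
    return unless $record =~ /(?<![\d,])(\d{1,3}(?:,\d{3})+|\d+)(?![\d,])/;
    (my $n = $1) =~ tr/,//d;    # strip the offending punctuation
    return $n;
}

print extract_number("total: 12,345 units"), "\n";   # prints 12345
print extract_number("total: 12345 units"), "\n";    # prints 12345
```

Separators other than a comma (a period or a space, say) would need the character changed in both the regex and the `tr///`.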
Sorry for the late reply, I was away a couple of days! Happy new year BTW :) Yeah, I tried inserting both a blank line (counted as zero) and one containing "foo800bar" (counted as 800). Sure, that's just as easily fixed by tweaking the regexp as the file slurping is, but I'm not sure it would be quite as easy for the OP, especially as he'd have to spot these things first. If the original solution both works for files of arbitrary size and does some simple format validation without any extra effort, IMHO that's the way to go.
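To show the kind of cheap validation I mean, here is a minimal sketch. It is not the OP's code: it assumes one non-negative integer per line, and the sample data and variable names are made up. Reading line by line keeps memory use flat, and anchoring the regex rejects both the blank line and "foo800bar" instead of silently counting them.

```perl
use strict;
use warnings;

# Hypothetical sample data: assumes one non-negative integer per record.
my $data = "100\n\nfoo800bar\n250\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my $total = 0;
my @rejected;
while (my $line = <$fh>) {
    chomp $line;
    if ($line =~ /^\s*(\d+)\s*$/) {   # the whole line must be a number
        $total += $1;
    } else {
        push @rejected, $.;           # remember the offending line number
    }
}
close $fh;

print "total: $total\n";              # prints "total: 350"
print "rejected lines: @rejected\n";  # prints "rejected lines: 2 3"
```

With a real file you would open the path instead of the in-memory string; the loop stays the same.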