    #!/usr/bin/perl
    $_ = ",,,,";
    print "Good\n" if /(?:[^,]*,\s*){3}(.*?)\s*,/;

The above code will print "Good\n". If you've worked with CSV data for any length of time, you know that the odds are good that sooner or later you'll get a line with all commas (at least I have). Depending upon what is done with the data, you could have serious data corruption.
Also, I would try to avoid the (.*?)\s*, construct. It's not terribly specific and can cause problems. ([^,]+)\s*, is very specific and is more appropriate. In fact, if you know that the data you are capturing won't have any embedded spaces or tabs (and I'm assuming that everything is on one line), you can use ([^, \t]+),.
In this case, I don't feel that it will cause a problem with how your regex is crafted, but subtle errors can creep in down the road as maintenance occurs. Your regex is fine because the whitespace behind it is optional, but the negated character class is almost always preferable because it states exactly what you want.
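To make the all-commas case concrete, here is a minimal sketch comparing the original capture with the negated character class. The helper names fourth_field_loose and fourth_field_strict are made up for illustration, and the sample lines are assumed inputs, not data from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: fourth field via the original (.*?) capture,
# or undef if the line does not match at all.
sub fourth_field_loose {
    my ($line) = @_;
    return $line =~ /(?:[^,]*,\s*){3}(.*?)\s*,/ ? $1 : undef;
}

# Hypothetical helper: same pattern, but the capture must contain
# at least one non-comma character.
sub fourth_field_strict {
    my ($line) = @_;
    return $line =~ /(?:[^,]*,\s*){3}([^,]+)\s*,/ ? $1 : undef;
}

for my $line ("a,b,c,d,e", ",,,,") {
    my $loose  = fourth_field_loose($line);
    my $strict = fourth_field_strict($line);
    printf "%-10s loose=%-10s strict=%s\n",
        $line,
        defined $loose  ? qq("$loose")  : "no match",
        defined $strict ? qq("$strict") : "no match";
}
```

On the all-commas line the loose version still matches and hands back an empty string, while the strict version refuses to match, which is usually what you want before the data goes anywhere important.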
Consider the following problem: you want to print the first field of comma-delimited text if the last character prior to the comma is a sharp (#), but you don't want to capture the sharp. If the data doesn't fit this format, you want the regex to fail completely. The following regex looks fine at first glance:

    print "$1\n" if /^(.+?)#,/;

It is, however, a bad choice. The negated character class is proper:

    #!/usr/bin/perl
    $_ = "test1, test2#,test3";
    print "$1\n" if /^(.+?)#,/;     # Returns a false positive
    print "$1\n" if /^([^,]+)#,/;   # This fails, as we expect

The first regex above will print test1, test2. I'm not trying to sound picky, but any time I see .* or .+ used in a regex, I always look for a way to remove it because it's not terribly precise.
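For completeness, a small sketch that runs both patterns against a line that does fit the format as well as the one that doesn't. The helper names and the "test1#,test2" sample line are my own assumptions, added only to show that the negated class still matches well-formed data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helpers: each returns the captured first field,
# or undef when the pattern does not match.
sub first_field_lazy {
    my ($line) = @_;
    return $line =~ /^(.+?)#,/ ? $1 : undef;
}

sub first_field_negated {
    my ($line) = @_;
    return $line =~ /^([^,]+)#,/ ? $1 : undef;
}

for my $line ("test1#,test2", "test1, test2#,test3") {
    for my $check ([lazy => \&first_field_lazy], [negated => \&first_field_negated]) {
        my ($name, $code) = @$check;
        my $got = $code->($line);
        printf "%-8s %-24s => %s\n",
            $name, qq("$line"), defined $got ? qq("$got") : "no match";
    }
}
```

Both patterns agree on the well-formed line. Only the lazy dot wanders across the first comma to find a "#," further along.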
In reply to (Ovid) RE: Re: Regexp glitch while parsing record in CSV
by Ovid
in thread Regexp glitch while parsing record in CSV
by greenhorn