suffers from the following issues
Oh yeah, I'm sure there are plenty of problems. I wasn't really attempting to build a general-purpose CSV parser, just demonstrating that it's not too terribly difficult to handle this kind of thing (for some values of "handle") with a regex.
A major part of a parser's job is to detect data that doesn't conform to the specification.
Indeed. That's where a single-regex solution generally falls flat. Perhaps one could make a pre-scanner that looks for problems ahead of time, but for handling arbitrary, user-supplied data, a real parser should be built (or grabbed from CPAN, as it were).
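To make the pre-scanner idea a bit more concrete, here is a minimal sketch. The name quotes_balanced and the sample lines are made up for illustration, and it only catches one class of problem (an odd number of double quotes), so it is nowhere near real validation:

```perl
use strict;
use warnings;

# Hypothetical pre-scan: flag lines with an odd number of double quotes.
# tr/"// with an empty replacement list only counts; it does not modify $line.
sub quotes_balanced {
    my ($line) = @_;
    my $count = ($line =~ tr/"//);
    return $count % 2 == 0;
}

# Made-up sample data: the second line has an unterminated quote.
for my $line ('a,"b,c",d', 'a,"b,c,d') {
    printf "%-12s %s\n", $line,
        quotes_balanced($line) ? 'ok' : 'unbalanced quotes';
}
```

Anything that fails the check could be rejected or logged before the regex ever sees it. It still says nothing about the other ways a line can be malformed, which is the argument for a real parser.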
Just as a side note, I use this same technique to parse Apache logs. It's simply a matter of

my @logentry = /("[^"]+"|\[[^\]]+\]|\S+)/g;

and the log entry is split up nicely. Notice it handles both quote-delimited and square-bracket-delimited chunks. It looks messy, but it's dead simple. Perhaps one could even use variables to make it more readable:
my $quoted    = qr/" [^"]+ "/x;
my $bracketed = qr/\[ [^\]]+ \]/x;
my $bare      = qr/ \S+ /x;

while (<LOGFILE>) {
    my @logentry = /($quoted|$bracketed|$bare)/g;
}
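For anyone who wants to try it without a log file handy, here is a self-contained sketch of the same idea. The helper name split_log_entry and the sample lines are made up for illustration; the second call shows the weakness discussed above, where a malformed entry gets split without any complaint:

```perl
use strict;
use warnings;

my $quoted    = qr/" [^"]+ "/x;
my $bracketed = qr/\[ [^\]]+ \]/x;
my $bare      = qr/ \S+ /x;

# Hypothetical helper: split one log line into its fields.
sub split_log_entry {
    my ($line) = @_;
    return $line =~ /($quoted|$bracketed|$bare)/g;
}

# Made-up entry in the usual Apache common log layout.
my $line = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
         . '"GET /index.html HTTP/1.0" 200 2326';
my @fields = split_log_entry($line);
print scalar(@fields), " fields\n";    # 7 fields
print "  $_\n" for @fields;

# An unterminated quote is not detected. The quoted alternative fails,
# so the bare alternative picks the pieces up one word at a time.
my @broken = split_log_entry('127.0.0.1 "GET /index.html');
print join('|', @broken), "\n";        # 127.0.0.1|"GET|/index.html
```

The order of the alternatives matters: $bare has to come last, otherwise \S+ would grab the opening quote or bracket before the delimited patterns get a chance to match.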
Hopefully I haven't strayed too far off the point. Not that anyone will probably read this deeply into the thread anyway, but oh well. 8^)
In reply to Re5: Comma separated list into a hash
by revdiablo
in thread Comma separated list into a hash
by Anonymous Monk