Comma-separated input is a deceptive and complex format. It sounds simple, but involves a fairly complex escaping system because the fields themselves can contain commas. This makes the pattern-matching solution complex and rules out a simple split /,/.
This sums up the situation pretty well. There is no single agreed definition of the CSV format, and something that works with one source of data may not work with another. The book goes on to provide the following example code, which I have modified to trim whitespace. This should work with Excel-generated CSV files.
#! /usr/bin/perl
#
use strict;

while (<DATA>) {
    chomp;    # avoid matching against newline
    my $words = join "=>", parse_csv($_);
    print "$words\n";
}

sub parse_csv {
    my $text = shift;    # record containing comma-separated values
    my @new = ( );
    push @new, $+ while $text =~ m{
        \s*"([^\"\\]*(?:\\.[^\"\\]*)*)"\s*,?  # Matches a phrase that may contain commas
      | \s*([^,]+)\s*,?                       # Something that is not a comma
      | \s*,                                  # Just a comma - no data
    }gx;
    push (@new, undef) if substr($text, -1,1)eq ',';
    return @new;
}
__DATA__
1,the,simple,case
"with","quoted" , "strings that contain spaces"
"with","quoted" , "comma, internally"
"with","quoted" , "comma, internally, with null data",,,
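For comparison, here is a minimal sketch of why the simple split /,/ mentioned in the quote is ruled out. The sample record is made up for illustration and is not taken from the book; it only shows that split has no idea about quoting.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample record: two logical fields, the second of which
# contains a comma inside its quotes.
my $line = '"with","comma, internally"';

# Naive approach: split on every comma, ignoring the quotes entirely.
my @fields = split /,/, $line;

# Two logical fields come back as three pieces, quotes still attached.
print scalar(@fields), " pieces\n";    # prints "3 pieces"
print "[$_]\n" for @fields;            # ["with"]  ["comma]  [ internally"]
```

The quoted field is cut in half at its internal comma, which is exactly the case the first alternative of the regex in parse_csv is there to handle.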
In reply to Re: regular expression (search and destroy) by inman, in thread regular expression (search and destroy) by data67