in reply to regular expression (search and destroy)
Comma-separated input is a deceptive and complex format. It sounds simple, but involves a fairly complex escaping system because the fields themselves can contain commas. This makes the pattern matching solution complex and rules out a simple split /,/.
This sums up the situation pretty well. There is no single definition of the CSV format, and something that works with one data format may not work with another. The book goes on to provide the following example code, which I have modified to trim whitespace. This should work with Excel-generated CSV files.
#! /usr/bin/perl
use strict;

while (<DATA>) {
    chomp;    # avoid matching against newline
    my $words = join "=>", parse_csv($_);
    print "$words\n";
}

sub parse_csv {
    my $text = shift;    # record containing comma-separated values
    my @new  = ( );
    push @new, $+ while $text =~ m{
        \s*"([^\"\\]*(?:\\.[^\"\\]*)*)"\s*,?   # Matches a phrase that may contain commas
      | \s*([^,]+)\s*,?                         # Something that is not a comma
      | \s*,                                    # Just a comma - no data
    }gx;
    push (@new, undef) if substr($text, -1,1)eq ',';
    return @new;
}

__DATA__
1,the,simple,case
"with","quoted" , "strings that contain spaces"
"with","quoted" , "comma, internally"
"with","quoted" , "comma, internally, with null data",,,
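To see why the quoted passage rules out a plain split /,/, here is a small sketch that runs one quoted record through split and, for comparison, through quotewords() from the core Text::ParseWords module. The sample record and the variable names are mine, and quotewords() is a different technique from the regex above, shown only as a point of reference.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);    # core module, used here only for comparison

# Illustrative record (an assumption, modelled on the sample data above)
my $line = '"with","quoted","comma, internally"';

# Naive approach: splits on every comma, including the one inside the quotes
my @naive = split /,/, $line;

# quotewords() keeps quoted text together; a false second argument
# tells it to strip the surrounding quotes from each field
my @quoted = quotewords(',', 0, $line);

print scalar(@naive),  " fields from split\n";
print scalar(@quoted), " fields from quotewords\n";
print join("=>", @quoted), "\n";
```

split reports one field too many because it cuts the last field at its internal comma. Note that quotewords() also treats single quotes and backslashes as special, so it is not a drop-in CSV parser either.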
Re: Re: regular expression (search and destroy)
by demerphq (Chancellor) on Nov 13, 2003 at 15:21 UTC