PerlMonks  

Regexp nightmare with CSV

by Pingu (Sexton)
on May 28, 2001 at 20:09 UTC ( [id://83744]=perlquestion )

Pingu has asked for the wisdom of the Perl Monks concerning the following question:

I'm writing a small script for my running club's site to allow searchable race results.

The data is stored in a CSV file, which I split on commas with split(/,/) into an array. This works well, except that some of the data fields contain commas which I want to keep.

The file looks something like this:

1,"FirstName","Surname","Running club, Country",hh:mm:ss
2,"etc.","etc.","Different club, same country",hh:mm:ss

The commas in the 4th field are to be kept, not split on.

I've tried:

while (<FP>) {
    s/"(.+?),(.+?)"/g;
    (@row) = split(/,/);
}

but it doesn't work - it picks up the wrong commas. Can anyone help please?

I have a feeling that I need a non-backtracking pattern but I can't suss it.

Thanks folks,

Pingu

Edited 2001-05-28 by Ovid

Replies are listed 'Best First'.
Re: Regexp nightmare
by petdance (Parson) on May 28, 2001 at 20:13 UTC
    You want Text::CSV_XS. It works wonderfully, and is as flexible as you could want. Embedded carriage returns? Comma separators are actually pipes? No problem.
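For readers who want to see what that looks like, here is a minimal sketch that assumes Text::CSV_XS is installed from CPAN. The sample record is modelled on the line in the question, and the time value is a placeholder rather than real race data.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# binary => 1 allows fields to contain embedded newlines and other binary data.
my $csv = Text::CSV_XS->new({ binary => 1 })
    or die "Cannot create parser: " . Text::CSV_XS->error_diag();

# Placeholder record shaped like the data in the question.
my $line = '1,"FirstName","Surname","Running club, Country",01:23:45';

# parse() returns false when the line is not valid CSV.
$csv->parse($line) or die "Parse failed on: " . $csv->error_input;
my @row = $csv->fields;

print scalar(@row), " fields\n";
print "club: $row[3]\n";
```

The comma inside the quoted fourth field stays part of that field, and the surrounding quotes are removed by the parser.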

    (And, as an aside, this question just reinforces my thoughts about needing a corollary to TMTOWTDI.)

    xoxo,
    Andy

    %_=split/;/,".;;n;u;e;ot;t;her;c; ".   #   Andy Lester
    'Perl ;@; a;a;j;m;er;y;t;p;n;d;s;o;'.  #   http://petdance.com
    "hack";print map delete$_{$_},split//,q<   andy@petdance.com   >
    
Re: Regexp nightmare
by Coyote (Deacon) on May 28, 2001 at 20:14 UTC
    I would recommend checking out one of the CSV modules on CPAN rather than rolling your own. Possible candidates:

    ----
    Coyote

      Text::CSV cannot handle embedded returns, nor is its API consistent with handling them. For a pure Perl solution that does handle embedded returns correctly you can try Text::xSV.
        Do you mean CR or CRLF in the fields?

        The way I always get around it with Text::CSV_XS is to treat it like an MS-DOS/Win32 text file.
        # Code that writes CSV out.
        $csvstring =~ s/\cM\cJ/\cM/g;    # turn embedded CRLF into a bare CR
        print SH $csvstring . "\cM\cJ";

        # Code that reads and parses CSV.
        {
            local $/ = "\cM\cJ";    # end of line is now \cM\cJ
            while (my $line = <INFILE>) {
                if ($csv->parse($line)) {
                    my @columns = $csv->fields;
                    # Process data here
                } else {
                    # A method call does not interpolate inside a string.
                    die "Error parsing: " . $csv->error_input . "\n";
                }
            }
        }
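The record-separator part of that approach can be tried on its own with core Perl only. This is a rough sketch that reads from an in-memory string instead of a real file; the two records are invented for illustration, and no CSV parsing is done here.

```perl
use strict;
use warnings;

# Two CRLF-terminated records; the first has a bare CR inside a quoted field.
my $data = qq{1,"Line one\cMline two",a\cM\cJ2,"plain",b\cM\cJ};

open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @records;
{
    local $/ = "\cM\cJ";          # a record now ends only at CR LF
    while (my $record = <$fh>) {
        chomp $record;            # chomp strips the current $/, here CR LF
        push @records, $record;
    }
}
close $fh;

print scalar(@records), " records\n";
```

Because the bare CR is not the record separator, the first record comes back in one piece and can then be handed to whichever CSV parser is in use.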


        -Lee

        "To be civilized is to deny one's nature."
        Lovely - does exactly what it says on the tin. I particularly like bind_header() and the ability to extract only those fields you require. Thank you for that; you have solved my problem. Pingu (logged in at work and can't remember my p/word ---
Re: Regexp nightmare
by JP Sama (Hermit) on May 28, 2001 at 20:24 UTC
    I think you could just abandon the CSV file.. and use TAB (\t) as your delimiter...
    please check THIS NODE, by BBQ for more information!
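A small sketch of the tab-delimited idea, assuming no field ever contains a tab character. The record is built in the script purely for illustration.

```perl
use strict;
use warnings;

# Build a tab-separated record; the values are placeholders.
my $line = join "\t", 1, 'FirstName', 'Surname', 'Running club, Country', '01:23:45';

# A limit of -1 keeps trailing empty fields, which split drops by default.
my @row = split /\t/, $line, -1;

print scalar(@row), " fields\n";
print "club: $row[3]\n";
```

With tabs as the delimiter the commas need no quoting at all, but the data file has to be converted once and kept free of literal tabs.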

    #!/jpsama/bin/perl -w
    $tks = `mount`;
    $jpsama = $! if $!;
    print $jpsama;
    
Re: Regexp nightmare with CSV
by larryk (Friar) on May 29, 2001 at 00:29 UTC
    If your data always has the same layout, you can use a specific regex to get the data out or, perhaps more appropriately, to modify your delimiters:

    Case 1 - permanent regex:

    for my $line (@lines_from_data_file) {
        my ($idx, $fname, $sname, $loc, $time) =
            $line =~ /^(\d+),("[^"]+"),("[^"]+"),("[^"]+"),(.*)$/;
    }

    Case 2 - one-liner to modify delimiters (in place)

    perl -i.bak -ne "s/([\d\x22]),/$1.'|'/eg;print" datafile
    # for some reason I can't use single quotes for a
    # perl -e on Win32 so I have to use \x22 for "

    Case 3 - just realised you can use the regex above (slight mod.) for your split.

    @data = split /([\d"]+),/;
    I still suggest that case 2 is your best option - you're just making more work for yourself if you don't.
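One thing to watch when splitting on a pattern: if the pattern contains capturing parentheses, split also returns the captured text as extra list elements, and the characters consumed by the match are removed from the fields. The sketch below compares the capturing form with a lookbehind variant, which is my own substitution and not part of the post above. It assumes every comma that separates fields is preceded by a digit or a double quote, and that no comma inside a field is.

```perl
use strict;
use warnings;

# Placeholder record shaped like the data in the question.
my $line = '1,"FirstName","Surname","Running club, Country",01:23:45';

# Capturing group: the digit or quote is consumed and returned separately.
my @with_captures = split /([\d"]+),/, $line;

# Lookbehind: only the comma is the delimiter, so the fields stay intact.
my @fields = split /(?<=[\d"]),/, $line;

print scalar(@with_captures), " elements with the capturing group\n";
print scalar(@fields), " fields with the lookbehind\n";
print "$_\n" for @fields;
```

The lookbehind form leaves the double quotes in place around each quoted field, so they still need stripping afterwards if they are not wanted.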

    "Argument is futile - you will be ignorralated!"

Re: Regexp nightmare
by Pingu (Sexton) on May 28, 2001 at 20:14 UTC
    Arrgh, formatting hassle. Apologies for the, err missing bits... maybe if I type:
    while (<FP>) {
        s/"(.+?),(.+?)"/$1===$2/g;
        (@row) = split(/,/);
        foreach (@row) {
            s/===/,/g;
        }
    }
    it might be clearer.... or perhaps not.
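For anyone wondering why this picks up the wrong commas: `.` also matches a double quote, so `"(.+?),` can start at the opening quote of one field and run past its closing quote to reach the comma between fields. A hedged sketch of a variant that uses `[^"]*` so the match cannot leave the quoted field it started in is shown below. It assumes quoted fields never contain escaped double quotes and that the text `===` never occurs in the data.

```perl
use strict;
use warnings;

# Placeholder record shaped like the data in the question.
my $line = '1,"FirstName","Surname","Running club, Country",01:23:45';

# Original idea: .+? can cross a closing quote to find a comma.
(my $crossed = $line) =~ s/"(.+?),(.+?)"/$1===$2/g;

# Variant: protect commas only inside one quoted field at a time.
(my $safe = $line) =~ s{"([^"]*)"}{ (my $f = $1) =~ s/,/===/g; $f }ge;

my @row = split /,/, $safe;
s/===/,/g for @row;    # restore the protected commas

print "original attempt gives: $crossed\n";
print scalar(@row), " fields\n";
print "club: $row[3]\n";
```

This is still a workaround. A real CSV parser copes with escaped quotes and embedded newlines, which this substitution does not.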
Re: Regexp nightmare
by Pingu (Sexton) on May 28, 2001 at 20:17 UTC
    Holy regexp, 2 replies before I'd even got the question right! Thanks a million! Pingu
