in reply to How to use split() in perl to ignore white space and ','

Hello iamnewbie, and welcome to the Monastery!

It looks as though your data is in CSV (comma separated values) format, in which case the best approach is to use a dedicated CSV module. For example:

    #! perl
    use strict;
    use warnings;
    use Text::CSV_XS;

    my $testb = "hello,'world, yo',matt";

    # keep_meta_info lets us ask afterwards whether a field was quoted
    my $csv = Text::CSV_XS->new({ keep_meta_info => 1, quote_char => "'" });
    my @records;

    if ($csv->parse($testb)) {
        my @fields = $csv->fields;
        for my $col (0 .. $#fields) {
            if ($csv->is_quoted($col)) {
                # put the quotes back around fields that were quoted in the input
                push @records, $csv->{quote_char} . $fields[$col] . $csv->{quote_char};
            }
            else {
                push @records, $fields[$col];
            }
        }
    }
    else {
        warn "parse() failed on argument: ", $csv->error_input, "\n";
        $csv->error_diag;
    }

    print "$_\n\n" for @records;

Output:

    16:47 >perl 1156_SoPW.pl
    hello

    'world, yo'

    matt

    16:47 >

(Code adapted from the documentation for Text::CSV_XS.)

Hope that helps,

Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

Re^2: How to use split() in perl to ignore white space and ','
by iamnewbie (Novice) on Feb 11, 2015 at 07:30 UTC
    Hi Athanasius, thanks for the warm welcome. Can't we optimize it using regex memory?
      Sure, if you can tell exactly what the regex should accomplish:
      1. split commas
      2. except commas between quotes
      3. except it's not between any quotes, but the quotes must be balanced
      4. perhaps the quotes are nested
      5. escaped quotes (\') have to be exempt
      6. the same conditions for double quotes
      7. them mixed and matched
      8. what about escaped commas (\,)?
      9. other special cases I forgot to think of?
      I do use split sometimes, but only if it's guaranteed to be commas only. As soon as a special case becomes visible on the horizon, I use Text::CSV (which in turn uses Text::CSV_XS if possible).
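      To make point 2 concrete, here is a minimal sketch (my own illustration, not from the posts above) of what a plain split does to the sample line from the parent post. It uses only core Perl; the variable names are made up for the example.

```perl
use strict;
use warnings;

# Same sample line as in the parent post.
my $line = "hello,'world, yo',matt";

# Naive approach: split on every comma, whether it is inside quotes or not.
my @naive = split /,/, $line;

# The quoted field is cut in two, so we get four pieces instead of three.
print scalar(@naive), " fields\n";
print "[$_]\n" for @naive;
```

      This prints "4 fields" followed by [hello], ['world], [ yo'] and [matt]: the comma inside the quotes was treated as a separator, which is exactly the special case a CSV module handles for you.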
      Can't we optimize it

      Please explain in what way you find the code sub-optimal.

      using regex memory

      I'm sorry, I don't understand. What is this 'regex memory' you speak of?

        I think the term 'using regex memory' which iamnewbie used means 'using capturing parentheses' in the regexp.
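        If that reading is right, a minimal sketch of capturing parentheses might look like the following. The date string and variable names are hypothetical, chosen only to show how the captures are "remembered".

```perl
use strict;
use warnings;

# Hypothetical sample input, used only for illustration.
my $date = "2015-02-11";

# Each pair of parentheses captures ("remembers") the text it matched.
if ($date =~ /^(\d{4})-(\d{2})-(\d{2})$/) {
    # After a successful match the captures are available as $1, $2, $3.
    print "year=$1 month=$2 day=$3\n";
}

# In list context the match returns the captured substrings directly.
my ($year, $month, $day) = $date =~ /^(\d{4})-(\d{2})-(\d{2})$/;
print "$year/$month/$day\n";
```

        Captures work well when the shape of the input is fixed and known in advance; they do not by themselves cope with a variable number of fields.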

      "...Can't we optimize it using regex memory?"

      I'm not sure what you mean, but I guess you mean something like this:

      my $string = "hello,'world, yo',matt";
      my @result = $string =~ /(.+),('.+'),(.+)/;
      print qq($_\n) for @result;

      See also Is guessing a good strategy for surviving in the IT business? and perlretut.

      But I'm sure that this is not really an optimization. I think the solutions already given are probably better.

      Edit: I tried to be more precise in judgement...

      Best regards, Karl

      «The Crux of the Biscuit is the Apostrophe»