matthewsnape has asked for the wisdom of the Perl Monks concerning the following question:

I have a string:

7374,726327,"76,237",32324,"21,342,857",23

I would like to delete commas if they are within a block enclosed by quotation marks to make:

7374,726327,"76237",32324,"21342857",23

I have tried a transform, but with no success:

$string =~ tr/".*,.*"/".*"/

How can I do this?

Replies are listed 'Best First'.
Re: deleting commas enclosed by quotation marks
by Fletch (Bishop) on Jun 21, 2005 at 23:32 UTC

    You're confusing tr/// for s///; see perldoc perlop. But more to the point you probably want something like Text::CSV_XS to parse your CSV into an array, remove embedded commas as needed, and then use the same module to write it back out.
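To make the tr/// versus s/// distinction concrete, here is a minimal sketch using only core Perl. The sub names `with_tr` and `with_s` are made up for illustration. tr/// treats both of its arguments as lists of characters, not as patterns, so the attempted transliteration ends up mapping every comma to a double quote. s/// takes a regex, so it can be limited to the quoted blocks.

```perl
use strict;
use warnings;

my $line = '7374,726327,"76,237",32324,"21,342,857",23';

# tr/// maps characters one for one. The search list here is the
# characters " . * , . * " and the replacement list is " . * ".
# Repeated search characters are ignored, so the only real change
# is ',' => '"' (fourth character maps to fourth character).
sub with_tr {
    my ($s) = @_;
    $s =~ tr/".*,.*"/".*"/;
    return $s;
}

# s/// matches a regex. Capture each quoted block, delete the commas
# from a copy of it, and substitute the cleaned copy back in.
sub with_s {
    my ($s) = @_;
    $s =~ s/("[^"]*")/ (my $q = $1) =~ tr{,}{}d; $q /ge;
    return $s;
}

print with_tr($line), "\n";   # every comma has become a quote
print with_s($line),  "\n";   # 7374,726327,"76237",32324,"21342857",23
```

Note that tr/,//d is still useful inside the replacement: once the quoted block has been isolated, deleting a single character is exactly what tr/// is for.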

    --
    We're looking for people in ATL

Re: deleting commas enclosed by quotation marks
by ikegami (Patriarch) on Jun 21, 2005 at 23:50 UTC

    Tokenizer approach:

    $input = '7374,726327,"76,237",32324,"21,342,857",23';
    print("input: $input\n");

    $output = '';
    foreach ($input) {
        if (/\G("[^"]+")/gc) {
            my $quoted = $1;
            $quoted =~ s/,//g;
            $output .= $quoted;
            redo;
        }

        # Unmatched quote.
        if (/\G"/gc) {
            $output .= '"';
            redo;
        }

        if (/\G([^"]+)/gc) {
            $output .= $1;
            redo;
        }
    }

    print("output: $output\n");

    To avoid reinventing the wheel, I do recommend Text::CSV_XS or similar over my solution. Use it to separate the fields, then remove the commas from every field with something like s/,//g foreach @fields;.
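A rough sketch of that module-based round trip, assuming Text::CSV_XS is installed from CPAN. The sub name `strip_field_commas` is invented for this example. One thing to be aware of: when the module writes the line back out it only quotes fields that need quoting, so the quotation marks around the cleaned fields disappear unless you ask for always_quote (which quotes every field).

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Parse one CSV line, delete commas inside each field, and rebuild the line.
sub strip_field_commas {
    my ($line) = @_;

    my $csv = Text::CSV_XS->new({ binary => 1 })
        or die "Cannot use Text::CSV_XS: " . Text::CSV_XS->error_diag;

    $csv->parse($line)
        or die "Cannot parse line: " . $csv->error_diag;

    my @fields = $csv->fields;      # quotes are already stripped here
    s/,//g foreach @fields;         # the per-field cleanup suggested above

    $csv->combine(@fields)
        or die "Cannot combine fields";
    return $csv->string;
}

print strip_field_commas('7374,726327,"76,237",32324,"21,342,857",23'), "\n";
# prints 7374,726327,76237,32324,21342857,23
```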

      foreach sounds funny here.

        It's using the fact that it aliases $_ so that you don't have to explicitly bind to $input each time. It's iterating over a list, it just happens to only have one element.

        Of course some wonks might say that in addition to the spelled-for-pronounced-foreach and vice versa you've got this spelled-foreach-pronounced-with . . . :)
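The single-element foreach described above can be sketched in isolation like this. Both sub names (`tidy_number`, `tokens`) are hypothetical. The first shows that changes made through $_ land in the aliased variable; the second shows why the tokenizer needs the alias: \G with /gc tracks a match position on $_, and redo re-enters the block without resetting it.

```perl
use strict;
use warnings;

# for/foreach aliases $_ to the variable, so s/// and tr/// inside the
# block modify $text itself without writing "$text =~" each time.
sub tidy_number {
    my ($text) = @_;
    for ($text) {
        s/^\s+//;
        s/\s+$//;
        s/,//g;
    }
    return $text;
}

# Same trick with \G and /gc: each successful match advances pos($_),
# a failed match leaves it alone, and redo runs the block again.
# The loop ends when neither pattern matches at the current position.
sub tokens {
    my ($input) = @_;
    my @tokens;
    for ($input) {
        if (/\G("[^"]*")/gc) { push @tokens, "quoted:$1"; redo }
        if (/\G([^"]+)/gc)   { push @tokens, "plain:$1";  redo }
    }
    return @tokens;
}

print tidy_number('  1,234,567  '), "\n";        # 1234567
print join(' | ', tokens('1,"2,3",4')), "\n";    # plain:1, | quoted:"2,3" | plain:,4
```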

        --
        We're looking for people in ATL

Re: deleting commas enclosed by quotation marks (ss)
by tye (Sage) on Jun 22, 2005 at 00:18 UTC
    s/("[^"]*")/ my $s= $1; $s =~ s-,--g; $s /ge;

    - tye        

Re: deleting commas enclosed by quotation marks
by GrandFather (Saint) on Jun 21, 2005 at 23:39 UTC
    use strict;
    use warnings;

    my $Orig = '7374,726327,"76,237",32324,"21,342,857",23';
    my $Mod = $Orig;

    1 while $Mod =~ s/((?:^|,)"[^"]*?),([^"]*?"(?:,|$))/$1$2/g;
    print "$Orig => $Mod\n";

    Output:

    7374,726327,"76,237",32324,"21,342,857",23 => 7374,726327,"76237",32324,"21342857",23

    Update: Add output generated.

    Perl is Huffman encoded by design.
Re: deleting commas enclosed by quotation marks
by injunjoel (Priest) on Jun 22, 2005 at 00:23 UTC
    Greetings all,
    Yet another way to go about it:
    $string = '7374,726327,"76,237",32324,"21,342,857",23';
    $string =~ s/("[^"]+")/my $tmp = $1; $tmp =~ s!,!!g; $tmp/eg;
    print $string;
    outputs
    7374,726327,"76237",32324,"21342857",23

    -InjunJoel
    "I do not feel obliged to believe that the same God who endowed us with sense, reason and intellect has intended us to forego their use." -Galileo
Re: deleting commas enclosed by quotation marks
by jZed (Prior) on Jun 22, 2005 at 03:01 UTC
    Are you sure that removing the commas is really what you want to do? If you are removing them just to make it easier to parse the data into fields, modules such as Text::CSV_XS or Text::xSV will parse the kind of data you show just fine without having to remove the commas. If you really want to remove the commas for some other reason, then those modules are probably also the easiest way to accomplish that.
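To illustrate the point that a parser copes with embedded commas on its own, here is a small sketch. It deliberately swaps in the core module Text::ParseWords as a stand-in so it runs without a CPAN install; it is not one of the modules named above, and it is not a full CSV parser (it also treats single quotes and backslashes specially). The sub name `split_fields` is made up.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Split a line on commas while respecting quoted blocks.
# The second argument (keep) is false, so the quotes are removed
# from the returned fields.
sub split_fields {
    my ($line) = @_;
    return parse_line(',', 0, $line);
}

my @fields = split_fields('7374,726327,"76,237",32324,"21,342,857",23');
print scalar(@fields), " fields: ", join(' | ', @fields), "\n";
# 6 fields: 7374 | 726327 | 76,237 | 32324 | 21,342,857 | 23
```

The commas inside the quoted fields survive parsing, so stripping them first is only needed if the downstream code wants plain digits.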
Re: deleting commas enclosed by quotation marks
by Anonymous Monk on Jun 22, 2005 at 09:16 UTC
    I'd say that all commas, except the first two and the last, in the given example, are enclosed by quotation marks. In which case, I'd use:
    substr($str,index($str,'"'),rindex($str,'"')-length($str))=~y/,//d;
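For anyone tempted to use that one-liner as is, here is what it does, unpacked into a sketch with a made-up sub name (`strip_between_outer_quotes`). It takes the literal reading of the question: everything between the first and the last quotation mark counts as "enclosed", so the separators between the quoted fields are deleted too.

```perl
use strict;
use warnings;

# Delete every comma between the first and last double quote.
# substr() is used as an lvalue, so tr///d edits $str in place.
sub strip_between_outer_quotes {
    my ($str) = @_;
    my $first = index($str, '"');
    my $last  = rindex($str, '"');
    return $str if $first < 0 || $last <= $first;   # fewer than two quotes

    substr($str, $first, $last - $first) =~ tr/,//d;
    return $str;
}

print strip_between_outer_quotes('7374,726327,"76,237",32324,"21,342,857",23'), "\n";
# prints 7374,726327,"76237"32324"21342857",23
```

The output differs from what the original poster asked for: the fields from "76237" through "21342857" have run together because the commas separating them were inside the outermost pair of quotes.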