joec_ has asked for the wisdom of the Perl Monks concerning the following question:

Hi

I have a CSV in which i need to replace the delimiters. I need to change from comma to pipe (|). I know it shouldnt be challenging but i cant seem to twig it. The header fields should be changed to pipe seperated as well.

I have tried:

while ($line = <INPUT>){ $line =~ s/,/|/g unless ($line =~ /,\"(.*),(.*)\",/); }
So essentially substitute all commas for pipes except quoted strings. I know its something like this, but cant quite get it ...

The upshot of this is that each line (as a pipe separated string) is going to get inserted into Oracle as a CLOB.

Any help appreciated

Thanks -- Joe

-----

Eschew obfuscation, espouse elucidation!

Replies are listed 'Best First'.
Re: Replacing commas with pipes
by almut (Canon) on May 11, 2009 at 19:11 UTC

    As this doesn't seem to be for purely educational purposes, I'd recommend using Text::CSV, instead of reinventing the wheel...

Re: Replacing commas with pipes
by toolic (Bishop) on May 11, 2009 at 19:21 UTC
    Here is an example using Text::CSV_XS
    use strict; use warnings; use Text::CSV_XS; my $csv = Text::CSV_XS->new(); # create a new object while (<DATA>) { my $status = $csv->parse($_); # parse a CSV string into fields my @columns = $csv->fields(); # get the parsed fields my $str = join '|', @columns; print "$str\n"; } __DATA__ a,b,c d,e,"f|g"

    prints:

    a|b|c d|e|f|g

    Is this the output you are looking for?

      Your second example turns a 3 column record into a 4 column record. Now, it matches the specification as given, but if you go the CSV way, it doesn't seem to be a logical solution.

      Considering that Text::CSV_XS allows you to set the separator, it seems more logical to use two Text::CVS_XS objects: one to read, one to write.

        re JavaFan's comment about toolic's "...second example":

        /me thinks that's a typo in data (d,e,"f|g" should be d,e,"f,g"), as OP's spec has commas throughout...even inside the quotes.

        Update: rephrased to clarify first line with attributions.

        Update2: In fairness, JavaFan's s/("[^"]*")|,/$1||"|"/eg; appears to work with this data:

        "x,y,z",red,blue 1,"2,3",4,foo,bar,"blivitz_kung" a,b,"c,d,e,",nodereaper "Super_Search","XP per node",data,"code tags,markup" monk,troll,saint,"holders of unholy powers","Orders of Monks:novice,in +itiate,etc"

        (Update3: bah, humbug, copied wrong data. Fixed above.)

        producing this output:

        "x,y,z"|red|blue 1|"2,3"|4|foo|bar|"blivitz_kung" a|b|"c,d,e,"|nodereaper "Super_Search"|"XP per node"|data|"code tags,markup" monk|troll|saint|"holders of unholy powers"|"Orders of Monks:novice,in +itiate,etc"

        Nonetheless, given the likelihood of (unknown, unspecified) edge-cases, a well-tested module remains the way to go, IMO.

Re: Replacing commas with pipes
by JavaFan (Canon) on May 11, 2009 at 19:16 UTC
    Untested:
    s/("[^"]*")|,/$1||"|"/eg;
    This ought to replace any comma outside double quoted strings with a pipe.

    That doesn't mean it will munge your data into something Oracle accepts.

    Are you sure Oracle cannot be told what to use as delimiter?