loris has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

I am using DBD::CSV to manipulate a csv-file. However, I now find that I need to add a column to the file. As DBD::CSV or rather SQL::Statement::Syntax does not seem to support ALTER TABLE, I'll have to change the csv-file directly.

I could just split the lines on the delimeter and add an extra delimeter at the appropiate place, but I thought it would be more snazy to use a regular expression. However, I haven't been able to find out how to specify the nth occurence of something.

Can anyone point me in the right direction?

Thanks,

loris

Replies are listed 'Best First'.
Re: Replacing nth occurrence of string
by graff (Chancellor) on Sep 23, 2005 at 14:34 UTC
    I trust you understand the potential problem with svenXY's proposed solution: it'll screw up badly if your CSV data contains any quoted occurrences of the delimiter character (e.g. any record like  field1,"field2,quoted",field3

    I don't think a regex is the right way to go here. If you cannot "ALTER" a table, why not "CREATE" a new one with the extra column wherever you want it, and then just "DROP" the old one?

      Hi graff++,

      you are completely right with your comment on my solution failing on quoted delimiters. I should have mentioned that but was tempted to provide the easy-quick-and-not-perfect-solution ;-)

      There would be ways to prevent this from happening, but as this is supposed to be a DB-like thing, your suggestion is much better (although it forces the OP to completely parse the file, recreate it and write out the data).

      Regards,
      svenXY

      Thanks for pointing out the quoted-delimiter problem. The idea to simply CREATE a new table, INSERT the old data, and then rename the new table does seem a good way to go and, as svenXY say, nicely DB-like.

      Thanks,

      loris

Re: Replacing nth occurrence of string
by svenXY (Deacon) on Sep 23, 2005 at 13:02 UTC
    Hi,
    #!/usr/bin/perl -w use strict; my $delimiter = ';'; my $nth = 3; my $regex = '^(' . ".*$delimiter"x$nth . ')'; while (<DATA>) { s/$regex/$1$delimiter/ && print; } __DATA__ col1;col2;col3;col4 aaa;bbb;ccc;ddd AAA;BBB;CCC;DDD

    Output is then:
    col1;col2;col3;;col4 aaa;bbb;ccc;;ddd AAA;BBB;CCC;;DDD
    Regards,
    svenXY
      When assembling regular expressions, beware: if $delimiter had any regex special characters, you would have to use \Q$delimiter\E in the assembled regex.
      my $delimiter = '.'; my $nth = 3; my $regex = '^(' . ('.*\Q' . $delimiter . '\E') x $nth . ')';

      --
      [ e d @ h a l l e y . c c ]