http://qs1969.pair.com?node_id=473203

monger has asked for the wisdom of the Perl Monks concerning the following question:

I've written a quick and dirty script to change a comma-separated DB dump to a tab-delimited file for importing. Here's the code:

my $log_file = "/c/mysql.out";
my $out_file = "/c/mysql.dshield";

open LOG, "$log_file" || die "Can't open log file: $!";
open OUT, ">$out_file" || die "Can't open output file: $!";

while (<LOG>) {
    s/,/\t/g;
    print OUT $_;
}

close OUT || die "Can't close the output file: $!";
close LOG || die "Can't close the log file: $!";

Here's a snippet of what it's parsing:

2005-07-06 00:00:00-05:00,85099794,1,202.97.174.226,1038,192.168.1.20, 1434,udp,

Here's what is dumped out after the script chews it up:

2005-07-06 00:00:00 -05:00 85099794 1 202.97.174.226 1038192.168.1.20 1434 udp

So, why would this miss replacing the third from last comma with a tab? It simply deletes the comma without replacement. I can't figure this out! Help please?? Monger

Monger +++++++++++++++++++++++++ Munging Perl on the side

Replies are listed 'Best First'.
Re: Code Misses a Replacement
by dbwiz (Curate) on Jul 07, 2005 at 19:45 UTC

    I agree with ww. The tab is there.

    Try this:

    s/,/<\t>/g;

    Then, you will see the tab even if your settings prevent it.

    On a side note, I would do the whole affair with a one-liner:

    perl -pe 's/,/\t/g' < /c/mysql.out > /c/mysql.dshield

      dbwiz, thanks for the bracket tip. That got it. And I'll likely use the one-liner for an eventual multi-lang batch job.

      monger

      Monger +++++++++++++++++++++++++ Munging Perl on the side
Re: Code Misses a Replacement
by ww (Archbishop) on Jul 07, 2005 at 19:36 UTC
    Can't tell for sure without verbatim of output, but since there appears to be a \s in the output in the spot where the third from last comma was in the original, suspect the issue is appearance, ONLY. Look at the output with an editor (hex, whatever) that lets you see the actual bytes...

    A tab can appear to be a single space, depending on its location, tabwidth, etc.
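
If no hex editor is at hand, the bytes can also be made visible from Perl itself. The following is only a minimal sketch: `show_invisibles` is a hypothetical helper name, and the sample line is modeled on the data in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: render tabs and other non-printable bytes as escapes
# so they cannot be mistaken for ordinary spaces.
sub show_invisibles {
    my ($text) = @_;
    $text =~ s/\t/\\t/g;                                           # tab -> literal \t
    $text =~ s/([^\x20-\x7E])/sprintf('\\x%02X', ord $1)/ge;       # anything else -> \xNN
    return $text;
}

my $line = "1038,192.168.1.20,1434,udp";
(my $converted = $line) =~ s/,/\t/g;

# Every tab now shows up as a visible \t in the printed line
print show_invisibles($converted), "\n";
```

If each former comma shows up as `\t`, the substitution worked and only the display was misleading.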

Re: Code Misses a Replacement
by Xaositect (Friar) on Jul 07, 2005 at 19:48 UTC

    This may make things more complicated than you need, but I should point out that most CSV dumps use quotation marks to escape strings that have commas in them. This is something to watch for:

        some,comma-delimited,"file with a, comma",in the data

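    You might take a look at Text::CSV; you could do something like this (untested):

    use Text::CSV;

    my $csv = Text::CSV->new();
    while (<>) {
        chomp;
        # parse() returns false on malformed input
        $csv->parse($_) or do { warn "Skipping unparseable line $.\n"; next };
        # fields() carries no line ending, so add one back
        print join("\t", $csv->fields()), "\n";
    }
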
    That's pretty simplistic, and won't handle tabs in the data, but you get the idea.
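
To see why quoted commas matter, here is a small sketch that compares the naive substitution with a quote-aware split. It swaps in the core module Text::ParseWords as a stand-in, purely so it runs without installing anything; it is not a full CSV parser (it also treats single quotes and backslashes specially), and `csv_line_to_tsv` is a hypothetical helper name.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Hypothetical helper: split one line on commas outside quotes,
# drop the quotes, and rejoin the fields with tabs.
sub csv_line_to_tsv {
    my ($line) = @_;
    chomp $line;
    my @fields = parse_line(',', 0, $line);
    return join("\t", @fields);
}

my $sample = 'some,comma-delimited,"file with a, comma",in the data';

# Naive approach: every comma becomes a tab, including the quoted one
(my $naive = $sample) =~ s/,/\t/g;
my @naive_fields = split /\t/, $naive;

# Quote-aware approach: the quoted comma stays inside its field
my @aware_fields = split /\t/, csv_line_to_tsv($sample);

printf "naive: %d fields, quote-aware: %d fields\n",
    scalar @naive_fields, scalar @aware_fields;
```

For real dumps, a dedicated CSV module as suggested above remains the safer choice.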


    Xaositect - Whitepages.com

      I would agree that the dump may do many interesting things when it goes to CSV, including escaping certain chars. I suggest the following code (which I use a variation of to convert Semi-Colon SV files to CSV files):

      use IO::File;
      use Text::CSV_XS;

      for (@ARGV) {
          my $out_fname = $_ . '.dshield';
          my $inf  = IO::File->new($_, '<')         or die "Cannot read $_";
          my $outf = IO::File->new($out_fname, '>') or die "Cannot write $out_fname";

          my $csv_in = Text::CSV_XS->new;    # defaults work for most CSV's
          # use tabs; eol is set so that print() terminates each row
          my $csv_out = Text::CSV_XS->new({ sep_char => "\t", eol => "\n" });

          until ($inf->eof) {
              my $line = $csv_in->getline($inf);
              $csv_out->print($outf, $line);
          }
      }
      ## IO::File objects close automatically when they go out of scope

      This gets used as:

      c2t.pl file1.out {file2.out} {...}

      It writes the results to file1.out.dshield, etc. By using the Text::CSV_XS module, you will be certain of processing CSV and Tab-SV files correctly. Though it's more code, it performs quite well and it will likely save you grief in the future.
      Larry Wall is Yoda: there is no try{}
      The Code that can be seen is not the true Code
Re: Code Misses a Replacement
by Roy Johnson (Monsignor) on Jul 07, 2005 at 20:36 UTC
    tr/,/\t/ is probably a better choice than s/,/\t/g, just because it's more tuned for the job.
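
A quick sketch of the two forms side by side, using a sample line modeled on the question's data. For a single-character swap like this they should produce the same string; `tr` with an empty replacement list can also count the commas without changing anything.

```perl
use strict;
use warnings;

my $line = "2005-07-06 00:00:00-05:00,85099794,1,202.97.174.226,1038,192.168.1.20,1434,udp,";

# Regex substitution: replace every comma with a tab
(my $via_s = $line) =~ s/,/\t/g;

# Transliteration: map the character ',' to the character "\t"
(my $via_tr = $line) =~ tr/,/\t/;

# tr with an empty replacement list only counts matches
my $commas = ($line =~ tr/,//);

print "identical results\n" if $via_s eq $via_tr;
print "commas in the original line: $commas\n";
```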

    Caution: Contents may have been coded under pressure.
Re: Code Misses a Replacement
by Transient (Hermit) on Jul 07, 2005 at 19:39 UTC
    I can't speak as to why it might be missing the third-to-last comma (but it probably isn't; it's probably just displaying the tab as one space).

    But, I would suggest doing a s/\|\|/or/g on your source code (that is, replacing the ||'s with or's) or else you won't know when you have a failure in your open or close functions. (||'s precedence is higher than what you want in these cases)
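
The precedence point can be demonstrated with a small sketch. The path below is a hypothetical one assumed not to exist, and the two sub names are made up for illustration. Without parentheses, `||` binds to the last argument of `open`, so the `die` is never reached as long as that argument is a true string.

```perl
use strict;
use warnings;

# Parsed as: open(my $fh, '<', ($path || die ...));
# the die can only fire if $path itself is false.
sub open_with_pipes {
    my ($path) = @_;
    my $opened = open my $fh, '<', $path || die "Can't open $path: $!\n";
    return $opened ? 'opened' : 'failed silently';
}

# 'or' has lower precedence than the list operator, so it tests open's result
sub open_with_or {
    my ($path) = @_;
    open my $fh, '<', $path or die "Can't open $path: $!\n";
    return 'opened';
}

my $missing = '/nonexistent/dir/mysql.out';    # hypothetical path

print "with ||: ", open_with_pipes($missing), "\n";
print "with or: ", (eval { open_with_or($missing) } // "died: $@");
```

Writing `open(...) || die ...` with explicit parentheses around the arguments is the other common way to get the intended behavior.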
Re: Code Misses a Replacement
by samizdat (Vicar) on Jul 07, 2005 at 19:39 UTC
    Try replacing the comma with \x2C. It also seems to have inserted an extra \t after the main portion of the timestamp.