Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I've been trapped by this regexp behaviour before, and I got trapped again.

The problem this time:
I have a file with tab separated data.
The data is intended for loading into a MySQL database.
Because of that I need to replace every empty field with a null marker (in this case \N).
So every tab that is immediately followed by another tab or a newline (or end of file) should have this "\N" attached to it.

Walking right into the trap

$line =~ s/\t([\t\n])/\t\\N$1/g;
And of course, when two empty fields follow each other, my regexp fails to do what I want.
This little snippet (where I am using pipes as separators, for the sake of visibility) illustrates my point.
while (my $line = <DATA>) {
    $line =~ s/\|(\||$)/\|\\N$1/g;
    print $line;
}
__DATA__
a|||d|
The output will be:
a|\N||d|\N

How do you guys usually deal with this? If someone can come up with a one-liner, that would suit me best.
Thanks in advance
/L

Replies are listed 'Best First'.
Re: Regexp: Overlapping matches (?=)
by tye (Sage) on Feb 22, 2008 at 15:32 UTC
    s/\t(?=[\t\n])/\t\\N/g;

    But this type of simple regex doesn't handle quoted values that contain adjacent tabs, of course.
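To see why the lookahead helps, here is a small sketch that runs both substitutions on the pipe-separated sample from the question. The variable names are only for illustration, and the lookahead uses the character class form from the reply above with a pipe in place of the tab.

```perl
use strict;
use warnings;

# Pipes stand in for tabs, as in the original question.
my $input = "a|||d|\n";

# Consuming version: the captured separator is part of the match,
# so the next search starts after it and the second empty field is skipped.
(my $consumed = $input) =~ s/\|(\||$)/|\\N$1/g;

# Lookahead version: (?=...) only peeks at the next character,
# so that separator is still available to start the next match.
(my $lookahead = $input) =~ s/\|(?=[|\n])/|\\N/g;

print $consumed;     # a|\N||d|\N
print $lookahead;    # a|\N|\N|d|\N
```

The only difference is whether the following separator is consumed by the match. With /g, each new search starts where the previous match ended, which is why the capturing version cannot mark two adjacent empty fields.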

    To also handle the first field being empty:

    s/(^|\t)(?=[\t\n])/$1\\N/g;
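A quick way to check that version is to wrap it in a small sub and feed it a few lines. This is only a sketch: mark_empty_fields is a made-up name, and the extra \z alternative in the lookahead is my own addition (not part of the regex above), on the assumption that the last line of the file might lack a trailing newline.

```perl
use strict;
use warnings;

# Hypothetical helper: put \N into every empty tab-separated field.
sub mark_empty_fields {
    my ($line) = @_;
    # (^|\t)          start of line or a tab, kept via $1
    # (?=[\t\n]|\z)   next comes a tab, a newline, or the end of the string
    #                 (the \z part is an assumption for a missing final newline)
    $line =~ s/(^|\t)(?=[\t\n]|\z)/$1\\N/g;
    return $line;
}

print mark_empty_fields("\tb\t\td\t\n");   # empty first, third and last field
print mark_empty_fields("a\tb\n");         # nothing to change
```

Note that a line consisting of just a newline is treated as one empty field and becomes "\N" followed by the newline, which may or may not be what you want for blank lines.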

    - tye

Re: Regexp: Overlapping matches
by hipowls (Curate) on Feb 23, 2008 at 00:24 UTC

    You could use Text::CSV_XS or Text::CSV to parse the file. Both have an option (sep_char) to set the separator character.

    use Text::CSV_XS;

    # sep_char takes a fat comma (=>), not an assignment
    my $csv = Text::CSV_XS->new ({ sep_char => "\t" });
    open my $io, "<", $file or die "$file: $!";
    while (my $row = $csv->getline ($io)) {
        my @fields = @$row;
        ...
    }
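For what it's worth, here is one way the loop body could be filled in for the \N problem. It is a sketch under a few assumptions: the sample data is read from an in-memory string instead of a real file, @out is just a name I picked, and the rows are written back with a plain join, which is only safe if no parsed field contains a tab itself.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Sample input held in memory; a real script would open the data file instead.
my $data = "a\t\t\td\t\n\tb\t\n";

my $csv = Text::CSV_XS->new({ sep_char => "\t" });

open my $in, "<", \$data or die "in-memory open: $!";

my @out;
while (my $row = $csv->getline($in)) {
    # Empty fields come back as empty strings by default; swap them for \N.
    push @out, join "\t", map { length($_) ? $_ : "\\N" } @$row;
}
close $in;

print "$_\n" for @out;
```

Since the parser hands back real fields, there is no overlapping-match problem to work around: each field is checked on its own.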