I've been trapped by this regexp behaviour before, and I got trapped again.
The problem this time:
I have a file with tab separated data.
The data is intended for loading into a MySQL database.
Because of that, I need to replace every empty field with a NULL marker (in this case \N).
So every tab that is immediately followed by another tab or a newline (or end of file) should have this "\N" attached to it.
Walking right into the trap, I wrote:
$line =~ s/\t([\t\n])/\t\\N$1/g;
And of course, when two empty fields follow one another, my regexp fails to do what I want.
This little snippet (where I am using pipes as separators, for the sake of visibility) illustrates my point.
while (my $line = <DATA>) {
    $line =~ s/\|(\||$)/\|\\N$1/g;
    print $line;
}
__DATA__
a|||d|
The output will be:
a|\N||d|\N
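As far as I can tell, the second pipe is consumed by the first match, so the /g scan resumes after it and never sees it as the start of another empty field. A minimal sketch of one possible way around that, assuming a zero-width lookahead is acceptable here (the sub name fill_empty_fields is only made up for this illustration):

```perl
use strict;
use warnings;

# Hypothetical helper, named only for this illustration.
# The lookahead (?=...) checks what follows the separator without
# consuming it, so the next separator is still available to match.
sub fill_empty_fields {
    my ($line) = @_;
    $line =~ s/\|(?=\||$)/|\\N/g;
    return $line;
}

print fill_empty_fields("a|||d|\n");   # prints a|\N|\N|d|\N
```

With tabs instead of pipes, the same idea would presumably read s/\t(?=[\t\n]|\z)/\t\\N/g. Like my original attempt, this does nothing for an empty field at the very start of a line.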
How do you guys usually deal with this?
If someone can come up with a one-liner, it would really suit me best.
Thanks in advance
/L