ColtsFoot has asked for the wisdom of the Perl Monks concerning the following question:

I have a bar-delimited file of some 1500 lines. As I process
each line in the file, I need to replace empty entries, e.g. ||,
with |\N| (the null value in Postgres). I use the following regexp:
$line =~ s:\|\|:\|\\N\|:g
This works for the first 40-50 lines and then seems to give up,
leaving occurrences of the || string in the resulting file.
My test case code is the following:
#!/usr/bin/perl -w
use strict;
open INPUT, 'try.asc';
while(my $line=<INPUT>) {
    print qq(Original line = $line);
    $line =~ s/\|\|/\|\\N\|/g;
    print qq(New line = $line);
}
Does anyone have any ideas?

Re: Broken regexp
by Abigail-II (Bishop) on Mar 09, 2004 at 13:22 UTC
    I guess you are using | as a separator. The problem with your approach is that you want to replace the content, but you are matching the separators as well - but separators are "shared". One way of solving this is the use of split and join:
    join "|" => map {length () ? $_ : '\N'} split /[|]/ => $line, -1;
    But that assumes there are no leading or trailing vertical bars. Or you can use lookahead:
    $line =~ s'[|](?=[|])'|\N'g;
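
A minimal sketch of the "shared separator" problem, comparing the original substitution with a lookahead version. The sample line and the variable names `$consuming` and `$lookahead` are made up for illustration; they are not from the original post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample line; the field values are made up for illustration.
my $line = 'a||b|||c';

# Original approach: each match consumes both bars, so the second bar
# cannot start the next match and an adjacent empty field is skipped.
(my $consuming = $line) =~ s/\|\|/|\\N|/g;

# Lookahead approach: only the first bar is consumed; the second is
# merely checked, so it is still available to start the next match.
(my $lookahead = $line) =~ s/\|(?=\|)/|\\N/g;

print "consuming: $consuming\n";   # consuming: a|\N|b|\N||c
print "lookahead: $lookahead\n";   # lookahead: a|\N|b|\N|\N|c
```

The leftover `||` in the first result only shows up on lines with two or more adjacent empty fields, which would explain why the original code appears to work for a while and then "give up".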

    Abigail

Re: Broken regexp
by Juerd (Abbot) on Mar 09, 2004 at 13:02 UTC

    Do you happen to want ||| to become |\N|\N|? If so, put the second \| in a look ahead assertion and remove it from the replacement string. (see perlre)

    Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }

Re: Broken regexp
by jbware (Chaplain) on Mar 09, 2004 at 13:24 UTC
    To borrow a concept from Juerd's comment, you could try the following, using zero-width look-ahead & look-behind assertions.
    $line =~ s/(?<=\|)(?=\|)/\\N/g;
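
A short sketch of what the two zero-width assertions do, assuming made-up sample lines. The helper name `fill_between_bars` is hypothetical and only wraps the substitution above.

```perl
use strict;
use warnings;

# Hypothetical helper name; wraps the look-behind/look-ahead substitution.
sub fill_between_bars {
    my ($line) = @_;
    # Match the empty position that has a bar on both sides and
    # insert \N there. No bar is consumed, so none is "used up".
    $line =~ s/(?<=\|)(?=\|)/\\N/g;
    return $line;
}

print fill_between_bars('a|||b'), "\n";   # a|\N|\N|b
print fill_between_bars('|a|b|'), "\n";   # |a|b|
```

Note that an empty field at the very start or end of the line has a bar on one side only, so this pattern leaves it alone.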
    -jbWare
Re: Broken regexp
by matija (Priest) on Mar 09, 2004 at 13:17 UTC
    How about this:
    $line=join('|',map(defined($_) ? $_ : '\N', split('|',$line)));
    (Split on the separator, replace undefined fields with \N, glue it back together again.)

      $line=join('|',map(defined($_) ? $_ : '\N', split('|',$line)));

      Empty fields will be defined, but empty. In other words: they will be strings with a length of 0 (also: "without length"). Besides that, $_ is the default variable. I prefer to see people take advantage of that :)

      Your code will also not work because it splits on either nothing or nothing. Using '' instead of // doesn't make split interpret its first argument as a string. It always sees it as a regex, unless it is ' ' (a single chr(32) space). The | needs to be escaped.

      Oh, and I dislike parens. Without parens, I think things are much easier to read.

      $line = join '|', map { length() ? $_ : '\N' } split /\|/, $line, -1;

      Update 1 - added negative third argument for split per Abigail's suggestion.
      Update 2 - added some parens per Abigail's second suggestion.

      Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }

        Without parens, I think things are much easier to read.
        And I like parseable code. Without parens, Perl guesses wrong:
        Warning: Use of "length" without parentheses is ambiguous at /tmp/j line 8.
        Search pattern not terminated at /tmp/j line 8.

        Abigail

        Besides that, $_ is the default variable. I prefer to see people take advantage of that...

        Please note that the advantage in these cases is only in the number of characters that you need to type. And perhaps in readability (which some people might consider a disadvantage in these cases). For execution, there is no difference in the opcode tree generated, and thus no difference in execution. For example:

        $ perl -MO=Deparse -e 'length'
        length $_;
        Personally, I prefer to use an explicit $_ if I think it improves the readability of my code.

        Liz

      Some problems here. First, split ('|') is a classic mistake: it splits the string at each and every character, as it's equivalent to split //. You mean split (/\|/).

      Second, split never returns undefined fields. Fields can be empty, but they are still defined. You want to check on length here.

      Third, this loses trailing empty fields. You want to give split a third, negative, argument.
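
The three points can be checked with a small sketch like the following. The sample line and the variable names are assumptions for illustration only.

```perl
use strict;
use warnings;

my $line = 'a||b||';   # hypothetical sample: 5 fields, 3 of them empty

# A string pattern is still treated as a regex: '|' is an empty
# alternation, so this splits between every character.
my @chars = split '|', $line;

# Escaped bar, no limit: trailing empty fields are dropped.
my @short = split /\|/, $line;

# Negative limit: trailing empty fields are kept.
my @full = split /\|/, $line, -1;

print scalar(@chars), " pieces from split '|'\n";        # 6
print scalar(@short), " fields without a limit\n";       # 3
print scalar(@full),  " fields with a limit of -1\n";    # 5

# Every field is defined, so test length rather than definedness.
my $fixed = join '|', map { length($_) ? $_ : '\N' } @full;
print "$fixed\n";   # a|\N|b|\N|\N
```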

      Abigail