in reply to Broken regexp

How about this:
$line=join('|',map(defined($_) ? $_ : '\N', split('|',$line)));
(Split on the separator, replace undefined fields with \N, and glue it back together again.)

Re: Re: Broken regexp
by Juerd (Abbot) on Mar 09, 2004 at 13:26 UTC

    $line=join('|',map(defined($_) ? $_ : '\N', split('|',$line)));

    Empty fields will be defined, but empty. In other words, they will be strings with a length of 0. Besides that, $_ is the default variable. I prefer to see people take advantage of that :)
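A minimal sketch of the point about empty fields, using a made-up record: every field split returns here is defined, so a defined() test never fires, while length() picks out exactly the empty ones.

```perl
use strict;
use warnings;

# Hypothetical sample record with two empty fields in the middle.
my $line   = 'a||b||c';
my @fields = split /\|/, $line;

# split returns empty strings here, not undef, so none of these is undefined...
my $undef_count = grep { !defined $_ } @fields;
# ...but length() is 0 for exactly the empty ones.
my $empty_count = grep { !length $_ } @fields;

print "fields: ", scalar(@fields), ", undefined: $undef_count, empty: $empty_count\n";
```

This prints `fields: 5, undefined: 0, empty: 2`.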

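    Your code will also not work because it splits on "nothing or nothing": as a regex, | is an alternation between two empty patterns, so it matches the empty string and split breaks the line between every character. Using '' instead of // doesn't make split interpret its first argument as a string. It always treats it as a regex, unless it is ' ' (a single chr(32) space). The | needs to be escaped.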
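A small sketch contrasting the three cases described above on hypothetical inputs: a quoted '|' (still a regex), an escaped /\|/, and the special single-space string.

```perl
use strict;
use warnings;

my $line = 'ab|c';

# '|' is compiled as the regex /|/ ("empty or empty"), which matches the
# empty string, so split breaks between every character, just like split //.
my @chars = split '|', $line;            # ('a', 'b', '|', 'c')

# Escaping the bar makes it a literal separator.
my @fields = split /\|/, $line;          # ('ab', 'c')

# A single-space string is the exception: it splits on runs of whitespace
# and ignores leading whitespace, awk-style.
my @words = split ' ', '  one  two ';    # ('one', 'two')

print join(',', @chars),  "\n";
print join(',', @fields), "\n";
print join(',', @words),  "\n";
```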

    Oh, and I dislike parens. Without parens, I think things are much easier to read.

    $line = join '|', map { length() ? $_ : '\N' } split /\|/, $line, -1;
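A minimal runnable sketch of the split/map/join one-liner above, wrapped in a hypothetical fill_empty sub and run on a few made-up records. The single-quoted '\N' is the literal two characters backslash and N.

```perl
use strict;
use warnings;

# Replace every empty field of a pipe-separated record with \N.
# The -1 limit keeps trailing empty fields; length() avoids the
# ambiguous parse of a bare "length ?".
sub fill_empty {
    my ($line) = @_;
    return join '|', map { length() ? $_ : '\N' } split /\|/, $line, -1;
}

my @records = ('a||b', '|x|', '||');    # hypothetical sample input
print fill_empty($_), "\n" for @records;
```

This prints `a|\N|b`, `\N|x|\N` and `\N|\N|\N`. If the records come from a file, chomp them first; otherwise the newline stays attached to the last field, and an "empty" last field is not empty.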

    Update 1 - added negative third argument for split per Abigail's suggestion.
    Update 2 - added some parens per Abigail's second suggestion.

    Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }

      Without parens, I think things are much easier to read.
      And I like parsable code. Without parens, Perl guesses wrong:
      Warning: Use of "length" without parentheses is ambiguous at /tmp/j line 8.
      Search pattern not terminated at /tmp/j line 8.

      Abigail

      Besides that, $_ is the default variable. I prefer to see people take advantage of that...

      Please note that the advantage in these cases is only in the number of characters that you need to type. And perhaps in readability (which some people might consider a disadvantage in these cases). For execution, there is no difference in the opcode tree generated, and thus no difference in execution. For example:

      $ perl -MO=Deparse -e 'length'
      length $_;
      Personally, I prefer to use an explicit $_ if I think it improves the readability of my code.
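The same comparison can be sketched from inside a script with B::Deparse's coderef2text method, deparsing one sub that relies on the implicit $_ and one that spells it out. Identical deparsed text is only a proxy for identical op trees, but it mirrors the -MO=Deparse check above.

```perl
use strict;
use warnings;
use B::Deparse;

my $deparse = B::Deparse->new;

# One sub relies on the implicit $_, the other names it explicitly.
my $implicit = $deparse->coderef2text(sub { length });
my $explicit = $deparse->coderef2text(sub { length $_ });

print "implicit: $implicit\n";
print "explicit: $explicit\n";
print $implicit eq $explicit ? "same deparsed code\n" : "different\n";
```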

      Liz

        Please note that the advantage in these cases is only in the number of characters that you need to type.

        Reducing typing has always been the purpose of the default variable, and that is what it does well. I hate that you have to use parens here and would in practice probably indeed use length $_ instead of length(). (I find the latter rather unreadable the more I think about it. It looks as if I'm trying to explicitly pass NO arguments.)

        I prefer to leave out $_ when it can be implied, because that is what I think improves readability. I even like to abuse "for" for topicalization.
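A short sketch of "for" used as a topicalizer, on a hypothetical input line: a one-element foreach aliases $_ to $line, so chomp and s/// (which default to $_) modify $line in place without naming it each time.

```perl
use strict;
use warnings;

my $line = "  a||b  \n";    # hypothetical raw input line

# $_ is an alias for $line inside the loop body.
for ($line) {
    chomp;          # strip the trailing newline
    s/^\s+//;       # strip leading whitespace
    s/\s+$//;       # strip trailing whitespace
}

print "[$line]\n";          # prints "[a||b]"
```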

        Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }

        This is true of most ops (e.g. "length($_)" compiles exactly the same as "length()").

        The exceptions are m//, s///, and tr///, where leaving out an explicit "$_ =~" produces a leaner opcode tree that is only slightly faster.

Re: Broken regexp
by Abigail-II (Bishop) on Mar 09, 2004 at 13:29 UTC
    Some problems here. First, split ('|'), a classical mistake. That will split the string at each and every character, as it's equivalent to split //. You mean split (/\|/).

    Second, split never returns undefined fields. Fields can be empty, but they are still defined. You want to check on length here.

    Third, this loses trailing empty fields. You want to give split a third, negative, argument.
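A minimal illustration of the third point, on a made-up record ending in two empty fields: without a limit, split strips trailing empty fields; a negative limit keeps them.

```perl
use strict;
use warnings;

my $line = 'a|b||';    # hypothetical record ending in two empty fields

my @default  = split /\|/, $line;        # ('a', 'b'): trailing empties dropped
my @keep_all = split /\|/, $line, -1;    # ('a', 'b', '', ''): all kept

print scalar(@default), " vs ", scalar(@keep_all), "\n";    # prints "2 vs 4"
```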

    Abigail