Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi All,

I have a script that creates an XML file from a pipe-delimited flat file. I have this little snippet in there to remove whitespace and replace || with |0| (the zeros have to be there). Sometimes, when a pipe is followed by more than one space, it won't insert the zero. Can anyone see where I'm going wrong here?

$row =~ s#\s+$##;
$row =~ s#&+#&amp;#g;
$row =~ s#(?<=\|)\|#0|#g;

Thanks!

Replies are listed 'Best First'.
Re: regex not working properly
by JediWizard (Deacon) on Mar 05, 2007 at 21:16 UTC

    $row =~ s#\s+$##; looks to be the problem. The $ means it will only match at the end of the string. If you want to remove all spaces:

    $row =~ s/\s+//g;

    # Remove only spaces in otherwise empty records
    $row =~ s/\|\s+\|/||/g;

    # Same but add a zero even for empty items
    $row =~ s/\|\s*\|/|0|/g;
    (code is untested)
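
    As a quick check of what each of the three substitutions does, here is a minimal sketch run on a made-up row (the sample data and variable names are assumptions, not the poster's file). Note that the first one strips every whitespace character, including spaces inside fields.

    ```perl
    use strict;
    use warnings;

    # Hypothetical sample row: one empty field and one whitespace-only field.
    my $sample = "a||b|   |c d";

    # 1. Remove all whitespace, including the space inside "c d".
    (my $all = $sample) =~ s/\s+//g;

    # 2. Remove whitespace only when it is all that sits between two pipes.
    (my $blank = $sample) =~ s/\|\s+\|/||/g;

    # 3. Put a zero between two pipes separated by nothing or by whitespace.
    (my $zero = $sample) =~ s/\|\s*\|/|0|/g;

    print "$all\n";     # a||b||cd
    print "$blank\n";   # a||b||c d
    print "$zero\n";    # a|0|b|0|c d
    ```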

    They say that time changes things, but you actually have to change them yourself.

    —Andy Warhol

      $row =~ s/\|\s+\|/||/g;
      works perfect. Thank you!
        If you ever end up handling a line that looks like this:
        |foo|bar|||baz|
        I think you'll want a slightly more complicated regex -- something like:
        s{ (?<! [^|] ) \s* (?! [^|] ) }{0}gx;
        That uses negative look-behind and look-ahead assertions, so that a string of zero or more spaces will match (and be replaced by "0") if it is neither preceded nor followed by some character other than a pipe symbol. (That is, if the whitespace string is preceded or followed by something other than a pipe symbol, it won't match, and won't be replaced.)

        The phrasing is a bit convoluted, but the point is that a pipe symbol in line-initial or line-final position should probably cause a zero to be inserted, and when three or more pipes occur in sequence, you probably want zeros between all of them. Your simpler version for removing whitespace between two pipes won't handle those cases well: it never touches a pipe at either end of the line, and each match consumes both pipes, so the closing pipe of one match can't also open the next one.
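
        To make that concrete, here is a small sketch that applies the look-around substitution to the sample line above and to a variant with a whitespace-only field. The helper name fill_empty and the second row are invented for illustration.

        ```perl
        use strict;
        use warnings;

        # Hypothetical helper: applies the look-around substitution to a copy
        # of the row and returns the result.
        sub fill_empty {
            my ($row) = @_;
            $row =~ s{ (?<! [^|] ) \s* (?! [^|] ) }{0}gx;
            return $row;
        }

        print fill_empty("|foo|bar|||baz|"), "\n";   # 0|foo|bar|0|0|baz|0
        print fill_empty("a|  ||b"), "\n";           # a|0|0|b
        ```

        Because the assertions consume no characters, every pipe can serve as the boundary for the field on both of its sides, which is why runs of three or more pipes come out right.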

        Personally, I prefer split for this sort of thing:

        $row = join "|", map { s/^\s*$/0/; $_ } split( /\|/, $row, -1 );
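
        A short sketch of why the -1 limit matters in that one-liner, assuming an invented sample row: without it, split drops trailing empty fields, so the zeros at the end of the line would be lost.

        ```perl
        use strict;
        use warnings;

        my $row = "|foo| |bar||";   # invented sample row

        # A limit of -1 makes split keep the trailing empty fields.
        my @kept    = split /\|/, $row, -1;
        my @dropped = split /\|/, $row;

        printf "%d fields with -1, %d without\n", scalar @kept, scalar @dropped;

        # Same idea as the one-liner: zero-fill empty or whitespace-only fields.
        $row = join "|", map { s/^\s*$/0/; $_ } @kept;
        print "$row\n";   # 0|foo|0|bar|0|0
        ```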