aarestad has asked for the wisdom of the Perl Monks concerning the following question:

I have the following pipe-delimited data:

|90|93|foo|bar|91|92|95|96|906|

What I want to do is replace every instance of |9[0-6]| with |X|. Unfortunately, when I specify

s{\|9[0-6]\|}{|X|}g

the above line turns into

|X|93|foo|bar|X|92|X|96|906|

In other words, if I get two in a row, only the first is replaced. I suppose this is because the match consumes the trailing pipe, so the next search starts just past it and the following token no longer has a leading pipe to match. A second pass would take care of the rest, but I'd like a way to do it in one pass, if only for my peace of mind. Any ideas?
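A small diagnostic sketch (the variable names are illustrative, not from the thread) makes the consumption visible. It records the start and end offset of every match of the original pattern using Perl's @- and @+ arrays. Each match ends just past a pipe, so the token right after a replaced one has no leading pipe left to match.

```perl
use strict;
use warnings;

my $line = '|90|93|foo|bar|91|92|95|96|906|';

# Record the span of every match of the original pattern. After a
# successful match, $-[0] is its start offset and $+[0] is its end offset.
my @spans;
while ($line =~ /\|9[0-6]\|/g) {
    push @spans, $-[0] . '-' . $+[0];
}

print "@spans\n";
```

This prints three spans, for 90, 91 and 95, which are exactly the tokens replaced in the output above.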

Re: Replacing consecutive tokens in 1 pass
by Mr. Muskrat (Canon) on Feb 21, 2003 at 17:52 UTC

    You need a zero-width positive look-ahead assertion.
    s{\|9[0-6](?=\|)}{|X}g;

    The (?=\|) tells the regex engine to require a vertical bar at that position without consuming it, so the same bar can start the next match.
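    A minimal self-contained version of the lookahead substitution, assuming the data sits in a single string (the variable names are hypothetical), replaces every |9[0-6]| token while leaving 906 alone:

```perl
use strict;
use warnings;

my $line = '|90|93|foo|bar|91|92|95|96|906|';

# Match a pipe, a 9 and one digit 0-6, then only *check* that a pipe
# follows. The lookahead (?=\|) consumes nothing, so that pipe is still
# available as the leading pipe of the next match.
(my $fixed = $line) =~ s{\|9[0-6](?=\|)}{|X}g;

print "$fixed\n";
```

    Because the lookahead leaves the closing pipe in the string, the replacement is |X rather than |X|.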

Re: Replacing consecutive tokens in 1 pass
by dws (Chancellor) on Feb 21, 2003 at 18:08 UTC
    An alternate approach is to "pipeline" the process. Split the stream into tokens, operate on the tokens, and then reassemble the tokens. This might be overkill for your particular example, but is still a useful technique to have in your bag.
    my $string = "|90|93|foo|bar|91|92|95|96|906|";
    my $result = join "", map { s/^9[0-6]$/X/; $_ } $string =~ m/(\||[^|]+)/g;
    print $result;
    __END__
    |X|X|foo|bar|X|X|X|X|906|
    It's tempting to use
    split /\|/, $string
    to do the tokenizing, then
    join "|"
    to reassemble them, but you'll lose the trailing "|", because split() discards trailing empty fields by default.
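    A quick sketch of that pitfall (variable names are illustrative): the leading empty field survives split(), but the trailing one is dropped, so join() cannot put the last pipe back.

```perl
use strict;
use warnings;

my $string = '|90|93|foo|bar|91|92|95|96|906|';

# Without a LIMIT, split drops trailing empty fields, so the empty field
# after the final pipe disappears and join cannot restore that pipe.
my @fields   = split /\|/, $string;
my $rejoined = join '|', @fields;

print scalar(@fields), " fields: $rejoined\n";
```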

      You can still use the split()/join() approach if you pass a negative LIMIT to split(), which keeps the trailing empty fields:
      my $string = "|90|93|foo|bar|91|92|95|96|906|";
      my $result = join '|', map { s/^9[0-6]$/X/; $_ } split /\|/, $string, -1;
      print $result;
      __END__
      |X|X|foo|bar|X|X|X|X|906|
      ihb
Re: Replacing consecutive tokens in 1 pass
by thelenm (Vicar) on Feb 21, 2003 at 17:54 UTC
    You can use a positive lookahead, which will check for a | character (or the end of the line) after the match, but will not "consume" it. Something like this:
    s/\|9[0-6](?=\||$)/|X/g
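    The $ alternative only matters when a record does not end with a pipe. The sketch below uses a hypothetical input without the trailing pipe to compare a pipe-only lookahead with the pipe-or-end version. Perl does not interpolate $) inside a pattern, so (?=\||$) is read as "a pipe or the end of the string".

```perl
use strict;
use warnings;

# Hypothetical input: the same kind of record, but without a trailing pipe.
my $line = '|90|93|foo|91|96';

(my $pipe_only   = $line) =~ s/\|9[0-6](?=\|)/|X/g;     # requires a following pipe
(my $pipe_or_end = $line) =~ s/\|9[0-6](?=\||$)/|X/g;   # pipe or end of string

print "$pipe_only\n$pipe_or_end\n";
```

    Only the second substitution replaces the final 96.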

    -- Mike

    --
    just,my${.02}

Re: Replacing consecutive tokens in 1 pass
by hv (Prior) on Feb 21, 2003 at 17:55 UTC

    You can avoid consuming the trailing pipe with a lookahead:

    s{
        \| 9 [0-6]
        (?= \| )    # followed by another pipe
    }{|X}xg;

    Hugo
Re: Replacing consecutive tokens in 1 pass
by aarestad (Sexton) on Feb 21, 2003 at 18:02 UTC
    LOL - 2 replies in 1 minute, each one different. The winner is the second one, though:

    $ cat tmp
    |90|93|foo|bar|91|92|95|96|906|
    $ perl -pe 's{\|9[0-6]+?\|}{|X|}g' tmp
    |X|93|foo|bar|X|92|X|96|X|
    $ perl -pe 's{\|9[0-6](?=\|)}{|X}g' tmp
    |X|X|foo|bar|X|X|X|X|906|
    Thanks!

    -peter

Re: Replacing consecutive tokens in 1 pass
by OM_Zen (Scribe) on Feb 21, 2003 at 18:54 UTC
Re: Replacing consecutive tokens in 1 pass
by hardburn (Abbot) on Feb 21, 2003 at 17:42 UTC

    You need to use a non-greedy quantifier:

    s{
        \|9 [0-6]+?   # Here's the magic part
        \|
    } { |X| }xg

    ----
    Reinvent a rounder wheel.

    Note: All code is untested, unless otherwise stated

      That won't work. What he needs is a positive look-ahead assertion, like this:
      use strict;
      $_ = "|90|93|foo|bar|91|92|95|96|906|";
      s{ \|9 [0-6] (?=\|) } {|X}xg;
      print $_, "\n";
      You also can't use /x mode to ignore whitespace in the replacement part; /x only affects the pattern, so any spaces there end up in the output. I also fixed that.
Re: Replacing consecutive tokens in 1 pass
by Anonymous Monk on Feb 22, 2003 at 19:49 UTC
    Why are we even bothering to check for the pipes? The solution, given the data provided, is as simple as:
    #!/usr/bin/perl -w
    use strict;
    my $str = "|90|93|foo|bar|91|92|95|96|906|";
    $str =~ s/9([0-9]+)?/x/g;
    print "$str \n";
      Hi,

      Your script gives the output |x|x|foo|bar|x|x|x|x|x|

      The original question requires keeping the 906, so one cannot use a plain "+" on the digits. Also, the string can contain 9[0-9] anywhere inside a field, as in |868969780|, which your script turns into |868x|. Your pattern is too greedy, and it also replaces with a lowercase "x" instead of "X".

      Hence we need positive lookbehind and lookahead assertions (fixed width is fine here, which is what Perl's lookbehind requires), so that the pattern matches 9[0-6] preceded by a "|" and followed by a "|" without including either "|" in the match:

      $str =~ s/(?<=\|)9[0-6](?=\|)/X/g;
      This is the extended pattern that does it, as in my previous post.
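      A self-contained check of the lookbehind/lookahead version, using the original question's 9[0-6] range and the |868969780| field from the reply above as a second input (variable names are illustrative):

```perl
use strict;
use warnings;

my $str   = '|90|93|foo|bar|91|92|95|96|906|';
my $other = '|868969780|';   # example field mentioned in the reply

# (?<=\|) and (?=\|) each check for a pipe without consuming it, so only
# the two digits are replaced and both pipes stay in place.
for ($str, $other) {
    s/(?<=\|)9[0-6](?=\|)/X/g;
}

print "$str\n$other\n";
```

      Both assertions are zero-width, so the replacement is just X, and a 9 buried inside a longer field is never touched.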