Replacing consecutive tokens in 1 pass

aarestad has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Replacing consecutive tokens in 1 pass by Mr. Muskrat (Canon) on Feb 21, 2003 at 17:52 UTC
You need a zero-width positive look-ahead assertion: `s{\|9[0-6](?=\|)}{|X}g;` The `(?=\|)` tells the regex engine to look for (but not consume) the vertical bar.
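A minimal sketch of why the look-ahead matters, using the sample line quoted elsewhere in this thread. The variable names are made up for illustration. When the trailing pipe is part of the match it is consumed, so it cannot also serve as the leading pipe of the next token, and every second consecutive token is skipped.

```perl
use strict;
use warnings;

# Sample line taken from the thread.
my $line = "|90|93|foo|bar|91|92|95|96|906|";

# Consuming version: the trailing pipe belongs to the match, so it is
# no longer available as the leading pipe of the following token.
(my $consumed = $line) =~ s{\|9[0-6]\|}{|X|}g;

# Look-ahead version: the trailing pipe is checked but left in place.
(my $lookahead = $line) =~ s{\|9[0-6](?=\|)}{|X}g;

print "consumed:  $consumed\n";    # |X|93|foo|bar|X|92|X|96|906|
print "lookahead: $lookahead\n";   # |X|X|foo|bar|X|X|X|X|906|
```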
Re: Replacing consecutive tokens in 1 pass by dws (Chancellor) on Feb 21, 2003 at 18:08 UTC
An alternate approach is to "pipeline" the process: split the stream into tokens, operate on the tokens, and then reassemble them. This might be overkill for your particular example, but it is still a useful technique to have in your bag. `my $string = "|90|93|foo|bar|91|92|95|96|906|"; my $result = join "", map { s/^9[0-6]$/X/; $_ } $string =~ m/(\||[^|]+)/g; print $result;` This prints `|X|X|foo|bar|X|X|X|X|906|`. It's tempting to use `split /\|/, $string` to do the tokenizing, then `join "|"` to reassemble them, but you'll lose the trailing "|".
Re: Re: Replacing consecutive tokens in 1 pass by ihb (Deacon) on Feb 24, 2003 at 13:10 UTC
You can still use the `split()`/`join()` approach if you utilize the LIMIT argument of `split()`. `my $string = "|90|93|foo|bar|91|92|95|96|906|"; my $result = join '|', map { s/^9[0-6]$/X/; $_ } split /\|/, $string, -1; print $result;` This prints `|X|X|foo|bar|X|X|X|X|906|`. ihb
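A small sketch of the difference the LIMIT argument makes, assuming the same sample string. Without a LIMIT, `split` drops trailing empty fields, so the final "|" disappears on rejoin; a negative LIMIT keeps them. The array names are only illustrative.

```perl
use strict;
use warnings;

my $string = "|90|93|foo|bar|91|92|95|96|906|";

# Default: trailing empty fields are removed (the leading one is kept).
my @default = split /\|/, $string;

# Negative LIMIT: trailing empty fields are preserved.
my @keep = split /\|/, $string, -1;

printf "default:  %2d fields, rejoined: %s\n", scalar @default, join('|', @default);
printf "limit -1: %2d fields, rejoined: %s\n", scalar @keep,    join('|', @keep);
# default:  10 fields, rejoined: |90|93|foo|bar|91|92|95|96|906
# limit -1: 11 fields, rejoined: |90|93|foo|bar|91|92|95|96|906|
```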
Re: Replacing consecutive tokens in 1 pass by thelenm (Vicar) on Feb 21, 2003 at 17:54 UTC
You can use a positive lookahead, which will check for a | character (or the end of the line) after the match, but will not "consume" it. Something like this: `s/\|9[0-6](?=\||$)/|X/g` -- Mike `-- just,my${.02}`
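To illustrate what the `|$` alternative inside the look-ahead adds, here is a sketch on a hypothetical line that does not end in a pipe (this input is an assumption and does not appear in the thread). Only the variant that also accepts end of line replaces the last token.

```perl
use strict;
use warnings;

# Hypothetical input without a trailing pipe.
my $line = "|90|93|foo|95";

# Look-ahead for a pipe only: the final token is not followed by "|".
(my $pipe_only = $line) =~ s/\|9[0-6](?=\|)/|X/g;

# Look-ahead for a pipe or end of line: the final token matches too.
(my $pipe_or_end = $line) =~ s/\|9[0-6](?=\||$)/|X/g;

print "pipe only:   $pipe_only\n";     # |X|X|foo|95
print "pipe or end: $pipe_or_end\n";   # |X|X|foo|X
```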
Re: Replacing consecutive tokens in 1 pass by hv (Prior) on Feb 21, 2003 at 17:55 UTC
You can avoid consuming the trailing pipe with a lookahead: `s{ \| 9 [0-6] (?= \| ) }{|x}xg;` Here the `(?= \| )` group means "followed by another pipe". Hugo
Re: Replacing consecutive tokens in 1 pass by aarestad (Sexton) on Feb 21, 2003 at 18:02 UTC
LOL - 2 replies in 1 minute, both different. The winner is the second one, though. With the file `tmp` containing `|90|93|foo|bar|91|92|95|96|906|`, the non-greedy version `perl -pe 's{\|9[0-6]+?\|}{|X|}g' tmp` prints `|X|93|foo|bar|X|92|X|96|X|`, while the look-ahead version `perl -pe 's{\|9[0-6](?=\|)}{|X}g' tmp` prints `|X|X|foo|bar|X|X|X|X|906|`. Thanks! -peter
Re: Replacing consecutive tokens in 1 pass by OM_Zen (Scribe) on Feb 21, 2003 at 18:54 UTC
Hi, `my $str = "|90|93|foo|bar|91|92|95|96|906|"; $str =~ s/(?<=\|)9[0-6](?=\|)/X/g; print "[ $str ]\n";` Also, you can go through this tutorial: Extended Patterns regular expressions
Re: Replacing consecutive tokens in 1 pass by hardburn (Abbot) on Feb 21, 2003 at 17:42 UTC
You need to use a non-greedy multiplier: `s{ \|9 [0-6]+? \| } { |X| }xg` The `+?` after `[0-6]` is the magic part. ---- Reinvent a rounder wheel. Note: All code is untested, unless otherwise stated
Re: Re: Replacing consecutive tokens in 1 pass by tall_man (Parson) on Feb 21, 2003 at 17:57 UTC
That won't work. What he needs is a positive look-ahead assertion, like this: `use strict; $_ = "|90|93|foo|bar|91|92|95|96|906|"; s{ \|9 [0-6]+ (?=\|) } {|X}xg; print $_,"\n";` You also can't use 'x' mode to ignore whitespace in the replacement part. I also fixed that.
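The point about 'x' mode can be checked with a short sketch: the `/x` modifier makes whitespace insignificant in the pattern only, while the replacement is an ordinary interpolated string in which spaces are literal. The one-token input and the variable names below are assumptions chosen to keep the output easy to read.

```perl
use strict;
use warnings;

my $line = "|90|";

# Spaces inside the replacement braces are kept: the replacement is " |X ".
(my $spaced = $line) =~ s{ \| 9 [0-6] (?= \| ) } { |X }xg;

# No spaces inside the replacement braces: the replacement is "|X".
(my $tight = $line) =~ s{ \| 9 [0-6] (?= \| ) }{|X}xg;

print "spaced: [$spaced]\n";   # [ |X |]
print "tight:  [$tight]\n";    # [|X|]
```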
Re: Replacing consecutive tokens in 1 pass by Anonymous Monk on Feb 22, 2003 at 19:49 UTC
Why are we even bothering to check for the pipes? The solution, given the data provided, is as simple as this script, run under `#!/usr/bin/perl -w`: `use strict; my $str = "|90|93|foo|bar|91|92|95|96|906|"; $str =~ s/9([0-9]+)?/x/g; print "$str \n";`
Re: Re: Replacing consecutive tokens in 1 pass by OM_Zen (Scribe) on Feb 22, 2003 at 23:24 UTC
Hi, your script gives the output `|x|x|foo|bar|x|x|x|x|x|`. The post requires that the 906 be retained, so one cannot do a normal "+" search on the digits. The user's string can also have a `9[0-9]` at any position, as in `|868969780|`, which your script turns into `|868x|`: the pattern is greedy and replaces everything from the first 9 onward with "x". Hence we need positive look-behind and look-ahead assertions (a fixed-width look-behind is fine here) to match `9[0-9]` preceded by a "|" and followed by a "|", without including either "|" in the match: `$str =~ s/(?<=\|)9[0-9](?=\|)/X/g;` This is the extended pattern that does it, as in my previous post.
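A sketch comparing the two substitutions on a hypothetical line that combines the cases discussed above (the input string is an assumption built from the thread's examples). The anchored version uses the `9[0-6]` range from the earlier post in this thread.

```perl
use strict;
use warnings;

# Hypothetical input: a plain token, a long number containing 9s, and 906.
my $line = "|90|868969780|906|";

# Unanchored and greedy: any 9 plus all following digits becomes "x".
(my $greedy = $line) =~ s/9([0-9]+)?/x/g;

# Anchored by look-behind and look-ahead: only a whole two-character
# token between pipes is replaced, and neither pipe is consumed.
(my $anchored = $line) =~ s/(?<=\|)9[0-6](?=\|)/X/g;

print "greedy:   $greedy\n";     # |x|868x|x|
print "anchored: $anchored\n";   # |X|868969780|906|
```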