Re: most efficient regex to delete duplicate words (boo)

Here's one way.

$_="alpha beta beta gamma gamma gamma";
while (s/((\w+)\s\2)/$2/) {};
print $_;

The second line does something that may not be obvious to everyone, and seems to duplicate the /g modifier's functionality. However, since you've (seemingly) got three gammas in a row, writing

$_="alpha beta beta gamma gamma gamma";
s/((\w+)\s\2)/$2/g;
print $_;

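will leave you with an extra gamma. With /g, the search resumes after the end of the previous match, so the gamma that survives a replacement is never compared with the one that follows it. Using the 'useless' while loop restarts the search from the beginning of the string each time, which lets the regex catch runs of more than two duplicates.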
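To make the difference concrete, here is a minimal sketch that wraps both versions in subs and runs them on the same input. The sub names are made up for illustration. The third sub is not from the post above: it is an alternative I'd consider, which anchors on word boundaries and swallows a whole run of repeats in one /g pass, so it should also leave partial-word matches such as "this is" alone.

```perl
use strict;
use warnings;

# Hypothetical helper: the single /g pass from the second snippet.
sub dedupe_single_pass {
    my ($text) = @_;
    $text =~ s/((\w+)\s\2)/$2/g;
    return $text;
}

# Hypothetical helper: the while loop from the first snippet.
# Each iteration restarts the search at the beginning of the string.
sub dedupe_loop {
    my ($text) = @_;
    while ($text =~ s/((\w+)\s\2)/$2/) {}
    return $text;
}

# Assumed alternative (not the original poster's regex): require whole
# words via \b and consume every adjacent repeat in a single match.
sub dedupe_whole_words {
    my ($text) = @_;
    $text =~ s/\b(\w+)(?:\s+\1\b)+/$1/g;
    return $text;
}

my $input = "alpha beta beta gamma gamma gamma";
print dedupe_single_pass($input), "\n";   # alpha beta gamma gamma
print dedupe_loop($input), "\n";          # alpha beta gamma
print dedupe_whole_words($input), "\n";   # alpha beta gamma
```

One thing to watch with the unanchored pattern: on "this is a test test" it treats the "is" at the end of "this" as a duplicate of the following "is", so the word-boundary variant may be the safer choice on real text.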

As for the regex you tried, $string =~ s/(\w+)(.*)\b\1/$1 $2/sig; it breaks down as:

one or more word characters (\w+)
zero or more characters of any class (.*)
a word boundary (\b)
the text captured by the first group (\1)
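The same breakdown can be written into the pattern itself with the /x modifier. This is only a sketch: the variable names and the sample string are invented for illustration, and the sub exists just to show what the two groups capture.

```perl
use strict;
use warnings;

# The pattern from the question, spread out with /x so each piece
# can carry its own comment.
my $dup_re = qr{
    (\w+)   # one or more word characters, first capture group
    (.*)    # zero or more characters of any class, second capture group
    \b      # a word boundary
    \1      # the text captured by the first group
}six;

# Hypothetical helper: return both captures, or an empty list on failure.
sub captures_for {
    my ($text) = @_;
    return $text =~ $dup_re ? ($1, $2) : ();
}

my ($word, $between) = captures_for("beta x beta");
print "word: [$word], between: [$between]\n";   # word: [beta], between: [ x ]
```

Because (.*) will match anything, the repeated word does not have to be adjacent to the first one; here the "x" in between ends up in the second group.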

Replies are listed 'Best First'.
Re: Re: most efficient regex to delete duplicate words (boo) by blakem (Monsignor) on Aug 14, 2001 at 02:37 UTC
I always prefer: `1 while (EXPR)` instead of: `while (EXPR) {}` simply because the '1 while' sticks out at the front, whereas the empty {} tends to get lost. I find that '1 while' immediately flags this Perl idiom and makes it easier for me to pick it out. -Blake
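For anyone who hasn't seen the idiom, here is a small sketch of the two spellings side by side, applied to the substitution from the parent post. The sub names are hypothetical. As far as I can tell the two forms behave identically; the only difference is where the "do nothing" body sits.

```perl
use strict;
use warnings;

# Hypothetical helper: empty block after the condition.
sub squeeze_with_block {
    my ($text) = @_;
    while ($text =~ s/((\w+)\s\2)/$2/) {}
    return $text;
}

# Hypothetical helper: statement-modifier form, with the constant 1
# as the body that gets "executed" on each pass.
sub squeeze_with_modifier {
    my ($text) = @_;
    1 while $text =~ s/((\w+)\s\2)/$2/;
    return $text;
}

my $input = "alpha beta beta gamma gamma gamma";
print squeeze_with_block($input), "\n";      # alpha beta gamma
print squeeze_with_modifier($input), "\n";   # alpha beta gamma
```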
Re: Re: Re: most efficient regex to delete duplicate words (boo) by Ven'Tatsu (Deacon) on Aug 14, 2001 at 07:32 UTC
`1 while (EXPR);` will also avoid the slight overhead of entering and exiting the lexical scope created by the BLOCK in `while (EXPR) {}`. Perl might be smart enough to optimize out the empty block, though.
Re: Re: Re: most efficient regex to delete duplicate words (boo) by mugwumpjism (Hermit) on Aug 14, 2001 at 03:50 UTC
What's wrong with: `"chill" while (EXPR);` ???