cruelty has asked for the wisdom of the Perl Monks concerning the following question:

Hi
I'm looking for a clever regex that removes duplicate words from a string. I came up with some solutions, but I'm too ashamed to show them here ;-)


If I have a scalar string like:

"foo bar donkey foo grinch bar\nfoo donkey"

It should become:

"foo bar donkey grinch\n"

Thanks,

Cr\/eLmOnK.

Re: remove duplicates
by blakem (Monsignor) on Oct 11, 2002 at 07:56 UTC
    How about:
    1 while $text =~ s/(\b(\w+)\b.*)\b\2\b\s?/$1/s;
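    To make the one-liner easier to follow, here is the same substitution written out with the /x modifier so each piece can carry a comment, run on the example string from the question. This is only an illustrative sketch. The brackets in the print are there to make trailing whitespace visible. Because .* is greedy, each pass removes the *last* remaining later copy of a word, along with at most one whitespace character after it.

```perl
use strict;
use warnings;

my $text = "foo bar donkey foo grinch bar\nfoo donkey";

# Same pattern as the one-liner, spread out with /x.
# Each pass deletes one later copy of a word; "1 while" repeats
# the substitution until no duplicate word is left.
1 while $text =~ s/
    (            # group 1: a word plus everything after it
      \b(\w+)\b  # group 2: one whole word
      .*         # greedy; the s flag lets it cross newlines
    )
    \b\2\b       # a later copy of the word in group 2
    \s?          # plus at most one whitespace char after it
/$1/sx;

print "[$text]\n";    # prints [foo bar donkey grinch ]
```

    The duplicates are gone, but a trailing space is left behind. The newline also disappears, because it was the whitespace character that followed the second "bar".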

    -Blake

      Absolutely brilliant!

      Thanks.
Re: remove duplicates
by joe++ (Friar) on Oct 11, 2002 at 08:05 UTC
    Maybe you should be more explicit about your requirements. For instance, do you want to keep the words in the order of their first occurrences?

    If not, splitting on whitespace, using each list element as a hash key and flattening the result with keys() will give you the list of unique words.

    my $string = "foo bar ...";
    my %L;
    $L{$_}++ for split(' ', $string);   # bonus: each hash element has the number of occurrences as value
    $string = join(' ', keys(%L));
    HTH!
    Note: this is untested...
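    Here is a runnable sketch of the hash approach, applied to the example string. It splits on ' ' (any run of whitespace) rather than /\s/, which avoids empty keys when whitespace repeats. Because keys() returns the words in no particular order, they are sorted here only to make the output predictable.

```perl
use strict;
use warnings;

my $string = "foo bar donkey foo grinch bar\nfoo donkey";

# split ' ' splits on any run of whitespace (spaces and the newline alike)
my %count;
$count{$_}++ for split ' ', $string;

# keys() has no defined order, so sort for a stable result
my $unique = join ' ', sort keys %count;

print "$unique\n";                              # bar donkey foo grinch
print "$_: $count{$_}\n" for sort keys %count;  # occurrence count per word
```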

    --
    Cheers, Joe

      If you're going to split and use a hash to check for uniqueness, it's still possible to retain the original order (first occurrences only, of course):
      my %seen; $text = join(' ', grep !$seen{$_}++, split(' ',$text));
      Note, though, that this removes all the newlines as well.
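      If the line breaks should survive, one option is to run the same grep line by line with a single shared %seen. This sketch rests on an assumption the thread has not settled: that every newline is kept and other whitespace collapses to single spaces. Under that rule it happens to reproduce the output asked for in the question.

```perl
use strict;
use warnings;

my $text = "foo bar donkey foo grinch bar\nfoo donkey";

my %seen;    # shared across lines, so a word is kept only the first time it appears
$text = join "\n",
    map {
        my @words = split ' ', $_;               # words on this line
        join ' ', grep { !$seen{$_}++ } @words;  # first occurrences only
    }
    split /\n/, $text, -1;    # limit -1 keeps a trailing empty line

# $text is now "foo bar donkey grinch\n"
print $text;
```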

      -Blake

Re: remove duplicates
by Abigail-II (Bishop) on Oct 11, 2002 at 11:02 UTC
    Could you be a bit more specific? From your example, it's clear you want to remove duplicate "word"s, but it's not clear to me which whitespace should be kept. There's a newline between bar and foo, and you want to keep that newline even though you want both that bar and that foo removed. (All the code given so far in this thread fails to do so.)

    What are your requirements for whitespace retention? A single example is just too vague.

    Abigail

Re: remove duplicates
by mce (Curate) on Oct 11, 2002 at 12:26 UTC
    Hi,

    This question pops up once in a while. It is usually phrased as "how to find unique elements in an array", since you can easily split your string on spaces.
    I was wondering, why isn't there a unique method in Perl?
    It seems quite useful.
    Anyway, I usually define a UNIVERSAL method like this:

    sub UNIVERSAL::unique { my $self = shift; my %tmp; $tmp{$_} = 1 for @_; return keys %tmp; }   # note: keys returns the elements in no particular order
    and just call it like this:
    $o->unique(@array); # with $o my blessed object
    But, anyway, since blakem and joe++ gave perfectly good answers, I rest my case :-)
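    A note on the "why isn't there a unique" question: the core module List::Util has provided a uniq function since version 1.45, and it keeps the first occurrence of each element in the original order. The sketch below compares it with a hand-rolled order-preserving version. The name unique_ordered is hypothetical and used only for illustration. The sketch assumes a Perl whose List::Util is 1.45 or newer.

```perl
use strict;
use warnings;
use List::Util 1.45 qw(uniq);

# Hand-rolled version: keep the first occurrence of each element, in order.
# (unique_ordered is a hypothetical name used only for this sketch.)
sub unique_ordered {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

my @words = split ' ', "foo bar donkey foo grinch bar\nfoo donkey";

my @mine = unique_ordered(@words);
my @core = uniq(@words);

print "@mine\n";    # foo bar donkey grinch
print "@core\n";    # foo bar donkey grinch
```

    Unlike returning keys %tmp, both versions preserve the order of first appearance.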
    ---------------------------
    Dr. Mark Ceulemans
    Senior Consultant
    IT Masters, Belgium