in reply to Re: How do I check a string for duplicate text?
in thread How do I check a string for duplicate text?

s/(.{2,})\s*\1/$1/g

Beware if you use this!

Anybody called 'John Johnson' or 'Jo Jones' will lose their first name.
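
To see the problem concretely, here is a minimal sketch that runs the substitution above on a few names. The sub name collapse_repeats and the 'John Smith John Smith' sample are made up for illustration; only the regex itself comes from the post above.

```perl
use strict;
use warnings;

# Apply the substitution quoted above to a copy of the input.
sub collapse_repeats {
    my ($text) = @_;
    $text =~ s/(.{2,})\s*\1/$1/g;
    return $text;
}

# The first sample is a genuine duplicate; the other two are ordinary
# names whose surname happens to start with the first name.
for my $name ('John Smith John Smith', 'John Johnson', 'Jo Jones') {
    printf "%-23s => '%s'\n", "'$name'", collapse_repeats($name);
}
# 'John Smith John Smith' => 'John Smith'
# 'John Johnson'          => 'Johnson'
# 'Jo Jones'              => 'Jones'
```

Nothing in the pattern ties the repeat to word boundaries, so "John" + "John(son)" looks like a duplicate.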

Re^3: How do I check a string for duplicate text?
by ikegami (Patriarch) on Sep 09, 2004 at 17:37 UTC
    Added (4) which fixes this up.
      You should still require 2 words; otherwise poor Johnson Johnson will suffer.

        True. What about people with three names, though (names with 'van', for example)? None of my expressions require two words. I suppose I could make it accept 2+ words. Not hard, just need to tweak the ".+"...
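
One way to require 2+ words, as a rough sketch rather than a drop-in fix: make the captured group consist of at least two whitespace-separated \w+ runs. The sub name and the sample strings are made up for illustration, and using \w+ as "a word" is an assumption (it will not cope with names like O'Brien or hyphenated names).

```perl
use strict;
use warnings;

# Collapse a repeated run of TWO OR MORE words into a single copy.
# (\w+(?:\s+\w+)+) captures word, whitespace, word, and so on;
# \s+\1\b then demands the same run again, ending on a word boundary.
sub collapse_multiword_repeats {
    my ($text) = @_;
    $text =~ s/\b(\w+(?:\s+\w+)+)\s+\1\b/$1/g;
    return $text;
}

for my $name ('John Smith John Smith', 'Jan van Dyk Jan van Dyk',
              'Johnson Johnson', 'John Johnson') {
    printf "%-25s => '%s'\n", "'$name'", collapse_multiword_repeats($name);
}
# 'John Smith John Smith'   => 'John Smith'
# 'Jan van Dyk Jan van Dyk' => 'Jan van Dyk'
# 'Johnson Johnson'         => 'Johnson Johnson'
# 'John Johnson'            => 'John Johnson'
```

Because the group is open-ended on the number of words, three-part names are handled by the same pattern as two-part ones.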

        Ah well, it's his own fault or his parents' fault for inflicting such a name on us.

      Oh, hey, that's cool. I already handed the script over, but if there's a huge issue with Jo Jones or John Johnson losing their first names, I can just send my client a fixed version of the script.

      He's already run the thing, and it's a huge success. :-)

      Dev Goddess
      Developer / Analyst / Criminal Mastermind

      "Size doesn't matter. It's all about speed and performance."

      $field =~ s/\b(.+)\b\s*\1\b/$1/g;

      Beware! Depending on what else the string contains besides the (possibly repeated) first name + last name, this regex can be highly dangerous.

      Consider:

      my $field = "Jo Doe (Tel: 999-111-111)";
      $field =~ s/\b(.+)\b\s*\1\b/$1/g;
      print $field;

      Oops! What happened to Jo Doe's phone number? This prints "Jo Doe (Tel: 999-111)", because "-111-111" counts as a repeated "-111".

        Aye, #3 and #4 remove duplicated bits anywhere in the string, as stated in my original reply.
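
If the field is only ever supposed to hold a name, one possible way around the "anywhere in the string" behaviour is to anchor the pattern so it fires only when the whole field is one chunk repeated twice. This is a sketch under that assumption, not one of the numbered expressions from the original reply; the sub name and sample data are made up.

```perl
use strict;
use warnings;

# Collapse the field only if it is exactly "X<whitespace>X"
# (plus optional leading/trailing whitespace). Anything else,
# such as a name followed by a phone number, is returned untouched.
sub collapse_whole_field {
    my ($field) = @_;
    $field =~ s/\A\s*(\S.*?)\s+\1\s*\z/$1/s;
    return $field;
}

for my $field ('John Smith John Smith', 'John Johnson',
               'Jo Doe (Tel: 999-111-111)') {
    printf "%-27s => '%s'\n", "'$field'", collapse_whole_field($field);
}
# 'John Smith John Smith'     => 'John Smith'
# 'John Johnson'              => 'John Johnson'
# 'Jo Doe (Tel: 999-111-111)' => 'Jo Doe (Tel: 999-111-111)'
```

The trade-off is that a duplicate followed by other data is left alone, and a field that is a single word repeated (Johnson Johnson) is still collapsed.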