How do I check a string for duplicate text?

devgoddess has asked for the wisdom of the Perl Monks concerning the following question:

I have what I feel is the stupidest question. I'm working with name fields in a flat file. Most of them are fine, but some have double names. For example, a field may have "John Smith John Smith". I need to take the first 2 words of the string, "John Smith" (or first and last name), and check the rest of the string for another occurrence of that name. Then I need to chop off the 2nd instance of the name.

How do I do that? Thanks in advance, and please have mercy on me. I really can't figure this out. I'm sure that, as usual, it's something simple and stupid I'm overlooking.

Dev Goddess
Developer / Analyst / Criminal Mastermind

"Size doesn't matter. It's all about speed and performance."

Re: How do I check a string for duplicate text? by ikegami (Patriarch) on Sep 09, 2004 at 16:44 UTC
1) If you expect this to be the entire field: `$len = length($field); $field1 = substr($field, 0, int($len/2)); $field2 = substr($field, -int($len/2)); $field = $field1 if ($field1 eq $field2);`
2) Handles spaces better: `$field =~ s/^(.+)\s\1/$1/;`
3) Handles duplicates anywhere in the field: `$field =~ s/(.{2,})\s\1/$1/g;`
Update: 4) Handles duplicates anywhere in the field, stops on word boundaries: `$field =~ s/\b(.+)\b\s*\1\b/$1/g;`
Test cases for all four were included in the full post.
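The four approaches above can be compared side by side. What follows is a minimal sketch, not ikegami's original test script (which is not shown here); the sub names `dedupe_halves`, `dedupe_anchored`, `dedupe_anywhere`, and `dedupe_words` are hypothetical wrappers added for illustration.

```perl
use strict;
use warnings;

# (1) Compare the first and last halves of the field; keep one half if they match.
sub dedupe_halves {
    my ($field) = @_;
    my $half  = int(length($field) / 2);
    my $first = substr($field, 0, $half);
    my $last  = substr($field, -$half);
    return $first eq $last ? $first : $field;
}

# (2) The duplicated text must start at the beginning of the field.
sub dedupe_anchored {
    my ($field) = @_;
    $field =~ s/^(.+)\s\1/$1/;
    return $field;
}

# (3) The duplicate may appear anywhere; matches are not limited to whole words.
sub dedupe_anywhere {
    my ($field) = @_;
    $field =~ s/(.{2,})\s\1/$1/g;
    return $field;
}

# (4) The duplicate may appear anywhere, but must start and end on word boundaries.
sub dedupe_words {
    my ($field) = @_;
    $field =~ s/\b(.+)\b\s*\1\b/$1/g;
    return $field;
}

my $name = "John Smith John Smith";
for my $fix (\&dedupe_halves, \&dedupe_anchored, \&dedupe_anywhere, \&dedupe_words) {
    print $fix->($name), "\n";    # each call prints "John Smith" for this input
}
```

All four agree on the simple case; they differ in how they treat fields where the repeated text is only part of a word or only part of the field.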
Re^2: How do I check a string for duplicate text? by Not_a_Number (Prior) on Sep 09, 2004 at 17:23 UTC
`s/(.{2,})\s*\1/$1/g` Beware if you use this! Anybody called 'John Johnson' or 'Jo Jones' will lose their first name.
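The pitfall is easy to reproduce. The snippet below is a small sketch that wraps the quoted substitution in a hypothetical helper, `dedupe_loose` (a name chosen here, not from the thread), and runs it on the two problem names plus a genuine duplicate.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the substitution quoted above.
sub dedupe_loose {
    my ($field) = @_;
    $field =~ s/(.{2,})\s*\1/$1/g;
    return $field;
}

for my $name ("John Johnson", "Jo Jones", "John Smith John Smith") {
    printf "%-22s => %s\n", $name, dedupe_loose($name);
}
# "John Johnson" becomes "Johnson" and "Jo Jones" becomes "Jones",
# because "John" and "Jo" are each followed by a copy of themselves.
# The genuine duplicate is still reduced to "John Smith".
```

The backreference `\1` only requires the same characters to appear again; nothing in this pattern says the repeat has to be a complete word.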
Re^3: How do I check a string for duplicate text? by ikegami (Patriarch) on Sep 09, 2004 at 17:37 UTC
Added (4), which fixes this.
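As a quick check of that claim, here is a sketch that applies substitution (4) to the names Not_a_Number raised. The helper name `dedupe_on_word_boundaries` is an assumption made for this example.

```perl
use strict;
use warnings;

# Hypothetical wrapper around substitution (4) from the parent reply.
sub dedupe_on_word_boundaries {
    my ($field) = @_;
    $field =~ s/\b(.+)\b\s*\1\b/$1/g;
    return $field;
}

for my $name ("John Johnson", "Jo Jones", "John Smith John Smith", "Duran Duran") {
    printf "%-22s => %s\n", $name, dedupe_on_word_boundaries($name);
}
# "John Johnson" and "Jo Jones" are left alone, since the repeat of
# "John" or "Jo" does not end on a word boundary.
# "John Smith John Smith" becomes "John Smith".
# "Duran Duran" becomes "Duran": a legitimately repeated word is still collapsed.
```

The trailing `\b` is what protects 'John Johnson'. The last case suggests the pattern is still worth reviewing against real data before it is run over a whole file.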
Re^4: How do I check a string for duplicate text? by ysth (Canon) on Sep 09, 2004 at 19:06 UTC
Re^5: How do I check a string for duplicate text? by ikegami (Patriarch) on Sep 09, 2004 at 19:52 UTC
Re^4: How do I check a string for duplicate text? by devgoddess (Acolyte) on Sep 09, 2004 at 18:33 UTC
Re^4: How do I check a string for duplicate text? by Not_a_Number (Prior) on Sep 11, 2004 at 19:02 UTC
Re^5: How do I check a string for duplicate text? by ikegami (Patriarch) on Sep 11, 2004 at 19:48 UTC
Re^2: How do I check a string for duplicate text? by devgoddess (Acolyte) on Sep 09, 2004 at 16:53 UTC
OMG! Holy crap! That last line worked perfectly. You're my new hero! LOL
Re: How do I check a string for duplicate text? by Anonymous Monk on Sep 10, 2004 at 09:46 UTC
    s/^            # Start of string
      (            # Start remembering
        \W*        # Leading non word characters
        (\w+)      # First word of string
        \s+        # Whitespace
        (\w+)      # Second word of string
        \b         # Make sure we got the entire word
        .*?        # Skip till second occurrence
      )            # Stop remembering
      \b           # Start at the beginning of a word
      \2           # Repeat of the first word
      \s+          # Whitespace
      \3           # Repeat of the second word
      \b           # And that's the end of the word
    /$1/sx;        # Just keep what we remembered
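To try this pattern on sample data, it can be wrapped in a sub. This is a sketch under stated assumptions: the sub name `drop_repeated_name` and the final whitespace trim are additions for illustration and are not part of the original reply.

```perl
use strict;
use warnings;

# Hypothetical wrapper: remove a second occurrence of the first two words.
sub drop_repeated_name {
    my ($field) = @_;
    $field =~ s/^
        (                 # Start remembering
          \W*             # Leading non word characters
          (\w+)           # First word
          \s+
          (\w+)           # Second word
          \b
          .*?             # Skip ahead lazily to the repeat
        )                 # Stop remembering
        \b \2 \s+ \3 \b   # The same two words again
    /$1/sx;
    # Assumption: whitespace left at the end of the field is unwanted.
    $field =~ s/\s+\z//;
    return $field;
}

print drop_repeated_name("John Smith John Smith"), "\n";   # prints "John Smith"
print drop_repeated_name("John Johnson"), "\n";            # prints "John Johnson"
```

Because group 1 captures everything up to the repeat, including the separating space, the bare substitution turns "John Smith John Smith" into "John Smith " with a trailing space; the extra trim in the sketch removes it.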
