1) If you expect the duplicated text to be the entire field:

$len = length($field);
$field1 = substr($field, 0, int($len/2));
$field2 = substr($field, -int($len/2));
$field = $field1 if ($field1 eq $field2);
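A quick sketch of what the half-split in (1) does on an odd-length field. The variable names here are mine, purely for illustration. With 21 characters, each half is 10 long, so the middle character is never compared:

```perl
use strict;
use warnings;

# Illustrative names only; this mirrors the substr() logic from (1).
my $field = 'John Smith!John Smith';        # 21 characters
my $half  = int(length($field) / 2);        # 10; the middle character is skipped
my $front = substr($field, 0, $half);       # first 10 characters
my $back  = substr($field, -$half);         # last 10 characters

print "half=$half front='$front' back='$back'\n";
print "halves match\n" if $front eq $back;
```

That skipped middle character is why (1) collapses a field even when the separator is something like '!' rather than a space.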

2) Handles spaces better:

$field =~ s/^(.+)\s*\1/$1/;

3) Handles duplicates anywhere in the field:

$field =~ s/(.{2,})\s*\1/$1/g;

Update: 4) Handles duplicates anywhere in the field, stopping on word boundaries:

$field =~ s/\b(.+)\b\s*\1\b/$1/g;
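To see why the word boundaries in (4) matter, here is a small sketch that runs (3) and (4) side by side on a field where one word is merely a prefix of the next. The sub names dedupe_anywhere and dedupe_on_words are made up for this example:

```perl
use strict;
use warnings;

# Hypothetical helper names; the regexes are the ones from (3) and (4).
sub dedupe_anywhere {
    my ($field) = @_;
    $field =~ s/(.{2,})\s*\1/$1/g;        # any run of 2+ characters repeated
    return $field;
}

sub dedupe_on_words {
    my ($field) = @_;
    $field =~ s/\b(.+)\b\s*\1\b/$1/g;     # repeat must start and end on word boundaries
    return $field;
}

for my $field ('John Johnson', 'John Smith John Smith') {
    printf "%-24s 3: %-12s 4: %s\n",
        "'$field'", dedupe_anywhere($field), dedupe_on_words($field);
}
```

With (3), 'John Johnson' turns into 'Johnson' because the second 'John' is matched inside 'Johnson'. With (4), the trailing \b refuses that match, so the field is left alone, while a genuine 'John Smith John Smith' is still reduced to 'John Smith' by both.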

Test cases for all four follow:

sub test1 {
   my $len = length($_[0]);
   my $part1 = substr($_[0], 0, int($len/2));
   my $part2 = substr($_[0], -int($len/2));
   $_[0] = $part1 if ($part1 eq $part2);
}

sub test2 {
   $_[0] =~ s/^(.+)\s*\1/$1/;
}

sub test3 {
   $_[0] =~ s/(.{2,})\s*\1/$1/g;
}

sub test4 {
   $_[0] =~ s/\b(.+)\b\s*\1\b/$1/g;
}

foreach $test (qw( test1 test2 test3 test4 )) {
   print($test, "\n");

   foreach (
      'John SmithJohn Smith',
      'John Smith John Smith',
      'John Smith  John Smith',
      'foo John Smith John Smith bar',
      'John Johnson',
      'foo John Johnson bar',
      'John Smith!John Smith',
   ) {
      my $field = $_;
      &$test($field);   # symbolic call by sub name; only works without "use strict 'refs'"
      print($field, "\n");
   }

   print("\n");
}

__END__
output
======
test1
John Smith
John Smith
John Smith  John Smith         <-- case not covered
foo John Smith John Smith bar  <-- case not covered
John Johnson
foo John Johnson bar           <-- case not covered
John Smith                     <-- slightly buggy

test2
John Smith
John Smith
John Smith
foo John Smith John Smith bar  <-- case not covered
Johnson                        <-- buggy
foo John Johnson bar           <-- case not covered
John Smith!John Smith

test3
John Smith
John Smith
John Smith
foo John Smith bar
Johnson                        <-- buggy
foo Johnson bar                <-- buggy
John Smith!John Smith

test4
John SmithJohn Smith           <-- case not covered
John Smith
John Smith
foo John Smith bar
John Johnson
foo John Johnson bar
John Smith!John Smith
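If you'd rather have the checks fail loudly than compare printed output by eye, something along these lines should work. It's only a sketch: it covers (4) and a few of the cases above, the sub name dedupe is my own, and it relies on the core Test::More module:

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical wrapper around the regex from (4); returns the cleaned field.
sub dedupe {
    my ($field) = @_;
    $field =~ s/\b(.+)\b\s*\1\b/$1/g;
    return $field;
}

# [ input, expected ] pairs taken from the test4 output above.
my @cases = (
    [ 'John Smith John Smith',         'John Smith'            ],
    [ 'John Smith  John Smith',        'John Smith'            ],
    [ 'foo John Smith John Smith bar', 'foo John Smith bar'    ],
    [ 'John Johnson',                  'John Johnson'          ],
    [ 'foo John Johnson bar',          'foo John Johnson bar'  ],
    [ 'John Smith!John Smith',         'John Smith!John Smith' ],
);

is( dedupe( $_->[0] ), $_->[1], "'$_->[0]'" ) for @cases;

done_testing();
```

Passing the field by value and returning the result (instead of modifying $_[0] in place) also means the subs can be called through ordinary code references under strict.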

In reply to Re: How do I check a string for dupicate text? by ikegami
in thread How do I check a string for dupicate text? by devgoddess
