1) If you expect the duplicated text to make up the entire field:

my $len    = length($field);
my $field1 = substr($field, 0, int($len/2));   # first half
my $field2 = substr($field, -int($len/2));     # last half (the middle char is skipped if the length is odd)
$field = $field1 if ($field1 eq $field2);
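If you'd rather use a regex for the whole-field case, a pattern anchored at both ends does the same check. This is only a sketch and not one of the approaches above; the sub name `dedup_whole` is made up for illustration. Unlike the `substr` version, it tolerates whitespace between the two copies and leaves 'John Smith!John Smith' alone instead of dropping the "!".

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): collapse the field only
# when it is one string repeated twice, optionally with whitespace between.
sub dedup_whole {
    my ($field) = @_;
    # ^ and \z anchor both ends, so the entire field must be X, spaces, X.
    $field =~ s/^(.+)\s*\1\z/$1/;
    return $field;
}

print dedup_whole('John Smith  John Smith'), "\n";   # John Smith
print dedup_whole('John Smith!John Smith'), "\n";    # John Smith!John Smith (left alone)
```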

2) Handles whitespace between the two copies better (the duplicate must still start at the beginning of the field):

$field =~ s/^(.+)\s*\1/$1/;
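Here is the same regex written with the /x modifier so each piece can carry a comment. The wrapper sub `dedup_leading` is a hypothetical name used only for this illustration. Since only the start is anchored, trailing text after the second copy is kept, and the "Johnson" false positive is still there.

```perl
use strict;
use warnings;

# Approach 2 with /x; dedup_leading is a hypothetical wrapper name.
sub dedup_leading {
    my ($field) = @_;
    $field =~ s{
        ^       # the duplicate must start at the beginning of the field
        (.+)    # capture the first copy (greedy, one or more characters)
        \s*     # allow any amount of whitespace, including none
        \1      # then require the same text again
    }{$1}x;     # replace both copies with a single one
    return $field;
}

print dedup_leading('John Smith  John Smith extra'), "\n";   # John Smith extra
print dedup_leading('John Johnson'), "\n";                   # Johnson (false positive)
```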

3) Handles duplicates anywhere in the field (the repeated text must be at least two characters long):

$field =~ s/(.{2,})\s*\1/$1/g;
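A quick way to see what the /g version does, wrapped in a hypothetical sub `dedup_anywhere`: it collapses every adjacent pair it finds, which is handy for 'foo foo bar bar' but also mangles ordinary words that happen to contain a repeated run of letters.

```perl
use strict;
use warnings;

# Approach 3 as a hypothetical helper: collapse every "X X" pair anywhere in
# the field, where X is at least two characters long.
sub dedup_anywhere {
    my ($field) = @_;
    $field =~ s/(.{2,})\s*\1/$1/g;   # /g keeps scanning after each replacement
    return $field;
}

print dedup_anywhere('foo foo bar bar'), "\n";   # foo bar
print dedup_anywhere('Mississippi'), "\n";       # Missippi ("iss" + "iss" collapsed)
```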

Update: 4) Handles duplicates anywhere in the field, but only matches text that starts and ends on word boundaries:

$field =~ s/\b(.+)\b\s*\1\b/$1/g;
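If you only need to *check* for duplicate text rather than remove it, the pattern from approach 4 also works in a plain match. A sketch, with `find_duplicate` as a hypothetical name: note that `\b` can also match right before a space, so `$1` may come back with surrounding whitespace, which the sub trims.

```perl
use strict;
use warnings;

# Hypothetical helper: report the duplicated text (or undef) without
# modifying the field, using the same pattern as approach 4.
sub find_duplicate {
    my ($field) = @_;
    return unless $field =~ /\b(.+)\b\s*\1\b/;
    # \b also matches next to a space, so $1 may carry leading or trailing
    # whitespace; trim it before reporting.
    (my $dup = $1) =~ s/^\s+|\s+\z//g;
    return $dup;
}

for my $field ('foo John Smith John Smith bar', 'John Johnson') {
    my $dup = find_duplicate($field);
    print defined $dup ? "duplicate: '$dup'\n" : "no duplicate: '$field'\n";
}
```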
Test cases for all four approaches:
use strict;
use warnings;

# Each sub modifies its argument in place through the @_ alias.
sub test1 {
    my $len   = length($_[0]);
    my $part1 = substr($_[0], 0, int($len/2));
    my $part2 = substr($_[0], -int($len/2));
    $_[0] = $part1 if ($part1 eq $part2);
}
sub test2 { $_[0] =~ s/^(.+)\s*\1/$1/; }
sub test3 { $_[0] =~ s/(.{2,})\s*\1/$1/g; }
sub test4 { $_[0] =~ s/\b(.+)\b\s*\1\b/$1/g; }

foreach my $test (qw( test1 test2 test3 test4 )) {
    my $code = \&{$test};   # look up the sub by name (allowed under strict refs)
    print($test, "\n");
    foreach (
        'John SmithJohn Smith',
        'John Smith John Smith',
        'John Smith  John Smith',
        'foo John Smith John Smith bar',
        'John Johnson',
        'foo John Johnson bar',
        'John Smith!John Smith',
    ) {
        my $field = $_;
        $code->($field);
        print($field, "\n");
    }
    print("\n");
}

__END__

output
======

test1
John Smith
John Smith
John Smith  John Smith         <-- case not covered
foo John Smith John Smith bar  <-- case not covered
John Johnson
foo John Johnson bar           <-- case not covered
John Smith                     <-- slightly buggy (the "!" is lost)

test2
John Smith
John Smith
John Smith
foo John Smith John Smith bar  <-- case not covered
Johnson                        <-- buggy
foo John Johnson bar           <-- case not covered
John Smith!John Smith

test3
John Smith
John Smith
John Smith
foo John Smith bar
Johnson                        <-- buggy
foo Johnson bar                <-- buggy
John Smith!John Smith

test4
John SmithJohn Smith           <-- case not covered
John Smith
John Smith
foo John Smith bar
John Johnson
foo John Johnson bar
John Smith!John Smith

In reply to Re: How do I check a string for dupicate text? by ikegami
in thread How do I check a string for dupicate text? by devgoddess