Recognizing duplicates

b4swine has asked for the wisdom of the Perl Monks concerning the following question:

Re: Recognizing duplicates by rhesa (Vicar) on Oct 19, 2007 at 12:52 UTC
To quote from perlre: The bracketing construct `( ... )` creates capture buffers. To refer to the digit'th buffer use `\<digit>` within the match. Outside the match use "$" instead of "\". (The `\<digit>` notation works in certain circumstances outside the match. See the warning below about `\1` vs `$1` for details.) Referring back to another part of the match is called a backreference. In other words, use `print 'dup' if /(.).*\1/;`
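A minimal sketch of that backreference test wrapped in a sub, in case it helps to see it run. The name `has_dup_regex` and the sample words are only illustrative, and the `/s` flag is added here on the assumption that newlines should count as characters too.

```perl
use strict;
use warnings;

# Hypothetical helper: true if any character occurs more than once in $str.
sub has_dup_regex {
    my ($str) = @_;
    # (.) captures one character, .* skips any number of characters,
    # and \1 must then match the captured character again.
    # /s lets the dot match newlines as well.
    return $str =~ /(.).*\1/s ? 1 : 0;
}

for my $word (qw(abc hello perl monks)) {
    printf "%-6s %s\n", $word, has_dup_regex($word) ? 'dup' : 'no dup';
}
```

Because the regex engine may retry `.*` for every starting position, this is compact rather than fast on long strings.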
Re: Recognizing duplicates by Fletch (Bishop) on Oct 19, 2007 at 12:53 UTC
The numeric variables (`$1` etc.) are not set until the match has succeeded, so inside the pattern itself they do not refer to the current match; they are usable afterwards, for example on the right-hand side of a substitution. Within the pattern you need to use the corresponding backreference (e.g. `\1`), whether on the LHS of a `s///` or in a `m//`. See perlretut and perlre, the former of which uses this exact problem as its example.
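To illustrate the split between the two notations, here is a small sketch that uses `\1` inside the pattern and `$1` in the replacement. The sub name `collapse_doubles` and the sample word are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: replaces each doubled character ("oo") with a single one.
sub collapse_doubles {
    my ($str) = @_;
    # In the pattern, \1 means "whatever group 1 captured in this match".
    # In the replacement, $1 holds that captured text.
    $str =~ s/(.)\1/$1/g;
    return $str;
}

print collapse_doubles("bookkeeper"), "\n";
```

Writing `$1` in the pattern instead would interpolate whatever an earlier match left behind, which is usually not what is wanted.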
Re: Recognizing duplicates by perlfan (Parson) on Oct 19, 2007 at 13:45 UTC
Can't you use a hash? `my $string = "aabbbcc"; my @array = split('',$string); my %hash = (); foreach (@array) { print "dup found!\n" if (exists($hash{$_})); $hash{$_}++; }`
Re: Recognizing duplicates by Anonymous Monk on Oct 19, 2007 at 14:55 UTC
Note that the dot metacharacter matches any character (except newline, unless the `/s` regex switch is used). If, instead, you want to check for the duplication of any letter, i.e., an alpha character, you might try `/([a-zA-Z]).*\1/`. `/(\w).*\1/` is similar, but `\w` also matches digits and the underscore.
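A short sketch of the letters-only variant. It puts `.*` between the capture and the backreference so the repeat may be any distance away; the helper name `has_dup_letter` is hypothetical, and only ASCII letters are considered.

```perl
use strict;
use warnings;

# Hypothetical helper: true if any ASCII letter is repeated anywhere in $str.
sub has_dup_letter {
    my ($str) = @_;
    # Only letters can be captured, so repeated digits or punctuation
    # do not count as duplicates here.
    return $str =~ /([a-zA-Z]).*\1/s ? 1 : 0;
}

for my $sample ('a-b-a', 'a1b1', 'x_y_z') {
    printf "%-6s %s\n", $sample, has_dup_letter($sample) ? 'dup' : 'no dup';
}
```

The match is case-sensitive as written; adding the `/i` flag would treat `A` and `a` as the same letter.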
Re: Recognizing duplicates by RaduH (Scribe) on Oct 19, 2007 at 21:29 UTC
I'd use this function: `index(STRING, SUBSTRING, POSITION)` returns the position of the first occurrence of SUBSTRING in STRING at or after POSITION. If you don't specify POSITION, the search starts at the beginning of STRING. You'll be looking for the first occurrence of the current character in the string after the current position. Any way you look at it, it is O(n^2) in the length of your string. At least this solution doesn't use additional memory like the solution with the hash.
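One way the `index` idea could be written out, as a sketch only; the sub name `first_dup_by_index` and its return convention (the character, or undef when nothing repeats) are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the first character that reappears later in
# $str, or undef if no character is repeated.
sub first_dup_by_index {
    my ($str) = @_;
    for my $pos (0 .. length($str) - 1) {
        my $char = substr($str, $pos, 1);
        # Look for the same character strictly after the current position;
        # index returns -1 when it is not found.
        return $char if index($str, $char, $pos + 1) >= 0;
    }
    return;
}

my $dup = first_dup_by_index("abcbd");
print defined $dup ? "dup: $dup\n" : "no dup\n";
```

Each `index` call can scan the rest of the string, which is where the quadratic worst case comes from.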
Re^2: Recognizing duplicates by Sartak (Hermit) on Oct 20, 2007 at 05:42 UTC
Actually, the hash based solution is roughly O(length) time. It's (approximately) a constant amount of time to insert/index into the hash, and you only iterate over the string once.
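The single pass described here might look like the following sketch, which stops at the first repeat instead of printing for every one. The sub name `first_dup_by_hash` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the first character seen for a second time
# while scanning $str left to right, or undef if there is none.
sub first_dup_by_hash {
    my ($str) = @_;
    my %seen;
    for my $char (split //, $str) {
        # The post-increment yields the old count, which is true only
        # if this character has been seen before.
        return $char if $seen{$char}++;
    }
    return;
}

my $dup = first_dup_by_hash("abba");
print defined $dup ? "dup: $dup\n" : "no dup\n";
```

The trade-off is the extra memory for `%seen`, which grows with the number of distinct characters rather than with the length of the string.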