Re: Unique-Character Substring

This can actually be done in O(N):

sub tony_UCS {
  local $_ = reverse shift;
  my $longest = my $tohere = "";
  while (my $char = chop $_) {
    my $idx = index($tohere, $char);
    if ($idx != -1) {
      $longest = $tohere if (length $tohere > length $longest);
      $tohere = substr($tohere, $idx + 1) . $char;
    } else {
      $tohere .= $char;
    }
  }
  $longest = $tohere if (length $tohere > length $longest);
  return $longest;
}
[download]

This is about 3 times faster on a short test string, and quickly goes up to 5 times faster for slightly longer strings...

Tony

Comment on Re: Unique-Character Substring Download Code

Replies are listed 'Best First'.
Re (tilly) 1: Unique-Character Substring by tilly (Archbishop) on Jan 20, 2001 at 21:22 UTC
My turn to offer corrections. :-) First of all index is a O(n) operation. If you build a hash first and use that it is still logarithmic (not sure if study does that) but you lose. Also if there are multiple substrings of the maximum length you return only one of them. Try the string "hello hello" out. Jeff finds "lo he" and "o hel". You only find "lo he".	[reply]
Re: Re (tilly) 1: Unique-Character Substring by salvadors (Pilgrim) on Jan 20, 2001 at 21:40 UTC
Yes, of course. I'd totally missed the point of returning an array! This will solve that part: `sub tony_UCS { local $_ = reverse shift; my %longest; my $longest = 0; my $tohere = ""; while (my $char = chop $_) { my $idx = index($tohere, $char); if ($idx != -1) { push @{$longest{$longest = length $tohere}}, $tohere if (length $tohere >= $longest); $tohere = substr($tohere, $idx + 1) . $char; } else { $tohere .= $char; } } push @{$longest{$longest = length $tohere}}, $tohere if (length $tohere >= $longest); return @{$longest{$longest}}; }` [download] It's still considerably faster than the original, even if not O(n). :) Tony	[reply] [d/l]
Re: Re: Unique-Character Substring by japhy (Canon) on Jan 20, 2001 at 20:07 UTC
Wow... very nice. I see exactly what you did. You reversed the domain of the `index()` -- instead of seeing where 'x' occurs next in the string, you see if it's already occurred in the substring. Damn. Nice. `japhy` -- Perl and Regex Hacker	[reply]