Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl Monk, Perl Meditation
 
PerlMonks  

Re: finding longest common substring

by CombatSquirrel (Hermit)
on Nov 19, 2003 at 22:38 UTC ( [id://308432]=note: print w/replies, xml ) Need Help??


in reply to finding longest common substring

I personally like your idea and implementation. They are simple and (appearantly) efficient. You might want to have a look at index instead of the RegEx you are using, especially because you are escaping metacharacters, but that's just a minor issue. Anyways, I tried and benchmarked it; if you are interested in the results, here they are:
For starters: just a simple match:
#!perl -w use strict; use Benchmark qw/:all/; my $result = timethese( -5, { index => sub { -1 != index 'fooabc123', '123' }, regex => sub { 'fooabc123' =~ /123/ } } ); cmpthese $result;
Results in
Benchmark: running index, regex for at least 5 CPU seconds... index: 6 wallclock secs ( 6.56 usr + -0.03 sys = 6.53 CPU) @ 81 +19742.77/s (n=53030040) regex: 5 wallclock secs ( 5.50 usr + 0.00 sys = 5.50 CPU) @ 44 +81230.91/s (n=24646770) Rate regex index regex 4481231/s -- -45% index 8119743/s 81% --

And the complete sub:
#!/usr/bin/perl use warnings; use strict; use Data::Dumper; use Benchmark qw/:all/; sub LCSRegEx { my $substr = $_[0]; my $len = length $_[0]; my $off = 0; while ($substr) { my @matches = grep /\Q$substr/, @_; last if @matches == @_; $off++; $len-- and $off=0 if $off+$len > length $_[0]; $substr = substr $_[0], $off, $len; } return $substr; } sub LCSIndex { my $substr = $_[0]; my $len = length $_[0]; my $off = 0; while ($substr) { my @matches = grep { -1 != index $_, $substr } @_; last if @matches == @_; $off++; $len-- and $off=0 if $off+$len > length $_[0]; $substr = substr $_[0], $off, $len; } return $substr; } my $result = timethese( -5, { LCSRegEx => sub { for ([ qw(fooabc123 fooabc321 foobca232) ], [ qw(abcfoo123 bcafoo321 foo123abc) ], [ qw(foo bor boz bzo) ]) { LCSRegEx(@{$_}); } }, LCSIndex => sub { for ([ qw(fooabc123 fooabc321 foobca232) ], [ qw(abcfoo123 bcafoo321 foo123abc) ], [ qw(foo bor boz bzo) ]) { LCSIndex(@{$_}); } } }); cmpthese $result;
Results in
Benchmark: running LCSIndex, LCSRegEx for at least 5 CPU seconds... LCSIndex: 6 wallclock secs ( 5.28 usr + 0.00 sys = 5.28 CPU) @ 34 +63.93/s (n =18293) LCSRegEx: 6 wallclock secs ( 5.50 usr + 0.00 sys = 5.50 CPU) @ 91 +6.36/s (n= 5040) Rate LCSRegEx LCSIndex LCSRegEx 916/s -- -74% LCSIndex 3464/s 278% --
++ for you
Cheers,
CombatSquirrel.
Entropy is the tendency of everything going to hell.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://308432]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others having an uproarious good time at the Monastery: (7)
As of 2024-03-28 10:01 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found