Finding all substrings

TheHobbit has asked for the wisdom of the Perl Monks concerning the following question:

Re: Finding all substrings
by thelenm (Vicar) on Apr 24, 2002 at 17:54 UTC

sub substrings {
  my $string = shift;
  my @result = ();
  my $strlen = length $string;
  foreach my $length (1..$strlen) {
    foreach my $offset (0..$strlen-$length) {
      push @result, substr($string,$offset,$length);
    }
  }
  return @result;
}

Re: Re: Finding all substrings

by broquaint (Abbot) on Apr 24, 2002 at 18:12 UTC

ever so slightly

use Benchmark qw(cmpthese);

sub substrings_TheHobbit {
  my $string = shift;
  my @result = ();

  foreach my $length (1..length($string)) {
    foreach my $offset (0..length($string)-$length) {
      push @result,substr($string,$offset,$length);
    }
  }
  return @result;
}

sub substrings_thelenm {
  my $string = shift;
  my @result = ();
  my $strlen = length $string;

  foreach my $length (1..$strlen) {
    foreach my $offset (0..$strlen-$length) {
      push @result, substr($string,$offset,$length);
    }
  }
  return @result;
}


cmpthese(-10, {
    TheHobbit   => sub { substrings_TheHobbit("Just Another Perl Hacker,") },
    thelenm     => sub { substrings_thelenm("Just Another Perl Hacker,") },
});

__output__

Benchmark: running TheHobbit, thelenm, each for at least 10 CPU seconds...
 TheHobbit: 12 wallclock secs (10.51 usr +  0.03 sys = 10.54 CPU) @ 804.55/s (n=8480)
   thelenm: 14 wallclock secs (10.44 usr +  0.02 sys = 10.46 CPU) @ 809.37/s (n=8466)
           Rate TheHobbit   thelenm
TheHobbit 805/s        --       -1%
thelenm   809/s        1%        --

_________ broquaint


Re: Re: Re: Finding all substrings

by samtregar (Abbot) on Apr 24, 2002 at 18:26 UTC

-sam


Re: Re: Re: Finding all substrings

by erikharrison (Deacon) on Apr 24, 2002 at 18:29 UTC

Some repeated benchmarking answers this . . .

my is expensive. Removing the my and making $strlen a global eliminates the difference for the small test case: the two subs bench about the same. Multiplying the length of the test string by ten and leaving $strlen global gives thelenm's sub big gains over the original, 6%-10%. Finally, making $strlen lexical and retesting reduces those gains by 4%-8%, still testing with "Just Another Perl Hacker,"x10.

So, yes, this is definitely an optimization.
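
For anyone who wants to repeat this, here is a minimal sketch of the longer-string comparison. The sub names substrings_inline_length and substrings_cached_length are made up for illustration, and the one-second run time is an arbitrary choice. Results will vary by machine and perl version, so no output is shown.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Calls length() in the loop bounds, as in the original question.
sub substrings_inline_length {
  my $string = shift;
  my @result;
  foreach my $length (1..length($string)) {
    foreach my $offset (0..length($string)-$length) {
      push @result, substr($string, $offset, $length);
    }
  }
  return @result;
}

# Stores the length once in a lexical, as in thelenm's version.
sub substrings_cached_length {
  my $string = shift;
  my @result;
  my $strlen = length $string;
  foreach my $length (1..$strlen) {
    foreach my $offset (0..$strlen-$length) {
      push @result, substr($string, $offset, $length);
    }
  }
  return @result;
}

# The test string from the thread, repeated ten times (250 characters).
my $long = "Just Another Perl Hacker," x 10;

# Run each sub for at least 1 CPU second and print the comparison table.
cmpthese(-1, {
  inline_length => sub { my @r = substrings_inline_length($long) },
  cached_length => sub { my @r = substrings_cached_length($long) },
});
```

Differences of a few percent are within the noise on a short run like this, so rerun it several times before drawing conclusions.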


Re: Re: Finding all substrings

by giulienk (Curate) on Apr 24, 2002 at 18:08 UTC

Update:

suaveant

post

$|=$_='1g2i1u1l2i4e2n0k',map{print"\7",chop;select$,,$,,$,,$_/7}m{..}g


Re: Re: Re: Finding all substrings

by suaveant (Parson) on Apr 24, 2002 at 18:13 UTC

- Ant
- Some of my best work - (1 2 3)


Re: Finding all substrings
by suaveant (Parson) on Apr 24, 2002 at 18:05 UTC

Or you could get better space efficiency by keeping lists of offsets in an array, with each index indicating the substring length, like so:

foreach my $length (1..length($string)) {
  foreach my $offset (0..length($string)-$length) {
#    push @result,substr($string,$offset,$length);
     push @{$result[$length]}, $offset;
  }
}

Really, there are endless possibilities... so how can you narrow it down?
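
With the offset table, the substrings themselves are only built when asked for. A small sketch of how that might look; the helper names offsets_by_length and substrings_of_length are invented here for illustration:

```perl
use strict;
use warnings;

# Build the table: $table->[$length] is a list of starting offsets.
sub offsets_by_length {
  my $string = shift;
  my @result;
  foreach my $length (1..length($string)) {
    foreach my $offset (0..length($string)-$length) {
      push @{$result[$length]}, $offset;
    }
  }
  return \@result;
}

# Recover the substrings of one length from the table on demand.
sub substrings_of_length {
  my ($string, $table, $length) = @_;
  return map { substr($string, $_, $length) } @{ $table->[$length] || [] };
}

my $string = "abcd";
my $table  = offsets_by_length($string);
my @pairs  = substrings_of_length($string, $table, 2);
print "@pairs\n";   # prints "ab bc cd"
```

The table stores one integer per substring instead of a copy of the text, which is where the space saving comes from.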

- Ant
- Some of my best work - (1 2 3)


Re: Finding all substrings
by samtregar (Abbot) on Apr 24, 2002 at 18:17 UTC

{
  my @cache; # a cache of substring-finding subs
  sub substrings {
    my $string = shift;
    my $length = length $string;

    # use cached sub if we have one
    return $cache[$length]->($string)
      if exists $cache[$length];

    # create sub to find substrings for this length
    my $sub = 'sub { $_ = shift; return (';
    foreach my $length (1..length($string)) {
      foreach my $offset (0..length($string)-$length) {
        $sub .= "substr(\$_,$offset,$length),";
      }
    }
    $sub .= ")};";
    $cache[$length] = eval $sub;

    # and use it
    return $cache[$length]->($string);
  }
}
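
To make the trick concrete, here is a rough sketch that builds the same kind of source text for a three-character string and prints it before eval'ing it. The helper name generated_source is hypothetical and not part of the code above.

```perl
use strict;
use warnings;

# Build the source of a sub that returns every substring of an
# $n-character string, using hard-coded offsets and lengths.
sub generated_source {
  my $n = shift;
  my $src = 'sub { $_ = shift; return (';
  foreach my $length (1..$n) {
    foreach my $offset (0..$n-$length) {
      $src .= "substr(\$_,$offset,$length),";
    }
  }
  return $src . ")};";
}

my $src = generated_source(3);
print "$src\n";

# Compile the generated text and call it like any other code ref.
my $code = eval $src or die $@;
my @subs = $code->("abc");
print "@subs\n";   # prints "a b c ab bc abc"
```

The loops run once per string length; every later call for that length is a flat list of substr calls. Note that the generated sub assigns to the global $_.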

-sam

PS: Has anyone noticed that <code> doesn't deal with hard tabs right? Copy-and-paste from Emacs is unpleasant.


Re: Re: Finding all substrings

by samtregar (Abbot) on Apr 24, 2002 at 19:13 UTC

{
  my @cache;
  sub substrings {
    $_ = shift;
    return &{$cache[length($_)]}
      if exists $cache[length($_)];

    my $sub = 'sub { return (';
    foreach my $len (1..length($_)-1) {
      foreach my $off (0..length($_)-$len) {
        $sub .= "substr(\$_,$off,$len),";
      }
    }
    $sub .= "\$_)};";
    $cache[length($_)] = eval $sub;
    return &{$cache[length($_)]};
  }
}

-sam


Re: Finding all substrings

by Dominus (Parson) on Apr 24, 2002 at 19:34 UTC

samtregar

I found this to be 300% faster over 100,000 iterations. Not bad, but I bet we could do better!

Perhaps you meant that yours was three times as fast? If so, it was 66.67% faster, not 300%.

Hope this helps.

--
Mark Dominus
Perl Paraphernalia


Re: Re: Finding all substrings

by samtregar (Abbot) on Apr 24, 2002 at 19:39 UTC

            Rate original      new
original  7710/s       --     -80%
new      38610/s     401%       --
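
The two percentages in that table are the same pair of rates seen from opposite directions. A quick sketch of the arithmetic, assuming each cell is the row's rate relative to the column's rate (which matches the numbers shown):

```perl
use strict;
use warnings;

# Rates taken from the table above, in iterations per second.
my ($original, $new) = (7710, 38610);

my $ratio        = $new / $original;                      # how many times as fast
my $new_vs_orig  = ($new - $original) / $original * 100;  # "new" row, "original" column
my $orig_vs_new  = ($original - $new) / $new * 100;       # "original" row, "new" column

printf "speed ratio:     %.2fx\n",   $ratio;        # 5.01x
printf "new vs original: %+.0f%%\n", $new_vs_orig;  # +401%
printf "original vs new: %+.0f%%\n", $orig_vs_new;  # -80%
```

So "401% faster" and "80% slower" both describe a sub that runs about five times as fast.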

-sam


Re: (3) Finding all substrings (Russ: more != less)

by Russ (Deacon) on Apr 24, 2002 at 19:49 UTC

(MeowChow - you're all confused) Re5: Finding all substrings

by MeowChow (Vicar) on Apr 25, 2002 at 00:20 UTC

Re: Re: Finding all substrings

by BUU (Prior) on Apr 25, 2002 at 02:02 UTC

If it took 10 seconds, and was 300% faster, that would merely mean it was 3 seconds long. I think. Actually I think they're both interchangeable.


Re: Finding all substrings
by Dominus (Parson) on Apr 24, 2002 at 19:28 UTC

cooler

    sub substrings {
      my @ss;
      $_[0] =~ /.*?(.+?)(?{push @ss, $1})(?!)/;
      @ss;
    }

--
Mark Dominus
Perl Paraphernalia


Re: Re: Finding all substrings

by samtregar (Abbot) on Apr 24, 2002 at 19:36 UTC

-sam


Re: Finding all substrings

by Dominus (Parson) on Apr 24, 2002 at 21:17 UTC

samtregar

Um, cool! But it leaks memory at an insane rate on 5.6.1.

first

@ss

{ my @ss;
  sub substrings {
    @ss = ();
    $_[0] =~ /.*?(.+?)(?{push @ss, $1})(?!)/;
    @ss;
  }
}

--
Mark Dominus
Perl Paraphernalia


Re: Re: Finding all substrings

by samtregar (Abbot) on Apr 24, 2002 at 21:28 UTC

Re: Finding all substrings
by erikharrison (Deacon) on Apr 24, 2002 at 18:12 UTC

The root of optimization is finding a solid algorithm, then implementing it, then speeding up the implementation. Your algorithm is solid (I don't know of a faster one . . . monks?) and your implementation is simple, meaning there is not much room for speed gain. Offhand though, C style for loops might be slightly faster (perl does an implicit ++ and check when you call the range operator, hence the rationale that doing it directly might be faster).
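
For the record, a sketch of what the C style version might look like. The name substrings_cstyle is made up, and whether it is actually any faster has not been measured here; benchmark before believing it.

```perl
use strict;
use warnings;

# Same algorithm as the foreach version, with explicit C style loops.
sub substrings_cstyle {
  my $string = shift;
  my $strlen = length $string;
  my @result;
  for (my $length = 1; $length <= $strlen; $length++) {
    for (my $offset = 0; $offset <= $strlen - $length; $offset++) {
      push @result, substr($string, $offset, $length);
    }
  }
  return @result;
}

print join(",", substrings_cstyle("abc")), "\n";   # prints "a,b,c,ab,bc,abc"
```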


Re: Re: Finding all substrings

by samtregar (Abbot) on Apr 24, 2002 at 18:24 UTC

and then finding a way to exploit caching to make it faster.

-sam


Re: Finding all substrings
by sfink (Deacon) on Apr 24, 2002 at 20:35 UTC

sub substrings {
  my $string = shift;

  my @result = ();

  foreach my $start (0..length($string)-1) {
    my $substr = substr($string, $start);
    while (length($substr)) {
        push @result, $substr;
        chop($substr);
    }
  }

  return @result;
}

sub substrings {
  local $_ = shift;
  my @result;
  do { push @result, /(?=(.+)$)/sg; chop } while (length);
  return @result;
}

Update: That durn mjd got there before I even submitted this... though he used embedded code. Is there any way without it?
