in reply to RE question...yup, another one ;)

Let me point out a couple of notes on using regexes to solve this problem. First, though: using modulus is definitely the way to go (it averages between 2.5 and 3 times faster than the regex approaches). When working in a language such as Perl, it is important to know which tools are hammers, screwdrivers, and wrenches, and even more so which problems are nails, screws, and bolts. This is a case where you are using a sledgehammer (a regex) to flatten a piece of wood when a simple piece of sandpaper (the % operator) will do.

There are four generic approaches to this:

/(\d)*(\d)/
useless capturing of preceding digits; actually captures one digit at a time over and over again; useless backtracking is forced; 2.96 x slower than modulus

/(\d*)(\d)/
useless capturing of preceding digits; useless backtracking is forced; 2.80 x slower than modulus

/\d*(\d)/
useless backtracking is forced; 2.79 x slower than modulus

/(\d)$/
optimized (goes to end of string automatically); 2.54 x slower than modulus
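
To make the notes above concrete, here is a minimal sketch of what each pattern actually captures. The sample value 1234567 and the variable names are assumptions chosen for illustration; they are not part of the benchmark.

```perl
use strict;
use warnings;

my $x = 1234567;    # sample value, chosen only for illustration

# In list context a successful match returns the captured groups.
my @multiple    = $x =~ /(\d)*(\d)/;    # $1 is overwritten on every repetition
my @backtrack_c = $x =~ /(\d*)(\d)/;    # $1 holds every digit except the last
my @backtrack   = $x =~ /\d*(\d)/;      # only the last digit is captured
my @opt         = $x =~ /(\d)$/;        # anchored at the end of the string
my $mod         = $x % 10;              # no regex at all

print "multiple:    @multiple\n";       # 6 7
print "backtrack_c: @backtrack_c\n";    # 123456 7
print "backtrack:   @backtrack\n";      # 7
print "opt:         @opt\n";            # 7
print "mod:         $mod\n";            # 7
```

In the first pattern the greedy group has to give back one digit before the final (\d) can match, which is the forced backtracking mentioned above.
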
Code used for benchmarks:
use Benchmark 'timethese';
$x = int (1_000_000 * rand 1_000_000);
timethese(-5, {
  multiple    => sub { $x =~ /(\d)*(\d)/ },
  backtrack_c => sub { $x =~ /(\d*)(\d)/ },
  backtrack   => sub { $x =~ /\d*(\d)/ },
  opt         => sub { $x =~ /(\d)$/ },
  mod         => sub { $x % 10 },
});
If snafu doesn't mind, I'd like to use this as an example in my book -- first to show how to craft a regex, then to show why there are some places where a regex is overkill.

japhy -- Perl and Regex Hacker

(tye)Re: japhy regex analysis: case study (RE question...)
by tye (Sage) on May 28, 2001 at 02:42 UTC

    Well, since you resorted to benchmarks (updated)...

             Rate bt_c mult   bt  opt  mod chop
    bt_c 180870/s   --  -1%  -5% -28% -62% -74%
    mult 181987/s   1%   --  -4% -27% -62% -74%
    bt   189426/s   5%   4%   -- -24% -60% -73%
    opt  249612/s  38%  37%  32%   -- -48% -64%
    mod  476214/s 163% 162% 151%  91%   -- -31%
    chop 692944/s 283% 281% 266% 178%  46%   --
    So chop is over 45% faster than mod even though it had to make an extra copy of the string!
    use Benchmark 'cmpthese';
    $x = int (1_000_000 * rand 1_000_000);
    cmpthese( -3, {
      mult => sub { $x =~ /(\d)*(\d)/ },
      bt_c => sub { $x =~ /(\d*)(\d)/ },
      bt   => sub { $x =~ /\d*(\d)/ },
      opt  => sub { $x =~ /(\d)$/ },
      mod  => sub { $x % 10 },
      chop => sub { my $x= $x; chop $x },
    });


    Following are the original bogus results. Thanks to dkubb for suggesting "my" over "local". I realized I'd made a mistake and came back, but not quickly enough. It looks like "local" is quite a bit slower than "my" (which makes sense), so I'd be interested in how japhy's machine compares on the new code.

             Rate mult bt_c   bt  opt  mod chop
    mult 180759/s   --  -1%  -7% -31% -62% -71%
    bt_c 182581/s   1%   --  -6% -31% -62% -70%
    bt   193680/s   7%   6%   -- -26% -60% -69%
    opt  263234/s  46%  44%  36%   -- -45% -57%
    mod  481067/s 166% 163% 148%  83%   -- -22%
    chop 618559/s 242% 239% 219% 135%  29%   --
    So chop is almost 30% faster than mod even though it had to make an extra copy of the string!
    use Benchmark 'cmpthese';
    $x = int (1_000_000 * rand 1_000_000);
    cmpthese( -3, {
      mult => sub { $x =~ /(\d)*(\d)/ },
      bt_c => sub { $x =~ /(\d*)(\d)/ },
      bt   => sub { $x =~ /\d*(\d)/ },
      opt  => sub { $x =~ /(\d)$/ },
      mod  => sub { $x % 10 },
      chop => sub { local $x; chop $x },
    });

            - tye (but my friends call me "Tye")
      Even on a different machine, I get these results (the machine has the 5.005 version of Benchmark.pm):
      timethese( -3, {
        mod    => sub { $x % 10 },
        chop   => sub { chop(my $x = $x) },
        substr => sub { substr($x, -1) },
      });
      __END__
      (they ran for at least 3 seconds)
      chop:    8707.14/s (n= 29256)
      substr: 43620.45/s (n=136532)
      mod:    48906.87/s (n=163838)
      So on my machine, mod is much faster than chop; the unexplored substr approach is nearly as fast.
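
For reference, a small sketch of the three non-regex approaches timed above, showing that they agree on the last digit. It assumes a non-negative integer; the sample value and variable names are made up for illustration.

```perl
use strict;
use warnings;

my $x = 1234567;    # sample value, assumed to be a non-negative integer

my $by_mod    = $x % 10;                # numeric remainder
my $by_chop   = chop(my $copy = $x);    # chop returns the removed character; $x is untouched
my $by_substr = substr($x, -1);         # last character of the stringified number

print "mod=$by_mod chop=$by_chop substr=$by_substr\n";    # mod=7 chop=7 substr=7
```

Only the modulus version stays purely numeric; chop and substr both work on the string form of the number.
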

      japhy -- Perl and Regex Hacker

        The main difference probably isn't your computer vs. my computer. You left out the regex solutions which (the first time that the first one is called) "modify" the global $x by providing it with a string value.

        Since the chop solution makes a copy in order to avoid changing the value of the global $x, it also doesn't give the global $x a string value. So if the chop solution is run first, it has to stringify $x for every single call (as in your two runs but not in mine).

        Your substr solution also gives the global $x a string value, but it is getting run after the chop solution so it doesn't help (in my run, all of the regex solutions were run before the chop solution).

        So for another way to compare the regular expression versions to the mod version, you could force a stringification per call. I didn't come up with an elegant way to do this (and I came up with some pretty interesting but non-intuitive and mutually contradictory benchmark numbers), so I'll just leave this to someone else.
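
The cached string value described above can be observed with the core B module. This is only a sketch: has_string_value is a hypothetical helper, the sample value is arbitrary, and it checks the private SVp_POK flag on the assumption that this is the flag that records a cached string across Perl versions.

```perl
use strict;
use warnings;
use B ();

# Hypothetical helper: true if the scalar currently carries a cached string value.
sub has_string_value {
    my ($ref) = @_;
    return (B::svref_2object($ref)->FLAGS & B::SVp_POK()) ? 1 : 0;
}

my $x = 1234567;                            # starts out as a plain integer
my $fresh = has_string_value(\$x);

chop(my $copy = $x);                        # stringifies only the copy
my $after_chop = has_string_value(\$x);

my $hit = $x =~ /(\d)$/;                    # matching stringifies $x itself
my $after_regex = has_string_value(\$x);

print "fresh=$fresh after_chop=$after_chop after_regex=$after_regex\n";
```

If the flag is still unset after the chop on a copy but set after the regex match, that is consistent with the explanation above: a benchmark that runs chop before any regex or substr call pays for stringification on every iteration.
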

                - tye (but my friends call me "Tye")
        Geez! I could kick myself for not thinking of substr! >:\

        ----------
        - Jim

      Except that you never actually stored anything in local $x.
      use Benchmark 'cmpthese';
      $x = int (1_000_000 * rand 1_000_000);
      cmpthese( -3, {
        mod   => sub { $x % 10 },
        chop  => sub { local $x; chop $x },
        chop2 => sub { chop(local $x = $x) },
      });
      __END__
                Rate chop2  chop   mod
      chop2  25430/s    --  -88%  -93%
      chop  204396/s  704%    --  -41%
      mod   348051/s 1269%   70%    --
      On my machine, chop() was slower. But the real chop() approach was slower still.
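
A short sketch of why the first chop benchmark was doing no real work: a bare "local $x" starts out undefined, whereas "local $x = $x" copies the outer value in first. The sub names and the sample value are assumptions for illustration.

```perl
use strict;
use warnings;

our $x = 1234567;    # package variable, as in the benchmarks

# A bare "local $x" hides the outer value behind a fresh undef,
# so a following chop would have nothing to remove.
sub empty_local_is_defined { local $x; return defined $x ? 1 : 0 }

# "local $x = $x" copies the outer value in, so chop removes a real digit.
sub chop_copied_local { return chop(local $x = $x) }

my $was_defined = empty_local_is_defined();
my $last_digit  = chop_copied_local();

print "bare local defined? $was_defined\n";                    # 0
print "last digit: $last_digit, \$x afterwards: $x\n";         # 7, 1234567
```
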

      japhy -- Perl and Regex Hacker
Re: japhy regex analysis: case study (RE question...)
by snafu (Chaplain) on May 30, 2001 at 01:42 UTC
    Japhy, Sure!! I would love it to be in your book. And I appreciate the thorough answer. I admit, the one thing I never even considered was the analogous nuts and bolts, hammers and wrenches of Perl.

    Thanks!

    ----------
    - Jim