in reply to Determining uniqueness in a string.

If you're gonna get this nuts about speed, you may as well fire up Inline C or XS. If you know XS, this solution is actually easier to grok than some of the more esoteric solutions above, as well as faster:

!/usr/bin/perl use strict; use warnings; # fast_uniq.plx -- 10 unique digits in a 10-char string use Benchmark qw( cmpthese ); use Inline C => <<'END_C'; /* works for ASCII, EBCDIC, UTF-8 */ int offset; SV* digits_sv; void prime_variables (SV* input_sv, SV* zero_sv) { digits_sv = newSVsv(input_sv); char* zero_str = SvPV_nolen(zero_sv); offset = *zero_str; } int test_uniq_digits () { STRLEN digits_len = SvCUR( digits_sv ); char* string = SvPV( digits_sv, digits_len ); char test_buf[10] = { 0,0,0,0,0, 0,0,0,0,0 }; int i, index; for (i = 0; i < 10; i++) { index = *string - offset; if (index < 0 || index > 9) croak("illegal character: '%c'", *string); if (test_buf[index] == 1) return 0; test_buf[index] = 1; string++; } return 1; } END_C run_test("0123456789", "UNIQUE:\n"); run_test("1123456789", "NOT_UNIQUE:\n"); sub run_test { my ($digits, $message) = @_; prime_variables($digits, "0"); print $message; cmpthese( -5, { u1 => sub { for ( 0 .. 9 ) { return 0 if $digits !~ /$_/; } return 1; }, u1000 => sub { test_uniq_digits }, }); print "\n\n"; }

Here's the output:

UNIQUE: Rate u1 u1000 u1 10619/s -- -98% u1000 692527/s 6421% -- NOT_UNIQUE: Rate u1 u1000 u1 414417/s -- -42% u1000 712019/s 72% --
--
Marvin Humphrey
Rectangular Research ― http://www.rectangular.com

Replies are listed 'Best First'.
Re^2: Determining uniqueness in a string.
by BrowserUk (Patriarch) on Oct 04, 2005 at 00:03 UTC

    Unless your XS version is quicker than my Inline C algorithm:

    __DATA__ __C__ int uniq( char* s ) { int i; int t = 0; for( i=0; i<10; i++ ) { t |= 1 << ( s[i] - 48 ); } return t == 1023; }

    which doesn't seem likely, then the big hash lookup will still be faster,


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.
      With the caveat that your algo needs a variable offset or it will fail on EBCDIC machines, I like it better than mine. Your first and your last solutions are the best two on the page, IMHO.

      WRT the giant hash lookup: "That's nice, honey. Did you remember to feed the dog?" ;)

      --
      Marvin Humphrey
      Rectangular Research ― http://www.rectangular.com

        The big hash takes very little time to load from a pre-generated file of the 10! permutations--it doesn't actually take that long to generate either--and the 350 MB loads easily within my 512 MB ram machine without swapping (if I haven't got much else running at the time).

        If raw performance is the criteria, it is by far the quickest and simplest option.

        Of course, if the criteria changed to be 11 digits, it starts to becomes impractical, but that's tomorrows problem :)


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
        "Science is about questioning the status quo. Questioning authority".
        The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.