Re: Determining uniqueness in a string.

If you're gonna get this nuts about speed, you may as well fire up Inline C or XS. If you know XS, this solution is actually easier to grok than some of the more esoteric solutions above, as well as faster:

!/usr/bin/perl
use strict;
use warnings;

# fast_uniq.plx -- 10 unique digits in a 10-char string

use Benchmark qw( cmpthese );
use Inline C => <<'END_C';

/* works for ASCII, EBCDIC, UTF-8 */

int offset;
SV* digits_sv;

void prime_variables (SV* input_sv, SV* zero_sv) {
    digits_sv = newSVsv(input_sv);
    char* zero_str = SvPV_nolen(zero_sv);
    offset = *zero_str;
}

int test_uniq_digits () {
    STRLEN digits_len = SvCUR( digits_sv );
    char* string      = SvPV( digits_sv, digits_len );

    char test_buf[10] = { 0,0,0,0,0, 0,0,0,0,0 };

    int i, index;
    for (i = 0; i < 10; i++) {
        index = *string - offset;
        if (index < 0 || index > 9) 
            croak("illegal character: '%c'", *string);
        if (test_buf[index] == 1)
            return 0;
        test_buf[index] = 1;
        string++;
    }   
    return 1;
}

END_C

run_test("0123456789", "UNIQUE:\n");
run_test("1123456789", "NOT_UNIQUE:\n");

sub run_test {
    my ($digits, $message) = @_;
    prime_variables($digits, "0");
    print $message;
    cmpthese( -5, {
        u1 => sub {
                   for ( 0 .. 9 ) { 
                      return 0 if $digits !~ /$_/;
                   }       
                   return 1;
                 },      
        u1000 => sub { test_uniq_digits },
    });
    print "\n\n";
}
[download]

Here's the output:

UNIQUE:
          Rate    u1 u1000
u1     10619/s    --  -98%
u1000 692527/s 6421%    --


NOT_UNIQUE:
          Rate    u1 u1000
u1    414417/s    --  -42%
u1000 712019/s   72%    --
[download]

--
Marvin Humphrey
Rectangular Research ― http://www.rectangular.com

Comment on Re: Determining uniqueness in a string. Select or Download Code

Replies are listed 'Best First'.
Re^2: Determining uniqueness in a string. by BrowserUk (Patriarch) on Oct 04, 2005 at 00:03 UTC
Unless your XS version is quicker than my Inline C algorithm: `__DATA__ __C__ int uniq( char* s ) { int i; int t = 0; for( i=0; i<10; i++ ) { t \|= 1 << ( s[i] - 48 ); } return t == 1023; }` [download] which doesn't seem likely, then the big hash lookup will still be faster, Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal? "Science is about questioning the status quo. Questioning authority". The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.	[reply] [d/l]
Re^3: Determining uniqueness in a string. by creamygoodness (Curate) on Oct 04, 2005 at 00:55 UTC
With the caveat that your algo needs a variable offset or it will fail on EBCDIC machines, I like it better than mine. Your first and your last solutions are the best two on the page, IMHO. WRT the giant hash lookup: "That's nice, honey. Did you remember to feed the dog?" ;) -- Marvin Humphrey Rectangular Research ― http://www.rectangular.com	[reply]
Re^4: Determining uniqueness in a string. by BrowserUk (Patriarch) on Oct 04, 2005 at 03:34 UTC
The big hash takes very little time to load from a pre-generated file of the 10! permutations--it doesn't actually take that long to generate either--and the 350 MB loads easily within my 512 MB ram machine without swapping (if I haven't got much else running at the time). If raw performance is the criteria, it is by far the quickest and simplest option. Of course, if the criteria changed to be 11 digits, it starts to becomes impractical, but that's tomorrows problem :) Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal? "Science is about questioning the status quo. Questioning authority". The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.	[reply]
Re^5: Determining uniqueness in a string. by creamygoodness (Curate) on Oct 04, 2005 at 05:40 UTC
Re^6: Determining uniqueness in a string. by BrowserUk (Patriarch) on Oct 04, 2005 at 13:20 UTC
Re^6: Determining uniqueness in a string. by Perl Mouse (Chaplain) on Oct 04, 2005 at 08:57 UTC