in reply to number of unique characters in a string

No one has tried with regular expressions yet, so:
sub f {
    my $txt = join('', sort(split //, shift));
    $txt =~ s/(.)\1+/$1/gs;   # /s lets "." match newlines too
    return length($txt);
}
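
As a quick sanity check, here is a minimal sketch that compares the sort-and-collapse approach with a plain hash count. The sub count_unique_hash is a hypothetical reference added for illustration, not something from the thread.

```perl
use strict;
use warnings;

# Same approach as the sub above; /s lets "." match newlines too.
sub f {
    my $txt = join('', sort(split //, shift));
    $txt =~ s/(.)\1+/$1/gs;
    return length($txt);
}

# Hypothetical reference: count distinct characters with a hash.
sub count_unique_hash {
    my %seen;
    $seen{$_}++ for split //, shift;
    return scalar keys %seen;
}

# Print both counts side by side for a few sample strings.
for my $s ('hello', 'abcabc', 'aaaa', '') {
    printf "%-10s regex=%d hash=%d\n", "'$s'", f($s), count_unique_hash($s);
}
```

Both columns should agree for every input. Sorting first is what makes the regex work, since \1+ can only collapse repeats that sit next to each other.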

Re: Re: number of unique characters in a string
by BrowserUk (Patriarch) on Mar 01, 2003 at 05:28 UTC

    You could replace the s/// with tr///s

    sub f {
        my $txt = join('', sort(split //,shift));
        $txt =~ tr/\x00-\xff//s;
        return length($txt);
    }
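
    A small sketch of the squeeze step on its own, assuming input limited to the \x00-\xff range that the tr list covers. The name unique_bytes is hypothetical. Characters above \xff fall outside the search list, so tr should leave them unsqueezed.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: sort the characters, then squeeze repeats with tr///s.
    sub unique_bytes {
        my $txt = join('', sort(split //, shift));
        # An empty replacement list maps each character to itself;
        # the s modifier then squeezes each run of repeats to one character.
        $txt =~ tr/\x00-\xff//s;
        return length($txt);
    }

    print unique_bytes('mississippi'), "\n";        # m, i, s, p
    print unique_bytes("\x{263A}\x{263A}"), "\n";   # outside the range: not squeezed
    ```

    Unlike the s/// version, no capture or backreference is involved, at the cost of only handling the characters named in the tr list.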

    Examine what is said, not who speaks.
    1) When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.
    2) The only way of discovering the limits of the possible is to venture a little way past them into the impossible
    3) Any sufficiently advanced technology is indistinguishable from magic.
    Arthur C. Clarke.
Re: number of unique characters in a string
by Northpass Kid (Initiate) on Oct 21, 2011 at 19:36 UTC

    This thread is a bit old, but here is my take on the problem using only a regex. I needed to check a string that was entered as a new password for several format restrictions including having at least five different characters. I was adding this to existing code that expected a compiled regex to test the string so I didn't have the option of using additional commands. The matching regex just had to succeed or fail.

    use Data::Dumper;

    my $unqchar5_regex = qr/^(?{%_=()})(?:(.)(?{$_{$1}++}))+ (??{(scalar(keys %_)<5)?~$1:''})$/x;

    my $pswd = "abcdef";
    print( ($pswd =~ $unqchar5_regex) ? "Pass\n" : "Fail\n" );
    print Dumper \%_;

    $pswd = "abcdbd";
    print( ($pswd =~ $unqchar5_regex) ? "Pass\n" : "Fail\n" );
    print Dumper \%_;

    The contents of %_ are a bonus: rather than just setting each character as a hash key, I increment it, so you end up with a per-character count. I use %_ because it is defined globally by default. If you are worried about collisions with other uses of it, you can give the hash a different name, but you'll have to declare that variable somewhere in the code if use strict 'vars' is on.
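
    A minimal sketch of the renamed-hash variant, assuming a package variable declared with our. The name %char_count is hypothetical.

    ```perl
    use strict;
    use warnings;

    our %char_count;   # hypothetical name; declared so strict 'vars' accepts it

    my $unqchar5_regex = qr/^(?{ %char_count = () })
        (?: (.) (?{ $char_count{$1}++ }) )+
        (??{ keys(%char_count) < 5 ? ~$1 : '' })
        $/x;

    print "aabcdef" =~ $unqchar5_regex ? "Pass\n" : "Fail\n";

    # The per-character counts are still available after the match.
    print join(', ', map { "$_=$char_count{$_}" } sort keys %char_count), "\n";
    ```

    A package variable is used here rather than a my variable because code blocks inside a regex have historically been unreliable with lexicals on older Perls.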

    Here's what is going on:

    qr/^                             # Beginning of string.
        (?{%_=()})                   # Clear the counting hash (zero width op).
        (?:                          # Group the matching of the character with the setting 
                                     #   of the count hash, but don't collect the value.
           (.)                       # Match just one char and collect it.
           (?{$_{$1}++})             # Use the char as the key in the hash and count it (zero width op).
        )+                           # Do the collect and count for as many chars as we have.
        (??{                         # Eval this code and use its val as a pattern (zero width op).
            (scalar(keys %_)<5)      # Perform test for number of unique characters.
                               ?~$1  # If not what we want, foul the pattern to make the match fail.
                               :''   # Otherwise don't change the pattern so match succeeds.
           })
       $/x;                          # Anchor end of string (x to break up the pattern).
    

    Really, any boolean test can be performed on the hash. Just have the code evaluate to the empty string ('') if the regex should succeed and to ~$1 if it should fail. The tilde in ~$1 is the bitwise complement operator, so whatever ASCII character is in $1, ~$1 will not match it. (For bytes above \x7F the complement can land on a regex metacharacter, e.g. ~"\xD1" is ".", so the trick is only safe for ASCII input.) I needed to do this because my list of valid characters was ALL characters.
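
    A hedged variation on the trick above, not part of the original post: the code block can return (?!), an empty negative lookahead that never matches, instead of ~$1. That avoids depending on the byte value in $1. The helper name has_five_unique is hypothetical.

    ```perl
    use strict;
    use warnings;

    # Same structure as the regex above, but a failed test interpolates (?!),
    # which cannot match at any position.
    my $unqchar5_regex = qr/^(?{ %_ = () })
        (?: (.) (?{ $_{$1}++ }) )+
        (??{ keys(%_) < 5 ? '(?!)' : '' })
        $/x;

    # Hypothetical wrapper returning 1 or 0 for easier testing.
    sub has_five_unique { return $_[0] =~ $unqchar5_regex ? 1 : 0 }

    for my $pswd ('abcdef', 'abcdbd', 'aaaaabbbbbcde') {
        print "$pswd: ", (has_five_unique($pswd) ? "Pass" : "Fail"), "\n";
    }
    ```

    The test inside (??{ }) can be swapped for any other condition on %_, such as a cap on how often a single character repeats.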

    Here is a version with some debugging output:

    my $unqchar5_regex = qr/^(?{%_=()})
        (?:(.)(?{$_{$1}++}))+
        (?{warn Dumper \%_})
        (??{warn scalar(keys %_)." $1\n"; (scalar(keys %_)<5)?~$1:'' })
        $/x;

    If you run this you will see that the single-character collection steps through and matches the entire string. If the boolean test then adds nothing to the pattern, we are done: everything matched, success. Otherwise the test expression adds the value of ~$1 to the pattern. We are out of characters at that point, so the match is already failing, but the regex engine backtracks through the string to be sure, giving characters back one at a time. No retry can succeed, because each character given back would have to match the complement of the one before it, so the match fails.

    In the end we didn't use this code at all, because the password cracking tester already did this check. :P Hopefully this will be of use to someone.