Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks,

I've 10,000 numbers starting from 0000 to 9999:

0000 0001 0002 0003 0004 0005 0006 0007 0008 0009 0010 0011 0012 0013 0014 0015 ... 9999
I would like to pick up the numbers that have one of its digits repeated only twice. So Instance,

0000 (Don't pick)
0001 (Don't pick)
0012 (Pick)
0011 (Pick)
1001 (Pick)
1009 (Pick)
1234 (Don't pick)
9879 (Pick)

I have the numbers in a file, one line for each number. I can set up my code to read them one at a time but I've no idea how to do the regex.

Any help will be greatly appreicated. Thank you :)

Replies are listed 'Best First'.
Re: Find duplicate digits
by friedo (Prior) on Feb 15, 2006 at 15:00 UTC
    I would use a hash to count the occurance of each digit in the numbers. Then, if the number has a count of two for any digit, keep it. Otherwise, discard it. For example:

    my @keep; # numbers to keep while(chomp( my $line = <NUMBERS> ) ) { my @digits = split '', $line; my %count; ++$count{$_} for @digits; if( grep { $_ == 2 } values %count ) { push @keep, $line; } }

      Wow, that's a lot of excessive work; why use the hash, when a regex can count matches?

      my @keep; while ( chomp(my $num = <DATA>) ) { for ( split //, $num ) { my @a = ($num =~ m/$_/g); #count occurances if (@a>1) { push @keep, $num; last; } #keep if more than one } }
      <-radiant.matrix->
      A collection of thoughts and links from the minds of geeks
      The Code that can be seen is not the true Code
      I haven't found a problem yet that can't be solved by a well-placed trebuchet
        :shrug:, seems like roughly the same amount of work to me. Unless by "work" you mean lines of code, in which case I was being rather more verbose than usual. My method could easily be written:

        my @keep; # numbers to keep while(chomp( my $line = <NUMBERS> ) ) { my %count; ++$count{$_} for split '', $line; push @keep, $line if grep { $_ == 2 } values %count }

        I also wouldn't be surprised if the regex method was slower, but I'm too lazy to do a benchmark right now. :) (And for only 10,000 numbers, it probably does not matter much.)

      Strangely, nobody yet seems to have taken up the issue of this line in friedo's code:

          while(chomp( my $line = <NUMBERS> ) ) {

      Don't do this!!

      Why? Because if, for example, the last line of the file does not end with a newline, your code will ignore it (in order to undertand why, type "perldoc -f chomp" at your command line).

      For a more detailed discussion of this meme, see thread 303987, in particular the comments from ChemBoy and Abigail-II.

      Works like magic, thaaaaaaaaaaaaaaanks you!
Re: Find duplicate digits
by japhy (Canon) on Feb 15, 2006 at 15:08 UTC
    So there are two rules:
    1. No digit can appear more than twice, and
    2. At least one digit must be duplicated.
    So break it down into its cases:
    m{ ^ (?= \d\d\d\d $ ) # ensure it's only 4 digits long (?: # first digit is the duplicated one (\d) \1 (?!\1) \d (?!\1) \d | (\d) (?!\2) \d \2 (?!\2) \d | (\d) (?!\3) \d (?!\3) \d \3 # second digit is the duplicated one | (\d) (?!\4) (\d) \5 (?!\4|\5) \d | (\d) (?!\6) (\d) (?!\6|\7) \d \7 # third digit is the duplicated one | (\d) (\d) (?!\8|\9) (\d) \10 ) }x;
    Now, that's just atrocious. It can be made a bit more efficient by grouping things together smarter:
    m{ ^ (?= \d\d\d\d $ ) # ensure it's only 4 digits long (?: # first digit is the duplicated one (\d) (?: \1 (?!\1) \d (?!\1) \d | (?!\1) \d \1 (?!\1) \d | (?!\1) \d (?!\1) \d \1 ) # second digit is the duplicated one | (\d) (?!\2) (\d) (?: \3 (?!\2|\3) \d | (?!\2|\3) \d \3 ) # third digit is the duplicated one | (\d) (\d) (?!\4|\5) (\d) \6 ) }x;
    But we're still stuck with disgusting regexes. So... why use a regex? friedo's got a simple non-regex solution for you. The "pattern" you need to match isn't a pretty one.

    Update: you could also use a far simpler regex that makes two passes at the string like so:

    m{ ^ (?= \d* (\d) \d* \1 ) (?! \d* \1 \d* \1 \d* \1 ) }x;

    Jeff japhy Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and perl hacker
    How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart
      Thanks for your enlightenment :) I didn't know how to approach the problem...regex was the only thing that came to mind. But, yes, I like friedo's simple solution.
      Hi japhy,

      I'm trying to understand your updated regex:

      m{ ^ (?= \d* (\d) \d* \1 ) (?! \d* \1 \d* \1 \d* \1 ) }x;
      Could you explain what's going there? Tia :)
        The regex is anchored to the beginning of the string. The first assertion is a positive look-ahead that says "see if we can find zero or more digits, followed by a particular digit, followed by zero or more digits, and then that particular digit again". That satisfies the "at least one digit must be duplicated" rule. The second assertion is a negative look-ahead that says "make sure we can't match zero or more digits, that particular digit, zero or more digits, that particular digit, and then zero or more digits and that particular digit a THIRD time". This satisfies the "the digit can't appear more than three times" rule.

        Jeff japhy Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and perl hacker
        How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart
Re: Find duplicate digits
by BrowserUk (Patriarch) on Feb 15, 2006 at 15:18 UTC

    Not very satisfying as a solution, but it works.

    #! perl -slw use strict; m[ (?:^[^0]*0[^0]*0[^0]*$)| (?:^[^1]*1[^1]*1[^1]*$)| (?:^[^2]*2[^2]*2[^2]*$)| (?:^[^3]*3[^3]*3[^3]*$)| (?:^[^4]*4[^4]*4[^4]*$)| (?:^[^5]*5[^5]*5[^5]*$)| (?:^[^6]*6[^6]*6[^6]*$)| (?:^[^7]*7[^7]*7[^7]*$)| (?:^[^8]*8[^8]*8[^8]*$)| (?:^[^9]*9[^9]*9[^9]*$) ]x and print for '0000' .. '9999';

    Update: ... but only for 4 digit numbers. By inverting the logic and only printing numbers that don't contain 3 repeated digits, it should work for numbers of any length.

    #! perl -slw use strict; m[ (?:[^0]*0[^0]*0[^0]*0[^0]*)| (?:[^1]*1[^1]*1[^1]*1[^1]*)| (?:[^2]*2[^2]*2[^2]*2[^2]*)| (?:[^3]*3[^3]*3[^3]*3[^3]*)| (?:[^4]*4[^4]*4[^4]*4[^4]*)| (?:[^5]*5[^5]*5[^5]*5[^5]*)| (?:[^6]*6[^6]*6[^6]*6[^6]*)| (?:[^7]*7[^7]*7[^7]*7[^7]*)| (?:[^8]*8[^8]*8[^8]*8[^8]*)| (?:[^9]*9[^9]*9[^9]*9[^9]*) ]x or m[(?=(\d).*\1)] and print for '000000' .. '999999';

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Thanks, BrowserUk :)

      How do I modify your code to add a new line to each printed number?

        If you included -l on your shebang line as I have, it would add the newlines for you, but since many people do not like simple :), you can do this:

        #! perl -sw use strict; m[ (?:^[^0]*0[^0]*0[^0]*$)| (?:^[^1]*1[^1]*1[^1]*$)| (?:^[^2]*2[^2]*2[^2]*$)| (?:^[^3]*3[^3]*3[^3]*$)| (?:^[^4]*4[^4]*4[^4]*$)| (?:^[^5]*5[^5]*5[^5]*$)| (?:^[^6]*6[^6]*6[^6]*$)| (?:^[^7]*7[^7]*7[^7]*$)| (?:^[^8]*8[^8]*8[^8]*$)| (?:^[^9]*9[^9]*9[^9]*$) ]x and print "$_\n" for '0000' .. '9999';

        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
      Please ignore my earlier post below. I figured that out. Sorry about that...
Re: Find duplicate digits
by inman (Curate) on Feb 15, 2006 at 15:35 UTC
    Canonicalise the data and look for multiples. The grep tests for numbers where there are a number of multiples of which at least one is a pair. This solution works with numbers of any length and can be made to work with triples, quads etc. just by changing the grep test.

    The OP didn't make it clear whether a number such as 0101 that has two pairs should be counted. You can always test the number of members of the list returned by the grep.

    while (<DATA>){ print "$_" if grep {length $_ == 2} join('', sort split //, $_) =~ + /((\d)\2+)/g; } __DATA__ 0000 0001 0002 0003 0004 0005 0006 0007 0008 0009 0010 0011 0012 0013 0014 01511 451411
    gives
    0011 0012 0013 0014 451411
      Thanks, inman!

      Yes, I would like 1010 to be counted.

      I'm finding it hard to understand your grep. Where is the list that it's supposed to work on?

        The code is equivalent to
        while (<DATA>) { my $a = join('', sort split //, $_); my @a = $a =~ /((\d)\2+)/g; print "$_" if grep {length $_ == 2} @a; }


        holli, /regexed monk/
Re: Find duplicate digits
by Roy Johnson (Monsignor) on Feb 15, 2006 at 16:47 UTC
    It looks almost paradoxical, but this does the trick:
    #! perl use strict; use warnings; while (<DATA>) { print if /(\d)(?=.*\1)/g and not /($1)(?=.*\1)/g; } __DATA__ 0000 (Don't pick) 0001 (Don't pick) 0012 (Pick) 0011 (Pick) 1001 (Pick) 1009 (Pick) 1234 (Don't pick) 9879 (Pick)
    Because I use the /g, the second match continues after the first one, so its capture group will actually be the 2nd occurrence of the digit. Then the backreference would be the third (which would mean we don't want it).

    It would not work for situations where some digit appears three times but another appears only twice (so it would be a "pick"). But since your examples are only four digits long, that doesn't come up here.


    Caution: Contents may have been coded under pressure.
Re: Find duplicate digits
by salva (Canon) on Feb 15, 2006 at 15:49 UTC
    @m = grep {/(\d).*\1/ and !/(\d).*\1.*\1/} '0000' .. '9999'
Re: Find duplicate digits
by GrandFather (Saint) on Feb 16, 2006 at 03:33 UTC

    Here's a fairly tidy solution (well, I like it):

    use strict; use warnings; my @nums = (1000..9999); my @picks = grep {my %d; map {$d{$_}++} split ''; grep {$_ == 2} value +s %d} @nums; print join "\n", @picks;

    DWIM is Perl's answer to Gödel
Re: Find duplicate digits
by LanX (Saint) on Apr 07, 2010 at 14:41 UTC
    straightforward, accept one repetition but not two.

    print grep { /(\d).*\1/ && ! /(\d).*\1.*\1/ } map {sprintf "%04d",$_} +(0..9999)

    Tried to do it in one regex but I couldn't find out how to negate the match \1, [^\1] obviously doesnt work and all negative look aheads get confused by the end of string...

    Cheers Rolf

    UPDATE: sorry nothing new :( ...Re: Find duplicate digits

      another variation:

      print grep { /(\d).*\1/ and eval "2 == tr/$1//" } map {sprintf "%04d +",$_} (0..9999)

      Cheers Rolf

Re: Find duplicate digits
by mroman (Initiate) on Apr 07, 2010 at 10:50 UTC
    Hello Monks,

    This is my first post, and I was delighted to find this thread, as it almost exactly addresses what I want to do, with the exception that I would like to narrow the numbers down even further in comparison to the OP's spec.

    Thus, I want to take 0000 to 9999 and select only those that have one pair of duplicate digits.

    For example:

    • 0000 - no
    • 0011 - no
    • 0012 - yes
    • 0120 - yes

    I am a noob, and this has turned out to be a bit beyond my skill level at this time . . . I was going the grep + regex on Bash route, however, upon seeing the Perl solutions in this thread, I believe I will only use bash to admin my system on the command line from now on :)

      perl -le"%h=(), undef @h{split'',$_}, keys %h == 3 and print for '0000 +'..'9999'"
        BrowserUk,

        Thank you for this.

        This is the message I got:

        syntax error at ns.pl line 1, near "-le" Execution of ns.pl aborted due to compilation errors.
        ?

      This version directly generates the list of desired patterns rather than filtering through all 4-digit patterns, just to demonstrate another approach. It also works for any length of digit string.

      #/usr/bin/perl -w use strict; use Algorithm::Loops qw< NestedLoops NextPermute >; my( $uniq )= ( @ARGV, 3 ); # 3 unique digits plus one duplicate digit my $digs= NestedLoops( [ [0..9], ( sub { [1+$_..9] } ) x ($uniq-1) ], ); my @digs; while( @digs= $digs->() ) { do { for my $dup ( 0 .. $#digs ) { for my $after ( $dup .. $#digs ) { my @result= @digs; splice @result, 1+$after, 0, $digs[$dup]; print @result, $/; } } } while( NextPermute( @digs ) ); }

      - tye        

        Tye,

        Thank you, I will look into this one.

        Cheers,

        M.ROMAN

      You can try the following algorithm:

      1. pass through your input one digit per time
      2. use a hash to count how often each digit occurs
      3. once you are finished, pass through the hash (again) to see how many digits are ocuring two times
      4. print yes/no

      Once you have done this, think how to adapt the algorithm to handle the special case 1111....

      Have Fun! Rata

      PS.: it is good that you looked for an old thread! However you'll get more answers if you'd opened a new thread and just had provided a link to the old one! well, next time ;-)

        Ratazong,

        Thank you for your reply.

        (vigorously flapping noob wings)

        Argh! Can't fly that high as of yet . . .

        I will link to a thread instead next time however :)

        Cheers,

        M.Roman
Re: Find duplicate digits
by Anonymous Monk on Feb 16, 2006 at 10:11 UTC
    Is this a problem from project Euler (mathschallenge.net)?
      No. I wanted to find out how many of those numbers there are. A friend of mine is into this lottery sort of game, where you bet on one of the 10 000 numbers. I wanted to find out how his chances of winning are if he bets on all those numbers that have a duplicate. The set is almost half of all the possibilities and he can't possibly buy that many.

      Programmatically, I'm interested to learn how to solve this sort of things.