Find duplicate digits

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Find duplicate digits by friedo (Prior) on Feb 15, 2006 at 15:00 UTC
I would use a hash to count the occurance of each digit in the numbers. Then, if the number has a count of two for any digit, keep it. Otherwise, discard it. For example: `my @keep; # numbers to keep while(chomp( my $line = <NUMBERS> ) ) { my @digits = split '', $line; my %count; ++$count{$_} for @digits; if( grep { $_ == 2 } values %count ) { push @keep, $line; } }` [download]	[reply] [d/l]
Re^2: Find duplicate digits by radiantmatrix (Parson) on Feb 15, 2006 at 15:54 UTC
Wow, that's a lot of excessive work; why use the hash, when a regex can count matches? `my @keep; while ( chomp(my $num = <DATA>) ) { for ( split //, $num ) { my @a = ($num =~ m/$_/g); #count occurances if (@a>1) { push @keep, $num; last; } #keep if more than one } }` [download] <-radiant.matrix-> A collection of thoughts and links from the minds of geeks The Code that can be seen is not the true Code I haven't found a problem yet that can't be solved by a well-placed trebuchet	[reply] [d/l]
Re^3: Find duplicate digits by friedo (Prior) on Feb 15, 2006 at 16:25 UTC
`:shrug:`, seems like roughly the same amount of work to me. Unless by "work" you mean lines of code, in which case I was being rather more verbose than usual. My method could easily be written: `my @keep; # numbers to keep while(chomp( my $line = <NUMBERS> ) ) { my %count; ++$count{$_} for split '', $line; push @keep, $line if grep { $_ == 2 } values %count }` [download] I also wouldn't be surprised if the regex method was slower, but I'm too lazy to do a benchmark right now. :) (And for only 10,000 numbers, it probably does not matter much.)	[reply] [d/l] [select]
Re^4: Find duplicate digits by radiantmatrix (Parson) on Feb 15, 2006 at 18:46 UTC
Re^5: Find duplicate digits by GrandFather (Saint) on Feb 16, 2006 at 03:51 UTC
Re^2: Find duplicate digits by Not_a_Number (Prior) on Feb 15, 2006 at 21:14 UTC
Strangely, nobody yet seems to have taken up the issue of this line in friedo's code: `while(chomp( my $line = <NUMBERS> ) ) {` Don't do this!! Why? Because if, for example, the last line of the file does not end with a newline, your code will ignore it (in order to undertand why, type "perldoc -f chomp" at your command line). For a more detailed discussion of this meme, see thread 303987, in particular the comments from ChemBoy and Abigail-II.	[reply] [d/l]
Re^2: Find duplicate digits by Anonymous Monk on Feb 15, 2006 at 15:15 UTC
Works like magic, thaaaaaaaaaaaaaaanks you!	[reply]
Re: Find duplicate digits by japhy (Canon) on Feb 15, 2006 at 15:08 UTC
So there are two rules: No digit can appear more than twice, and At least one digit must be duplicated. So break it down into its cases: `m{ ^ (?= \d\d\d\d $ ) # ensure it's only 4 digits long (?: # first digit is the duplicated one (\d) \1 (?!\1) \d (?!\1) \d \| (\d) (?!\2) \d \2 (?!\2) \d \| (\d) (?!\3) \d (?!\3) \d \3 # second digit is the duplicated one \| (\d) (?!\4) (\d) \5 (?!\4\|\5) \d \| (\d) (?!\6) (\d) (?!\6\|\7) \d \7 # third digit is the duplicated one \| (\d) (\d) (?!\8\|\9) (\d) \10 ) }x;` [download] Now, that's just atrocious. It can be made a bit more efficient by grouping things together smarter: `m{ ^ (?= \d\d\d\d $ ) # ensure it's only 4 digits long (?: # first digit is the duplicated one (\d) (?: \1 (?!\1) \d (?!\1) \d \| (?!\1) \d \1 (?!\1) \d \| (?!\1) \d (?!\1) \d \1 ) # second digit is the duplicated one \| (\d) (?!\2) (\d) (?: \3 (?!\2\|\3) \d \| (?!\2\|\3) \d \3 ) # third digit is the duplicated one \| (\d) (\d) (?!\4\|\5) (\d) \6 ) }x;` [download] But we're still stuck with disgusting regexes. So... why use a regex? friedo's got a simple non-regex solution for you. The "pattern" you need to match isn't a pretty one. Update: you could also use a far simpler regex that makes two passes at the string like so: `m{ ^ (?= \d* (\d) \d* \1 ) (?! \d* \1 \d* \1 \d* \1 ) }x;` [download] Jeff `japhy` Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and `perl` hacker How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart	[reply] [d/l] [select]
Re^2: Find duplicate digits by Anonymous Monk on Feb 15, 2006 at 15:21 UTC
Thanks for your enlightenment :) I didn't know how to approach the problem...regex was the only thing that came to mind. But, yes, I like friedo's simple solution.	[reply]
Re^2: Find duplicate digits by Anonymous Monk on Feb 16, 2006 at 03:13 UTC
Hi japhy, I'm trying to understand your updated regex: `m{ ^ (?= \d* (\d) \d* \1 ) (?! \d* \1 \d* \1 \d* \1 ) }x;` [download] Could you explain what's going there? Tia :)	[reply] [d/l]
Re^3: Find duplicate digits by japhy (Canon) on Feb 16, 2006 at 07:43 UTC
The regex is anchored to the beginning of the string. The first assertion is a positive look-ahead that says "see if we can find zero or more digits, followed by a particular digit, followed by zero or more digits, and then that particular digit again". That satisfies the "at least one digit must be duplicated" rule. The second assertion is a negative look-ahead that says "make sure we can't match zero or more digits, that particular digit, zero or more digits, that particular digit, and then zero or more digits and that particular digit a THIRD time". This satisfies the "the digit can't appear more than three times" rule. Jeff `japhy` Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and `perl` hacker How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart	[reply]
Re^4: Find duplicate digits by Anonymous Monk on Feb 16, 2006 at 09:35 UTC
Re: Find duplicate digits by BrowserUk (Patriarch) on Feb 15, 2006 at 15:18 UTC
Not very satisfying as a solution, but it works. `#! perl -slw use strict; m[ (?:^[^0]0[^0]0[^0]$)\| (?:^[^1]1[^1]1[^1]$)\| (?:^[^2]2[^2]2[^2]$)\| (?:^[^3]3[^3]3[^3]$)\| (?:^[^4]4[^4]4[^4]$)\| (?:^[^5]5[^5]5[^5]$)\| (?:^[^6]6[^6]6[^6]$)\| (?:^[^7]7[^7]7[^7]$)\| (?:^[^8]8[^8]8[^8]$)\| (?:^[^9]9[^9]9[^9]$) ]x and print for '0000' .. '9999';` [download] Update: ... but only for 4 digit numbers. By inverting the logic and only printing numbers that don't contain 3 repeated digits, it should work for numbers of any length. `#! perl -slw use strict; m[ (?:[^0]0[^0]0[^0]0[^0])\| (?:[^1]1[^1]1[^1]1[^1])\| (?:[^2]2[^2]2[^2]2[^2])\| (?:[^3]3[^3]3[^3]3[^3])\| (?:[^4]4[^4]4[^4]4[^4])\| (?:[^5]5[^5]5[^5]5[^5])\| (?:[^6]6[^6]6[^6]6[^6])\| (?:[^7]7[^7]7[^7]7[^7])\| (?:[^8]8[^8]8[^8]8[^8])\| (?:[^9]9[^9]9[^9]9[^9]) ]x or m[(?=(\d).*\1)] and print for '000000' .. '999999';` [download] Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal? "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice.	[reply] [d/l] [select]
Re^2: Find duplicate digits by Anonymous Monk on Feb 15, 2006 at 15:35 UTC
Thanks, BrowserUk :) How do I modify your code to add a new line to each printed number?	[reply]
Re^3: Find duplicate digits by BrowserUk (Patriarch) on Feb 15, 2006 at 15:39 UTC
If you included `-l` on your shebang line as I have, it would add the newlines for you, but since many people do not like simple :), you can do this: `#! perl -sw use strict; m[ (?:^[^0]0[^0]0[^0]$)\| (?:^[^1]1[^1]1[^1]$)\| (?:^[^2]2[^2]2[^2]$)\| (?:^[^3]3[^3]3[^3]$)\| (?:^[^4]4[^4]4[^4]$)\| (?:^[^5]5[^5]5[^5]$)\| (?:^[^6]6[^6]6[^6]$)\| (?:^[^7]7[^7]7[^7]$)\| (?:^[^8]8[^8]8[^8]$)\| (?:^[^9]9[^9]9[^9]$) ]x and print "$_\n" for '0000' .. '9999';` [download] Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal? "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice.	[reply] [d/l] [select]
Re^2: Find duplicate digits by Anonymous Monk on Feb 15, 2006 at 15:40 UTC
Please ignore my earlier post below. I figured that out. Sorry about that...	[reply]
Re: Find duplicate digits by inman (Curate) on Feb 15, 2006 at 15:35 UTC
Canonicalise the data and look for multiples. The grep tests for numbers where there are a number of multiples of which at least one is a pair. This solution works with numbers of any length and can be made to work with triples, quads etc. just by changing the grep test. The OP didn't make it clear whether a number such as 0101 that has two pairs should be counted. You can always test the number of members of the list returned by the grep. `while (<DATA>){ print "$_" if grep {length $_ == 2} join('', sort split //, $_) =~ + /((\d)\2+)/g; } __DATA__ 0000 0001 0002 0003 0004 0005 0006 0007 0008 0009 0010 0011 0012 0013 0014 01511 451411` [download] gives `0011 0012 0013 0014 451411` [download]	[reply] [d/l] [select]
Re^2: Find duplicate digits by Anonymous Monk on Feb 15, 2006 at 15:52 UTC
Thanks, inman! Yes, I would like 1010 to be counted. I'm finding it hard to understand your grep. Where is the list that it's supposed to work on?	[reply]
Re^3: Find duplicate digits by holli (Abbot) on Feb 15, 2006 at 17:29 UTC
The code is equivalent to `while (<DATA>) { my $a = join('', sort split //, $_); my @a = $a =~ /((\d)\2+)/g; print "$_" if grep {length $_ == 2} @a; }` [download] holli, /regexed monk/	[reply] [d/l]
Re^4: Find duplicate digits by Anonymous Monk on Feb 16, 2006 at 04:25 UTC
Re: Find duplicate digits by Roy Johnson (Monsignor) on Feb 15, 2006 at 16:47 UTC
It looks almost paradoxical, but this does the trick: `#! perl use strict; use warnings; while (<DATA>) { print if /(\d)(?=.\1)/g and not /($1)(?=.\1)/g; } __DATA__ 0000 (Don't pick) 0001 (Don't pick) 0012 (Pick) 0011 (Pick) 1001 (Pick) 1009 (Pick) 1234 (Don't pick) 9879 (Pick)` [download] Because I use the /g, the second match continues after the first one, so its capture group will actually be the 2nd occurrence of the digit. Then the backreference would be the third (which would mean we don't want it). It would not work for situations where some digit appears three times but another appears only twice (so it would be a "pick"). But since your examples are only four digits long, that doesn't come up here. Caution: Contents may have been coded under pressure.	[reply] [d/l]
Re: Find duplicate digits by salva (Canon) on Feb 15, 2006 at 15:49 UTC
`@m = grep {/(\d).\1/ and !/(\d).\1.*\1/} '0000' .. '9999'`	[reply] [d/l]
Re: Find duplicate digits by GrandFather (Saint) on Feb 16, 2006 at 03:33 UTC
Here's a fairly tidy solution (well, I like it): `use strict; use warnings; my @nums = (1000..9999); my @picks = grep {my %d; map {$d{$_}++} split ''; grep {$_ == 2} value +s %d} @nums; print join "\n", @picks;` [download] Read more... Prints: (20 kB) DWIM is Perl's answer to Gödel	[reply] [d/l] [select]
Re: Find duplicate digits by LanX (Saint) on Apr 07, 2010 at 14:41 UTC
straightforward, accept one repetition but not two. `print grep { /(\d).\1/ && ! /(\d).\1.*\1/ } map {sprintf "%04d",$_} +(0..9999)` [download] Tried to do it in one regex but I couldn't find out how to negate the match \1, `[^\1]` obviously doesnt work and all negative look aheads get confused by the end of string... Cheers Rolf UPDATE: sorry nothing new :( ...Re: Find duplicate digits	[reply] [d/l] [select]
Re^2: Find duplicate digits by LanX (Saint) on Apr 08, 2010 at 01:00 UTC
another variation: `print grep { /(\d).*\1/ and eval "2 == tr/$1//" } map {sprintf "%04d +",$_} (0..9999)` [download] Cheers Rolf	[reply] [d/l]
Re: Find duplicate digits by mroman (Initiate) on Apr 07, 2010 at 10:50 UTC
Hello Monks, This is my first post, and I was delighted to find this thread, as it almost exactly addresses what I want to do, with the exception that I would like to narrow the numbers down even further in comparison to the OP's spec. Thus, I want to take 0000 to 9999 and select only those that have one pair of duplicate digits. For example: 0000 - no 0011 - no 0012 - yes 0120 - yes I am a noob, and this has turned out to be a bit beyond my skill level at this time . . . I was going the grep + regex on Bash route, however, upon seeing the Perl solutions in this thread, I believe I will only use bash to admin my system on the command line from now on :)	[reply]
Re^2: Find duplicate digits by BrowserUk (Patriarch) on Apr 07, 2010 at 11:14 UTC
`perl -le"%h=(), undef @h{split'',$_}, keys %h == 3 and print for '0000 +'..'9999'"` [download]	[reply] [d/l]
Re^3: Find duplicate digits by mroman (Initiate) on Apr 07, 2010 at 13:09 UTC
BrowserUk, Thank you for this. This is the message I got: `syntax error at ns.pl line 1, near "-le" Execution of ns.pl aborted due to compilation errors.` [download] ?	[reply] [d/l]
Re^4: Find duplicate digits by BrowserUk (Patriarch) on Apr 07, 2010 at 13:29 UTC
Re^5: Find duplicate digits by mroman (Initiate) on Apr 07, 2010 at 14:15 UTC
Re^4: Find duplicate digits by Anonymous Monk on Apr 07, 2010 at 13:14 UTC
Re^5: Find duplicate digits by mroman (Initiate) on Apr 07, 2010 at 14:07 UTC
Re^2: Find duplicate digits (non-filter) by tye (Sage) on Apr 07, 2010 at 14:41 UTC
This version directly generates the list of desired patterns rather than filtering through all 4-digit patterns, just to demonstrate another approach. It also works for any length of digit string. `#/usr/bin/perl -w use strict; use Algorithm::Loops qw< NestedLoops NextPermute >; my( $uniq )= ( @ARGV, 3 ); # 3 unique digits plus one duplicate digit my $digs= NestedLoops( [ [0..9], ( sub { [1+$_..9] } ) x ($uniq-1) ], ); my @digs; while( @digs= $digs->() ) { do { for my $dup ( 0 .. $#digs ) { for my $after ( $dup .. $#digs ) { my @result= @digs; splice @result, 1+$after, 0, $digs[$dup]; print @result, $/; } } } while( NextPermute( @digs ) ); }` [download] - tye	[reply] [d/l]
Re^3: Find duplicate digits (non-filter) by mroman (Initiate) on Apr 08, 2010 at 09:13 UTC
Tye, Thank you, I will look into this one. Cheers, M.ROMAN	[reply]
Re^2: Find duplicate digits by Ratazong (Monsignor) on Apr 07, 2010 at 11:16 UTC
You can try the following algorithm: pass through your input one digit per time use a hash to count how often each digit occurs once you are finished, pass through the hash (again) to see how many digits are ocuring two times print yes/no Once you have done this, think how to adapt the algorithm to handle the special case 1111.... Have Fun! Rata PS.: it is good that you looked for an old thread! However you'll get more answers if you'd opened a new thread and just had provided a link to the old one! well, next time ;-)	[reply]
Re^3: Find duplicate digits by mroman (Initiate) on Apr 07, 2010 at 13:18 UTC
Ratazong, Thank you for your reply. (vigorously flapping noob wings) Argh! Can't fly that high as of yet . . . I will link to a thread instead next time however :) Cheers, M.Roman	[reply]
Re: Find duplicate digits by Anonymous Monk on Feb 16, 2006 at 10:11 UTC
Is this a problem from project Euler (mathschallenge.net)?	[reply]
Re^2: Find duplicate digits by Anonymous Monk on Feb 16, 2006 at 11:46 UTC
No. I wanted to find out how many of those numbers there are. A friend of mine is into this lottery sort of game, where you bet on one of the 10 000 numbers. I wanted to find out how his chances of winning are if he bets on all those numbers that have a duplicate. The set is almost half of all the possibilities and he can't possibly buy that many. Programmatically, I'm interested to learn how to solve this sort of things.	[reply]