in reply to Find duplicate digits

I would use a hash to count the occurrences of each digit in each number. Then, if the number has a count of two for any digit, keep it. Otherwise, discard it. For example:

    my @keep; # numbers to keep
    while(chomp( my $line = <NUMBERS> ) ) {
        my @digits = split '', $line;
        my %count;
        ++$count{$_} for @digits;
        if( grep { $_ == 2 } values %count ) {
            push @keep, $line;
        }
    }

Re^2: Find duplicate digits
by radiantmatrix (Parson) on Feb 15, 2006 at 15:54 UTC

    Wow, that's a lot of excessive work; why use the hash, when a regex can count matches?

        my @keep;
        while ( chomp(my $num = <DATA>) ) {
            for ( split //, $num ) {
                my @a = ($num =~ m/$_/g);                 # count occurrences
                if (@a>1) { push @keep, $num; last; }     # keep if more than one
            }
        }
    <-radiant.matrix->
    A collection of thoughts and links from the minds of geeks
    The Code that can be seen is not the true Code
    I haven't found a problem yet that can't be solved by a well-placed trebuchet
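One thing worth noticing when comparing the two snippets above: they do not use quite the same test. The hash version keeps a number when some digit occurs exactly twice, while the regex version keeps it when some digit occurs more than once. Below is a minimal sketch of the difference; the sub names `keep_by_hash` and `keep_by_regex` and the three sample numbers are made up for illustration and are not from either post.

```perl
use strict;
use warnings;

# Hash approach: true when some digit occurs exactly twice.
sub keep_by_hash {
    my ($num) = @_;
    my %count;
    ++$count{$_} for split //, $num;
    my $hits = grep { $_ == 2 } values %count;    # grep in scalar context counts matches
    return $hits;
}

# Regex approach: true when some digit occurs more than once.
sub keep_by_regex {
    my ($num) = @_;
    for my $digit ( split //, $num ) {
        my @matches = ( $num =~ /$digit/g );      # one element per occurrence
        return 1 if @matches > 1;
    }
    return 0;
}

# Hypothetical sample numbers: no repeat, one pair, one triple.
for my $num (qw(1234 1123 1112)) {
    printf "%s hash=%s regex=%s\n", $num,
        keep_by_hash($num)  ? 'keep' : 'discard',
        keep_by_regex($num) ? 'keep' : 'discard';
}
```

The two agree on 1234 and 1123 but disagree on 1112, where the digit 1 appears three times. Which behaviour is wanted depends on what "duplicate" is supposed to mean in the original question.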
      :shrug:, seems like roughly the same amount of work to me. Unless by "work" you mean lines of code, in which case I was being rather more verbose than usual. My method could easily be written:

          my @keep; # numbers to keep
          while(chomp( my $line = <NUMBERS> ) ) {
              my %count;
              ++$count{$_} for split '', $line;
              push @keep, $line if grep { $_ == 2 } values %count
          }

      I also wouldn't be surprised if the regex method was slower, but I'm too lazy to do a benchmark right now. :) (And for only 10,000 numbers, it probably does not matter much.)

        You piqued my curiosity. It seems that whether we use a regex or a hash, the two are basically equivalent. I used a dataset of all the numbers 1000..9999, which is the fairest test I could devise.

        Please read the reply below. This is what I get for forgetting strict and warnings in my benchmark code. It turns out the hash approach is much faster than the regex (which, honestly, surprised me).

            $ test.pl
                       Rate regex  hash
            regex 1048218/s    --   -0%
            hash  1048218/s    0%    --
            $ test.pl
                       Rate regex  hash
            regex 1047449/s    --   -1%
            hash  1059547/s    1%    --
            $ test.pl
                       Rate  hash regex
            hash  1043950/s    --   -1%
            regex 1054296/s    1%    --

        Benchmark code...

            use Time::HiRes;
            use Benchmark ':all';

            sub regex {
                my @keep;
                foreach my $num (@data) {
                    chomp;
                    for ( split //, $num ) {
                        my @a = ($num =~ m/$_/g);
                        if (@a == 2) { push @keep, $num; last; }
                    }
                }
            }

            sub hash {
                my @keep; # numbers to keep
                foreach my $line (@data) {
                    chomp;
                    my %count;
                    ++$count{$_} for split '', $line;
                    push @keep, $line if grep { $_ == 2 } values %count
                }
            }

            my @data = <DATA>;

            cmpthese ( 1_000_000, {
                hash  => 'hash()',
                regex => 'regex()',
            });
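For comparison, here is a rough sketch of how the benchmark could be arranged with strict and warnings switched on. In the code as posted, `my @data` is declared after the two subs, so they read an empty package variable rather than the numbers, and the bare `chomp;` acts on `$_` instead of the loop variable. The sketch rests on a few assumptions of mine: the data set is generated in code (1000..9999, as described in the post) instead of being read from a DATA section, the subs take the data as an array reference, and the names `regex_keep` and `hash_keep` are invented here. It is not the corrected code from the follow-up reply.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Assumption: same data set as described in the post, built in code.
my @data = ( 1000 .. 9999 );

# Regex approach: keep a number when some digit matches exactly twice.
sub regex_keep {
    my ($numbers) = @_;
    my @keep;
    for my $num (@$numbers) {
        for my $digit ( split //, $num ) {
            my @matches = ( $num =~ /$digit/g );
            if ( @matches == 2 ) { push @keep, $num; last; }
        }
    }
    return @keep;
}

# Hash approach: keep a number when some digit has a count of two.
sub hash_keep {
    my ($numbers) = @_;
    my @keep;
    for my $num (@$numbers) {
        my %count;
        ++$count{$_} for split //, $num;
        push @keep, $num if grep { $_ == 2 } values %count;
    }
    return @keep;
}

# A negative count asks Benchmark to run each sub for at least that many
# CPU seconds; code references avoid the string-eval form of the original.
cmpthese( -1, {
    hash  => sub { hash_keep( \@data ) },
    regex => sub { regex_keep( \@data ) },
} );
```

Passing the data in explicitly means neither sub depends on where `@data` happens to be declared, which is the kind of slip strict would have flagged in the first place.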
Re^2: Find duplicate digits
by Not_a_Number (Prior) on Feb 15, 2006 at 21:14 UTC

    Strangely, nobody yet seems to have taken up the issue of this line in friedo's code:

        while(chomp( my $line = <NUMBERS> ) ) {

    Don't do this!!

    Why? Because if, for example, the last line of the file does not end with a newline, your code will ignore it (in order to understand why, type "perldoc -f chomp" at your command line).

    For a more detailed discussion of this meme, see thread 303987, in particular the comments from ChemBoy and Abigail-II.
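To make the point above concrete: chomp returns the number of characters it removed, so on a final line with no trailing newline it returns 0 and the while loop stops before the body runs. Here is a small sketch that reads the same text both ways. The sub names and the sample text are invented for the demonstration, and it reads from an in-memory filehandle rather than the NUMBERS handle used in the thread.

```perl
use strict;
use warnings;

# The idiom under discussion: chomp's return value drives the loop.
sub read_lines_chomp_in_condition {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    my @lines;
    no warnings 'uninitialized';    # chomp may be handed undef at end of file
    while ( chomp( my $line = <$fh> ) ) {
        push @lines, $line;
    }
    close $fh;
    return @lines;
}

# The usual alternative: test the read itself, then chomp inside the loop.
sub read_lines_safely {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    my @lines;
    while ( my $line = <$fh> ) {    # Perl adds an implicit defined() test here
        chomp $line;
        push @lines, $line;
    }
    close $fh;
    return @lines;
}

# Sample text: note there is no newline after the last number.
my $text = "1123\n4567\n8899";

my @unsafe = read_lines_chomp_in_condition($text);
my @safe   = read_lines_safely($text);

print scalar(@unsafe), "\n";    # prints 2: 8899 is silently dropped
print scalar(@safe),   "\n";    # prints 3
```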

Re^2: Find duplicate digits
by Anonymous Monk on Feb 15, 2006 at 15:15 UTC
    Works like magic, thaaaaaaaaaaaaaaanks you!