simulantx has asked for the wisdom of the Perl Monks concerning the following question:

Monks-
This isn't really a Perl question per se, but hopefully easy enough:

I need a RegEx that can match a string of 10 digits or more, but not if all the digits in the string are the same. So for this input, only the 2nd and 4th tokens would match:

2222222222 1234567890 0000000000 48192049281924 99999999999999

Basically something along the lines of "\b\d{10,}\b", but somehow stripping out the strings made of one repeated digit.
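For reference, here is a minimal sketch of that starting point on its own, run against the sample input above. It shows that "\b\d{10,}\b" by itself accepts all five tokens, the mono-digit ones included, so something extra is needed to weed those out.

```perl
use strict;
use warnings;

my $str = '2222222222 1234567890 0000000000 48192049281924 99999999999999';

# The starting-point pattern alone: every whole "word" of 10 or more digits.
my @all = $str =~ /\b(\d{10,})\b/g;

# Reports all five tokens, including the three made of one repeated digit.
print scalar(@all), " tokens: @all\n";
```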

Thanks!

Re: RegEx to match unique string of digits
by GrandFather (Saint) on Jun 05, 2009 at 02:55 UTC

    It's easier to find all the numbers of 10 or more digits, then filter out the special case:

    use strict;
    use warnings;

    my $str = '2222222222 1234567890 0000000000 48192049281924 99999999999999';
    my @matched = grep {! /^(\d)\1{9,}$/} $str =~ /\b(\d{10,})\b/g;
    print "@matched";

    Prints:

    1234567890 48192049281924
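    If the back-reference in the grep reads as line noise, the same filter can be done without a regex: a string consists of one repeated digit exactly when it equals its first character repeated length() times. The sketch below assumes a helper named is_mono_digit (a hypothetical name, not something from the thread) and otherwise follows the code above.

```perl
use strict;
use warnings;

# Hypothetical helper: true if every character of $n equals the first one.
# A non-regex stand-in for the /^(\d)\1{9,}$/ test (the length check is
# already done by the \d{10,} match below).
sub is_mono_digit {
    my ($n) = @_;
    return $n eq substr($n, 0, 1) x length($n);
}

my $str = '2222222222 1234567890 0000000000 48192049281924 99999999999999';

# Find all 10+ digit tokens, then drop the mono-digit ones.
my @matched = grep { !is_mono_digit($_) } $str =~ /\b(\d{10,})\b/g;
print "@matched\n";    # 1234567890 48192049281924
```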

    Update: OK, here's a regex that does the trick. Let me know which you think is more readable and maintainable. ;)

    use strict;
    use warnings;

    my $str = '2222222222 1234567890 0000000000 48192049281924 99999999999999';
    my @matched = $str =~ /
        [^\d]*
        (?:\b(?: 0{10,} | 1{10,} | 2{10,} | 3{10,} | 4{10,}
               | 5{10,} | 6{10,} | 7{10,} | 8{10,} | 9{10,})\b [^\d]* )*
        (\b\d{10,}\b)
        (?:[^\d]* \b(?: 0{10,} | 1{10,} | 2{10,} | 3{10,} | 4{10,}
                      | 5{10,} | 6{10,} | 7{10,} | 8{10,} | 9{10,})\b )*
        /gx;
    print "@matched";

    True laziness is hard work
      That regexp isn't going to work. Run it against a string of 10 identical digits and it will match those 10 digits:
      use feature 'say';

      my $str = '0000000000';
      my @matched = $str =~ /
          [^\d]*
          (?:\b(?: 0{10,} | 1{10,} | 2{10,} | 3{10,} | 4{10,}
                 | 5{10,} | 6{10,} | 7{10,} | 8{10,} | 9{10,})\b [^\d]* )*
          (\b\d{10,}\b)
          (?:[^\d]* \b(?: 0{10,} | 1{10,} | 2{10,} | 3{10,} | 4{10,}
                        | 5{10,} | 6{10,} | 7{10,} | 8{10,} | 9{10,})\b )*
          /gx;
      say "@matched";
      __END__
      0000000000

      The problem with your regexp is that skipping sequences of 10 or more identical digits is optional, that its heart, the (\b\d{10,}\b) part, isn't restrictive, and that Perl will do its utter best to find a match somehow, somewhere.
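      One way to stop the engine from abandoning the skip, sketched here under the assumption that the overall layout of the original regex should be kept, is to make the skipping step atomic with (?>...). Once the engine has skipped a run of mono-digit tokens it cannot backtrack into that group, so the capture cannot fall back to a token the skip already consumed. The trailing group is dropped because the next /g iteration skips those tokens anyway. The sub name non_mono_tokens is hypothetical.

```perl
use strict;
use warnings;

# A run of 10 or more copies of one digit (the alternation from the original regex).
my $mono = qr/(?:0{10,}|1{10,}|2{10,}|3{10,}|4{10,}|5{10,}|6{10,}|7{10,}|8{10,}|9{10,})/;

# Hypothetical helper: returns every 10+ digit token that is not one repeated digit.
sub non_mono_tokens {
    my ($str) = @_;
    return $str =~ /
        \D*
        (?> (?: \b$mono\b \D* )* )   # atomic: skipped mono-digit tokens are never given back
        (\b\d{10,}\b)                # so the capture can't reuse a token the skip consumed
    /gx;
}

my $str = '2222222222 1234567890 0000000000 48192049281924 99999999999999';
my @good  = non_mono_tokens($str);
my @empty = non_mono_tokens('0000000000');
print "@good\n";                       # 1234567890 48192049281924
print scalar(@empty), " matches\n";    # 0 matches
```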

Re: RegEx to match unique string of digits
by lodin (Hermit) on Jun 05, 2009 at 03:23 UTC

    GrandFather's suggestion of filtering out the false matches afterwards is probably sufficient for you, and perhaps also more efficient (but run a Benchmark to be certain if time efficiency is an issue). However, if you need this inside a larger pattern, you can use a negative look-ahead to skip the mono-digit strings.

    $_ = '2222222222 1234567890 123 0000000000 48192049281924 99999999999999';
    print "$2\n" while /\b(?!(\d)\1+\b)(\d{10,}\b)/g;
    __END__
    1234567890
    48192049281924
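    As an illustration of the "larger pattern" case, here is a sketch that assumes a hypothetical line of key=value pairs where only the id values are wanted; the look-ahead slots in right after the literal prefix. Note that the look-ahead's (\d) takes capture group 1, so the ID itself lands in $2.

```perl
use strict;
use warnings;

# Hypothetical input: key=value pairs where only "id" values matter.
my $line = 'id=1234567890 ref=5555555555 id=0000000000 id=123 id=48192049281924';

my @ids;
# Group 1 exists only inside the negative look-ahead; group 2 is the ID.
while ($line =~ /\bid=(?!(\d)\1+\b)(\d{10,})\b/g) {
    push @ids, $2;
}
print "@ids\n";    # 1234567890 48192049281924
```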

    lodin

      Awesome suggestions everyone. I'll run some Benchmarks for sure and see how it works against my data. THANKS!
        RegEx always seems to drive me nuts when I am trying to work on a project! Once I get the right expression(s) though, it is amazingly powerful :) My last project was for ISAPI URL rewrites.
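      A minimal sketch of such a benchmark, using the core Benchmark module's cmpthese to compare GrandFather's filter with lodin's look-ahead. The sample string and the sub names are assumptions for illustration; relative timings depend on the real data and Perl version, so substitute a representative sample before drawing conclusions.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Assumed sample input; replace with representative real data.
my $str = '2222222222 1234567890 123 0000000000 48192049281924 99999999999999';

# GrandFather's approach: match every 10+ digit token, then drop mono-digit ones.
sub filter_approach {
    return grep { !/^(\d)\1{9,}$/ } $str =~ /\b(\d{10,})\b/g;
}

# lodin's approach: reject mono-digit tokens inside the match with a look-ahead.
sub lookahead_approach {
    my @found;
    push @found, $2 while $str =~ /\b(?!(\d)\1+\b)(\d{10,}\b)/g;
    return @found;
}

# A negative count means "run each sub for at least that many CPU seconds";
# cmpthese then prints a table of rates and relative percentages.
cmpthese(-1, {
    filter    => \&filter_approach,
    lookahead => \&lookahead_approach,
});
```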
Re: RegEx to match unique string of digits
by jwkrahn (Abbot) on Jun 05, 2009 at 01:36 UTC
      That only matches strings of 10 identical digits. Quite the opposite of what was asked.