Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I know how to program Perl a bit, but don't have enough regex voodoo know-how for this particular snippet. Any pointers much appreciated.

What I need to do is provide a single regex that will return false when it encounters any one of a list of numbers. This is actually for a config file (so I can't use conventional boolean logic, worse luck). Using the Perl Cookbook, and some prayer, I came up with the following regex:

/^(?!.*7000|7777|7778|3886200|2200|8488|3406|9100|29389988|7688|5000|20|3408|3404|7648).*[0-9]+/

What this perfectly awful piece of code is supposed to do is this: given a number, if that number is in the list above (i.e. 7000, 7777, 7778, etc.), the boolean match should fail and return FALSE. If any other number is input, it should pass.

The problem is, although the regex should allow the input of a number like "70000" (note, there is an extra 0), it actually throws it out (I suspect this is because it matches the 7000).

So, I suppose the general question is this: the Perl Cookbook describes how to incorporate boolean logic into a regex (recipe 6.17), but it stops short of saying how to use a list of values instead of just one, and simple |-separated OR-ing doesn't seem to work. So, any pointers on how to use a list of values much appreciated.

Muchas gracias

PS: I know it's ugly ugly code, I see several .* matches *oh, the horror*, but I honestly didn't see any other way to catch everything I wanted to.

Re: regex that simulates boolean logic
by Zaxo (Archbishop) on Nov 27, 2002 at 10:14 UTC

    A regex is the wrong tool for this job. Use a hash with those numbers as keys.

    use vars '%boolhash';
    @boolhash{ qw(7000 7777 7778 3886200 2200 8488 3406 9100 29389988 7688 5000 20 3408 3404 7648) } = ();
    sub mytest { not exists $boolhash{ $_[0] }; }

    That will be faster and tidier, and you can add values to %boolhash on the fly.

    After Compline,
    Zaxo

Re: regex that simulates boolean logic
by BrowserUk (Patriarch) on Nov 27, 2002 at 10:10 UTC

    It's a little hard to see quite how you can use a regex any place that you can't use "conventional boolean logic".

    However, one way to have a single regex that can be used to reject or fail a given list of numbers would be to make the regex match the specified list and then negate the test, either by using !~ or simply by writing not m//.

    You can avoid it matching smaller chunks of larger numbers (ie. finding 7000 as a part of 70000) by anchoring the expression at both ends. Which anchor is appropriate depends on how and where it would be used. This might work for you.

    @tests = qw/1 2 3 7 700 70000 7000 7777 7778 3886200 2200 8488 3406 9100 29389988 7688 5000 20 3408 3404 7648/
    $fail = qr/\b(?:7000|7777|7778|3886200|2200|8488|3406|9100|29389988|7688|5000|20|3408|3404|7648)\b/
    perl> for (@tests) { print /$fail/ ? "$_ failed\n" : "$_ passed\n" }
    1 passed
    2 passed
    3 passed
    7 passed
    700 passed
    70000 passed
    7000 failed
    7777 failed
    7778 failed
    3886200 failed
    2200 failed
    8488 failed
    3406 failed
    9100 failed
    29389988 failed
    7688 failed
    5000 failed
    20 failed
    3408 failed
    3404 failed
    7648 failed
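
The negation mentioned above (using !~ on a pattern that matches the list) might look something like the following. This is only a sketch: it assumes each input is a bare number with nothing around it, so it anchors with ^ and $ rather than \b, and the helper name is_allowed is made up for illustration.

```perl
use strict;
use warnings;

# Same alternation as in the reply, but anchored to the whole string
# (assumption: the input is just the number, with nothing around it).
my $blocked = qr/^(?:7000|7777|7778|3886200|2200|8488|3406|9100|29389988|7688|5000|20|3408|3404|7648)$/;

# Hypothetical helper: true when the number is NOT on the list.
sub is_allowed {
    my ($number) = @_;
    return $number !~ $blocked;
}

for my $number (qw(7 700 7000 70000 20 207000)) {
    printf "%-7s %s\n", $number, is_allowed($number) ? 'allowed' : 'rejected';
}
```

Whether ^/$ or \b is the better anchor depends on what surrounds the number in the real input, as the reply says.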

    Okay you lot, get your wings on the left, halos on the right. It's one size fits all, and "No!", you can't have a different color.
    Pick up your cloud down the end and "Yes" if you get allocated a grey one they are a bit damp under foot, but someone has to get them.
    Get used to the wings fast cos its an 8 hour day...unless the Governor calls for a cyclone or hurricane, in which case 16 hour shifts are mandatory.
    Just be grateful that you arrived just as the tornado season finished. Them buggers are real work.

Re: regex that simulates boolean logic
by EvdB (Deacon) on Nov 27, 2002 at 10:17 UTC
    If it is ugly code then generally it is not good code, and something like a regex can easily fool you into thinking it is working right up until it breaks.

    If these values are going into a config file why not list them like:

    7000,7777,7778,3886200, ...

    and then place them in a hash, say %not_allowed. You could then check your input using:

    unless ( exists $not_allowed{ $input } ) {
        # do your stuff
    }

    This may require more lines of code but would be much easier to check and understand, and also easier to edit by hand - working with a list not a regex.
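
Filling %not_allowed from a comma-separated line could be sketched roughly as below. The config line here is a made-up stand-in (shortened to a few values), and how the line is actually read from the file is left out.

```perl
use strict;
use warnings;

# Hypothetical config line in the comma-separated form suggested above.
my $config_line = '7000,7777,7778,3886200,2200';

# Split on commas (tolerating spaces around them) and use each value as a key.
my %not_allowed = map { $_ => 1 } split /\s*,\s*/, $config_line;

for my $input (qw(7000 70000 2200 22)) {
    if ( exists $not_allowed{$input} ) {
        print "$input is not allowed\n";
    }
    else {
        print "$input is fine\n";
    }
}
```

Because the lookup is an exact key match, 70000 and 22 are not confused with 7000 and 2200.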

Re: regex that simulates boolean logic
by helgi (Hermit) on Nov 27, 2002 at 10:49 UTC
    If a question is about Perl and the answer is not either "use a hash" or "use a module that uses a hash", the answer is probably wrong.

    Populate a hash with your numbers and then test for existence, something like this (untested):

    my @numbers = qw(123 345 456 567 890);
    my %hash;
    $hash{$_}++ for @numbers;
    for (1,56,123,567,8888) {
        if ($hash{$_}) {
            print "Found $_\n";
        }
        else {
            print "Not found $_\n";
        }
    }

    --
    Regards,
    Helgi Briem
    helgi AT decode DOT is

Re: regex that simulates boolean logic
by Anonymous Monk on Nov 27, 2002 at 11:51 UTC

    Thanks for the replies, folks. Unfortunately, this is for a Radius server config file, and it only accepts a regex.

    It's quite possible I'm trying to make a regex do something that it shouldn't be doing *sigh*, but at any rate, using a hash or basically any Perl code besides a regex causes a Radius server barf.

      In order to be able to accurately give you a regex that will help, it is important to have the context right. So would it be possible to have a number of test case strings of valid/invalid data?

      Meanwhile :-

      /^\D*(?!7000|7777|7778|3886200|2200|8488|3406|9100|29389988|7688|5000|20|3408|3404|7648)\D*[0-9]+/
      Hope it helps
      UnderMine

      Assuming that the regex engine accepts Perl5 regex extensions, this may do what you need it to.

      /^(?!7000$|7777$|7778$|3886200$|2200$|8488$|3406$|9100$|29389988$|7688$|5000$|20$|3408$|3404$|7648$)\d+$/
      It seems to work for all the testcases in my post above plus several more I threw in.
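
For anyone wanting to try this pattern outside the Radius server, a small test loop along these lines may help. It is a sketch, not the original poster's script: the pattern is spread out with /x for commenting, and the trailing $ is factored out of the alternation, which should be equivalent to repeating it on each number.

```perl
use strict;
use warnings;

# The same idea as the pattern above, laid out with comments.
my $ok = qr/
    ^                       # start of the string
    (?!                     # fail if, from here, the text is exactly one of:
        (?:7000|7777|7778|3886200|2200|8488|3406|9100
          |29389988|7688|5000|20|3408|3404|7648)
        $                   # ...followed immediately by the end
    )
    \d+                     # otherwise require one or more digits
    $                       # and nothing after them
/x;

# A few inputs, including near-misses of listed numbers.
for my $input (qw(1 7 700 7000 70000 77000 207000 20 200 abc)) {
    print "$input ", ( $input =~ $ok ? "passes" : "fails" ), "\n";
}
```

If the server only accepts a single-line pattern, the one-line form above is the one to paste in; the /x layout is just for reading.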

        Thanks for the reply, BrowserUk. Yes, it fixes the cases that didn't work before (it was failing on 70000, 77000 and on 207000); those now pass. I'll collect as many more cases as I can from the logfile and throw them at my test script during the course of the day.

        In the interests of learning a bit more about how this works, could you please check if my decoding of the regex is correct?

        What you're doing is anchoring to the start of a line (^), then you've got the negation (?!), then you're doing OR-based matches, with one crucial difference from what I was doing: you're anchoring EACH of the matches to the end-of-line char ($). Then finally, one or more digit chars are a positive match (\d+$).

        Compared to my effort,

        /^(?!.*7000|7777|7778|3886200|2200|8488|3406|9100|29389988|7688|5000|20|3408|3404|7648).*[0-9]+/
        I didn't need the [0-9]+, I could have simply used \d+ instead, and I didn't need a .* before matching the trailing digits. But err, one question: how does this work without a \d+ or a .* match in front?

        Oh, and one more question, please: where do I learn more about all this? I have a browser window opened to the perlre manpage, and another to the CD bookshelf. Are there any other references on this sort of thing? (I have a copy of Jeffrey Friedl's regex book, but it sorta whizzed over my head) :(
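
On the question of how the pattern works with nothing in front of the lookahead: (?!...) is zero-width, so it checks what follows the current position without consuming anything, and \d+ then starts from that same position. The sketch below compares a shortened version of the anchored pattern with the shape of the first attempt; the two-number list is an assumption made to keep it short.

```perl
use strict;
use warnings;

# Shortened list for illustration; the real one has more numbers.
my $anchored = qr/^(?!7000$|20$)\d+$/;      # each alternative must reach the end
my $original = qr/^(?!.*7000|20).*[0-9]+/;  # shape of the first attempt

for my $input (qw(7000 70000 207000 20 200)) {
    printf "%-7s anchored: %-5s original: %s\n",
        $input,
        ( $input =~ $anchored ? 'pass' : 'fail' ),
        ( $input =~ $original ? 'pass' : 'fail' );
}

# The lookahead consumes nothing: after it succeeds, \d+ still starts at
# offset 0 and captures the whole number.
if ( '70000' =~ /^(?!7000$)(\d+)$/ ) {
    print "captured $1\n";
}
```

In the first attempt, the .* inside the lookahead lets 7000 match anywhere in the input, and without a $ the other alternatives match as mere prefixes, which is why inputs such as 70000 and 200 get rejected.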

      What kind of regex does the Radius server accept? Is the Radius server written in Perl? Because not all regexes are equal. The basic syntax (.*+?[]) is pretty standard. The Perl 5 extensions are only available in Perl 5 and programs that use the PCRE library like Exim, Python, Apache, and PHP.

        Yes, this Radius server was written in Perl.. there was an article in perl.com about it recently, you can find it here..

        We run this under Perl 5.6.1, so it should accept anything that Perl 5.6.1 can handle, I think, although their sample regexes use some notations that I haven't used before (i.e. I learnt that ^ is a form of negation, inside a character class, only after I read the docs).