Speedfreak has asked for the wisdom of the Perl Monks concerning the following question:

Yes, I confess I'm two clowns short of a circus.

I'm trying to write a search script for a small CSV database. It's not complex enough to warrant SQL.

Basically, I want to check the incoming string from the form against the following logic:

I want a true/false result from this check, so the script can either continue the search or warn the user to put more text into the search.

A generic regex that lets you add more allowed characters and alter the min/max size would be good, as I think this gets used a lot.

- Jed


Replies are listed 'Best First'.
Re: Making rules in Regex for string content and size
by Corion (Patriarch) on Jun 05, 2000 at 14:05 UTC

    Such a regular expression is quite simple to make (see perlre for a long explanation). First we start with the RE for one single allowed character. To match a set or a range of characters, we must use the [ and ] chars :

    [A-Z0-9 ,"&-']
    Now we want this RE to match at least 4 and at most 16 characters:
    /^[A-Z0-9 ,"&-']{4,16}$/
    The braces tell Perl to match the previous part of the RE (our set of valid characters) at least 4 times and at most 16 times.
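As a rough sketch of how this check could give the true/false answer asked for in the question: the sub name `is_valid_search` is made up for illustration. Two details differ from the pattern above and are assumptions, not part of the post: the hyphen is escaped so it is taken literally (a point raised further down the thread), and `\z` is used instead of `$` so that a trailing newline is not silently accepted.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the search string is acceptable, 0 otherwise.
sub is_valid_search {
    my ($text) = @_;
    return 0 unless defined $text;
    # 4 to 16 characters, each one from the allowed set.
    # The hyphen is escaped so it cannot form a range with its neighbours.
    return $text =~ /^[A-Z0-9 ,"&\-']{4,16}\z/ ? 1 : 0;
}

# Try a few candidate search strings.
for my $candidate ('PERL', 'AB', 'ROCK & ROLL', 'TOO-LONG-STRING-HERE') {
    printf "%-22s %s\n", $candidate,
        is_valid_search($candidate) ? 'ok' : 'rejected';
}
```

The calling script can then branch on the return value, continuing the search on 1 and warning the user on 0.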

    There are of course other ways, for example the more clumsy way of doing it like this :

    # Please excuse the long line ;)
    /^[A-Z0-9 ,"&-'][A-Z0-9 ,"&-'][A-Z0-9 ,"&-'][A-Z0-9 ,"&-'][A-Z0-9 ,"&-']?[A-Z0-9 ,"&-']? ... repeat total 12 times ... [A-Z0-9 ,"&-']?$/
    Here, the ? tells Perl to optionally match the previous set, in fact, the ? is equivalent to {0,1}.

    If you find yourself struggling with REs often, you might want to take a look at Mastering Regular Expressions by Jeffrey Friedl (it is published by O'Reilly and has a big owl on the cover). This book covers everything about REs, but I don't know how good it is for beginners.

    Update: Here is a direct link to the book.

    Update: perlcgi told me that lowercase characters should be allowed as well. D'oh ;) Here we go again :

    # The set for the characters gets changed :
    [A-Za-z0-9 ,"&-']
    # and the RE then looks like this :
    /^[A-Za-z0-9 ,"&-']{4,16}$/
    # and we can modify the RE, so that A-Z also matches
    # a-z with the "i"-modifier :
    /^[A-Z0-9 ,"&-']{4,16}$/i
    # or equivalent
    /^[a-z0-9 ,"&-']{4,16}$/i
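Since the question asked for something where the allowed characters and the min/max size can be changed easily, one possible approach is to build the pattern from parameters. This is only a sketch: `make_validator` is a hypothetical name, and the use of `quotemeta`, `\z` and the `i` modifier are choices made here rather than anything from the post.

```perl
use strict;
use warnings;

# Hypothetical helper: build a validator from a string of allowed characters
# and a minimum/maximum length. Assumes $min and $max are non-negative
# integers with $min <= $max.
sub make_validator {
    my ($allowed, $min, $max) = @_;
    die "need at least one allowed character\n"
        unless defined $allowed && length $allowed;
    # quotemeta escapes every non-word character, so a hyphen, bracket or
    # caret in $allowed is taken literally inside the character class.
    my $class = quotemeta $allowed;
    my $re    = qr/^[${class}]{$min,$max}\z/i;
    return sub {
        my ($text) = @_;
        return defined $text && $text =~ $re ? 1 : 0;
    };
}

# Letters (case-insensitive), digits, and the punctuation from the thread.
my $check = make_validator(join('', 'A' .. 'Z', 0 .. 9, q{ ,"&-'}), 4, 16);
print $check->('rock-n-roll') ? "ok\n" : "rejected\n";
```

Passing the allowed characters as a plain string means ranges such as `A-Z` have to be spelled out (here with `'A' .. 'Z'`), but in exchange no character in the list can accidentally be read as regex syntax.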

      You should escape the - in the character class, or it will match all characters from & to ' which is not really what you want, is it?
      [A-Z0-9 ,"&\-']
      would be the correct form if I am right.
        You can just stick it anywhere that it won't be interpreted as part of a range, so:
        [-A-Z0-9 ,"&'] or even [A-Z-0-9 ,"&']
        will work.
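To see the difference between these spellings, a small test along the following lines can help. It is only a sketch: the helper `accepts` and the `@variants` table are made up for illustration, and each class is anchored to match exactly one character.

```perl
use strict;
use warnings;

# Each entry pairs a label with a pattern that matches exactly one character
# from the corresponding class.
my @variants = (
    [ q{[A-Z0-9 ,"&-']},  qr/^[A-Z0-9 ,"&-']\z/  ],   # hyphen forms the range & to '
    [ q{[A-Z0-9 ,"&\-']}, qr/^[A-Z0-9 ,"&\-']\z/ ],   # hyphen escaped
    [ q{[-A-Z0-9 ,"&']},  qr/^[-A-Z0-9 ,"&']\z/  ],   # hyphen first
    [ q{[A-Z-0-9 ,"&']},  qr/^[A-Z-0-9 ,"&']\z/  ],   # hyphen right after a range
);

# Hypothetical helper: does the pattern accept this single character?
sub accepts {
    my ($re, $char) = @_;
    return $char =~ $re ? 1 : 0;
}

for my $variant (@variants) {
    my ($label, $re) = @$variant;
    printf "%-20s %s\n", $label,
        accepts($re, '-') ? 'accepts a hyphen' : 'rejects a hyphen';
}
```

In ASCII, `&` and `'` are adjacent, so the range in the first spelling covers only those two characters and a literal hyphen is not part of the class.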

        Nuance

Re: Makeing rules in Regex for string content and size
by perlcgi (Hermit) on Jun 05, 2000 at 14:20 UTC
    A slightly shorter version of Corion's nice explanation above would be:
     /^[\w ,"&-']{4,16}$/
    Updated: Caveat - This will also allow underscores, which is probably not what you want. Kudos to Corion.
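The underscore caveat is easy to check. In this sketch the variable names are made up, the hyphen is escaped, and `\z` replaces `$`; those are choices made here, not part of the post.

```perl
use strict;
use warnings;

# \w covers letters, digits AND the underscore.
my $with_w   = qr/^[\w ,"&\-']{4,16}\z/;
# The explicit set leaves the underscore out.
my $explicit = qr/^[A-Za-z0-9 ,"&\-']{4,16}\z/;

for my $text ('foo_bar', 'foo bar') {
    printf "%-8s  \\w class: %-8s  explicit class: %s\n", $text,
        ($text =~ $with_w   ? 'ok' : 'rejected'),
        ($text =~ $explicit ? 'ok' : 'rejected');
}
```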
      then why not just do: /^[A-Z0-9 ,"&\-']{4,16}$/i in the first place
Re: Makeing rules in Regex for string content and size
by antihec (Sexton) on Jun 05, 2000 at 14:11 UTC
    Could you show us what you tried, and why that won't work?
    I think pointing you to perlman:perlre is good enough for a start, else you won't learn too much :-)
    -- bash$ :(){ :|:&};: