http://qs1969.pair.com?node_id=588095

brickwall has asked for the wisdom of the Perl Monks concerning the following question:

Hello, hope someone can help. I need a regex to take a surname (any surname) from a form with <STDIN>. It must take "double-barrel" names into account (i.e. those with a space in the middle), but the whole string must be no longer than 20 characters. It must start with a letter and end with a letter, with the space (or no space) appearing somewhere in the middle. It must also cater for normal single-word surnames. This is what I have so far (but it doesn't limit the string to 20 characters):
m/^([a-zA-Z]\s?[a-zA-Z]?){1,20}$/
I bet this is simple, but for the life of me I can't figure out the regex I need. TIA

Replies are listed 'Best First'.
Re: Stumped by regex
by JediWizard (Deacon) on Dec 06, 2006 at 14:55 UTC

    Something like this? (It limits to strings no longer than twenty characters, nothing but letters and an optional space).

    m/^(?=.{1,20}\Z)[a-z]+(?:\s[a-z]+)?$/i

    In the end, however, I think you'd be better off using length to check the string's length outside the regex. Hope this is helpful.
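
    A minimal sketch of that last suggestion, assuming the name has already been read and chomped. The sub name valid_surname is made up for illustration, and \A and \z are used in place of ^ and $ so that a trailing newline is not accepted:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): length() enforces the
# 20-character limit, the regex only describes the shape of the name.
sub valid_surname {
    my ($name) = @_;
    return 0 unless defined $name;
    return 0 if length($name) > 20;
    # One run of letters, optionally followed by one whitespace
    # character and a second run of letters.
    return $name =~ /\A[a-z]+(?:\s[a-z]+)?\z/i ? 1 : 0;
}

for my $candidate ('Smith', 'Lloyd George', 'Smith2', 'A' x 21) {
    printf "%-22s %s\n", $candidate,
        valid_surname($candidate) ? 'ok' : 'rejected';
}
```

    Keeping the length test outside the regex means the pattern no longer needs the lookahead, which makes it easier to read.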


    They say that time changes things, but you actually have to change them yourself.

    —Andy Warhol

      Except: -
      'David George' =~ m/^(?=.{1,20}\Z)[a-z]+(?:\s[a-z]+)?$/i;
      does not return 'George'?

        I guess I didn't realize he needed it to... But that is only a two-character change.
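
        One possible reading of that two-character change (an assumption, since the reply does not spell it out) is to put capturing parentheses around the second run of letters, so that the match returns the second word:

```perl
use strict;
use warnings;

# Same pattern as above, with ( ) added around the second [a-z]+.
my $pattern = qr/^(?=.{1,20}\Z)[a-z]+(?:\s([a-z]+))?$/i;

# In list context a successful match returns the captured groups.
my ($second) = 'David George' =~ $pattern;
print defined $second ? "$second\n" : "(no second word)\n";
```

        For a single-word name the optional group does not take part in the match, so $second stays undefined.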



Re: Stumped by regex
by reasonablekeith (Deacon) on Dec 06, 2006 at 15:05 UTC
    Does it _need_ to be one regex? You might be better off checking the length first. Something like this...?

    UPDATE: See replies....

    if (length($name) <= 20 and $name =~ m/^\w+\s(\w+\s)?\w+$/i) { print "okay\n"; }

    By the way, your character restriction doesn't work because you can match up to three characters in one go within the parentheses, so your upper limit here is actually 60.
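
    A quick way to see this, using the pattern from the question unchanged (the sample strings are made up): forty letters in a row match because each repetition can take two letters, and a 60-character string of "a b" repeated twenty times matches because each repetition can take letter, space, letter. Only the 61-character string is rejected.

```perl
use strict;
use warnings;

# The pattern from the question, unchanged.
my $original = qr/^([a-zA-Z]\s?[a-zA-Z]?){1,20}$/;

# Label and test string pairs; the strings are illustrative only.
my @samples = (
    [ '40 letters, no space'    => 'a' x 40 ],
    [ '"a b" repeated 20 times' => 'a b' x 20 ],
    [ 'one character more'      => ('a b' x 20) . 'a' ],
);

for my $sample (@samples) {
    my ($label, $text) = @$sample;
    printf "%-25s length %2d: %s\n", $label, length($text),
        $text =~ $original ? 'matches' : 'no match';
}
```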

    ---
    my name's not Keith, and I'm not reasonable.
      Great, but wouldn't your regex also allow numbers to be entered, which I don't want? Thanks
        Yup, dumb mistake by me. Also, though, I assumed there was a first name here too, so the above regex allows an additional space that it shouldn't. You're probably okay with just the following...
        $name =~ m/^[a-z]+\s?[a-z]+$/i
        ...although this does force a minimum length of 2.

        Having said this, if I'd seen JediWizard's reply, I wouldn't have posted mine. I think he nailed it first time (++).

Re: Stumped by regex
by throop (Chaplain) on Dec 06, 2006 at 18:09 UTC
    Others are addressing the 'how' of a regex limiting the input to 20 alpha chars. Let me ask 'Why?'

    Why are you refusing registration to people with non-[A-Za-z] chars in their names and people with long names? What do you want to have happen when the following people try to use your form:

Re: Stumped by regex
by themage (Friar) on Dec 06, 2006 at 15:01 UTC
    Hi brickwall,

    I think this would do what you want:
    m{\A[a-zA-Z][a-zA-Z\s]{0,18}[a-zA-Z]\Z} and (@a=m{\s}g)<=1
    The first regexp makes sure that you have at least two chars and up to 20, including spaces.

    The part after the "and" makes sure that you have at most one space.
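
    The two steps might be wrapped up roughly like this. It is only a sketch: the sub name looks_like_surname is made up, and \z replaces \Z so that a trailing newline is not accepted.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the two checks described above.
sub looks_like_surname {
    my ($name) = @_;
    # Step 1: a letter, up to 18 letters or whitespace characters,
    # then a letter, i.e. between 2 and 20 characters in total.
    return 0 unless $name =~ m{\A[a-zA-Z][a-zA-Z\s]{0,18}[a-zA-Z]\z};
    # Step 2: count the whitespace characters. Assigning the list of
    # matches to () in scalar context yields the number of matches.
    my $spaces = () = $name =~ m{\s}g;
    return $spaces <= 1 ? 1 : 0;
}

for my $candidate ('Smith', 'Lloyd George', 'A B C', 'X') {
    printf "%-14s %s\n", $candidate,
        looks_like_surname($candidate) ? 'ok' : 'rejected';
}
```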

    TheMage
    Talking Web
Re: Stumped by regex
by bsdz (Friar) on Dec 06, 2006 at 15:03 UTC
    I'm not sure how to do this exactly with a regex but one could use something like: -
    use strict;

    my @names = (
        'Joe John Smith',
        'David George',
        'Emma Harry Sally Martin',
    );

    foreach (@names) {
        my @w = split /\s/, (/\s+(\w[\w ]{0,18}\w)\s*$/)[0];
        shift @w while @w > 2;
        print "@w\n";
    }
    Update: You can replace everything in the foreach scope with: -
    (/\s+(\w[\w ]{0,18}\w)\s*$/)[0] =~ /(\w+\s?\w+)$/g;
Re: Stumped by regex
by l.frankline (Hermit) on Dec 06, 2006 at 15:06 UTC

    HI,

    If the string is longer than 20 characters, it is ignored and the loop skips to the next string.

    while (<DATA>) {
        chomp;    # drop the newline so it is neither counted nor printed twice
        unless (length($_) > 20) {
            if ($_ =~ m#^[a-zA-Z]+(\s?[a-zA-Z]+)?$#) {
                print $_ . "\n";
            }
        }
    }
    __DATA__
    Johnson
    Larry Wall
    GeorgeWashingtonBushBush

    Results:

    Johnson
    Larry Wall

    Don't put off till tomorrow, what you can do today.

Re: Stumped by regex
by nevyn (Monk) on Dec 06, 2006 at 16:10 UTC

    Well, you have a lot of replies, and I think they all fail your spec. ... well done :).

    This is probably what I would do...

    sub surname {
        # Assumes ASCII locale
        $_ = shift;
        return undef if (length($_) > 20);
        return $_ if /^([a-zA-Z]+)$/;            # Surname is only one
        return $_ if /^[a-zA-Z]+ ([a-zA-Z]+)$/;  # Surname only
        return undef;
    }
    --
    And-httpd, $2,000 security guarantee
    James Antill
      Thanks for all the replies, guys, although some of them don't quite do it. In the end I figured out that the following will do the job (with the addition of some verification error messages for the user), and then I check the length with an "if". Thanks, guys
      m/^[a-zA-Z]+\s?[a-zA-Z]*$/
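
      A rough sketch of how that regex and the separate length test might fit together. The sub name check_surname and the message strings are invented for illustration. Because \s? may be followed by zero letters, a name ending in a space also passes the pattern, so the sketch includes that case:

```perl
use strict;
use warnings;

# Hypothetical checker: returns a short status string.
sub check_surname {
    my ($name) = @_;
    return 'too long' if length($name) > 20;
    return 'bad characters'
        unless $name =~ m/^[a-zA-Z]+\s?[a-zA-Z]*$/;
    return 'ok';
}

# The last sample ends in a space.
for my $candidate ('Smith', 'Lloyd George', 'Smith2', 'A' x 21, 'Smith ') {
    printf "%-22s %s\n", "'$candidate'", check_surname($candidate);
}
```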