blackjudas has asked for the wisdom of the Perl Monks concerning the following question:

I am putting together some data filters for a database system. These filters are intended to check the "correctness" of the data and perform an action if the data isn't "right". I put "correct" in quotes because, as in life, there are always exceptions. Now to a specific example: if the filter being applied is for last names, it should check whether the first letter is capitalized and, if not, apply ucfirst and return the result.

The regex for that eludes me. While perusing perlre, all I found was how to _convert_ to uppercase (next char, next word, etc.), but I just want to check the first char, not change it. The reason is that names such as McKenzie, FiFins, Macdonald, etc. are all special cases, and rather than trying to think of absolutely every special case under the sun, I choose to look for the simple way out. I'm already expecting a pretty obvious answer, since my last few posts have gotten the same response. :-)

Oh, can someone point me to telephone number storage/display methods? There must be some way of looking at a number and checking its format validity. Thanks.

Re: A question of logic and syntax
by repson (Chaplain) on Apr 28, 2001 at 06:15 UTC
    Don't try to use a simple ucfirst to 'fix' names, since many names, such as "McCarthy O'Neil" and other complex names, will be broken. However, the CPAN module Lingua::EN::NameCase is designed to apply all the special rules needed to produce the right case.
    use Lingua::EN::NameCase qw(nc);
    $name = nc($name);               # change
    if ($name eq nc($name)) { }      # test
Re: A question of logic and syntax
by traveler (Parson) on Apr 27, 2001 at 22:35 UTC
    I think part of the problem is that US numbers are often written today as 555/555-1212 instead of with parens. If you do decide to force the user to put the US area code in parens, please, please tell the user. I hate entering 555/555-1212 into a telephone-number box on a web form and getting some JavaScript error I cannot decipher, or a rejection from a CGI script saying the number is incorrect. If you intend to force a format, please let the user know.</rant>
      Interesting, as I've never encountered the / version, and it is not included in Number::Phone::US either. What are people thinking? There are no farging slashes, brackets or spaces on a telephone anyway! </rant> ;-)

      --
      I'd like to be able to assign to an luser

      Present the user with a localized form that allows him to enter his telephone number in a format he is used to, but store these telephone numbers in a format that applies to all countries.

      Have a nice day
      All decision is left to your taste
Re: A question of logic and syntax
by Trinary (Pilgrim) on Apr 27, 2001 at 22:20 UTC
    $lastname = ucfirst ($lastname) if ($lastname =~ /^[a-z]/);

    As far as the telephone number thing goes, it depends on how many formats you need to handle (international numbers, etc.). Just choose a set of standards and enforce 'em; for example, US numbers all look like (123) 456-7890.

    So: remove whitespace, then check for three digits in parens, three more outside, a dash, then four. If that doesn't match, check whether there is the correct number of digits and, if so, split them up and add the extra punctuation yourself. You just need to adopt a similar standard for any number format you might be handling (and extend the US one for extensions, etc.).
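    Trinary's steps could be sketched roughly like this. It is only a sketch under stated assumptions: it handles ten-digit North American numbers only, ignores extensions and a leading 1, and the function name normalize_us_phone is hypothetical. Because the fallback works from the digit count, it also accepts the 555/555-1212 style traveler mentions.

```perl
use strict;
use warnings;

# Hypothetical helper: follow the steps above for North American numbers only.
# Returns the number as "(123) 456-7890", or undef if it cannot be fixed.
sub normalize_us_phone {
    my ($raw) = @_;
    (my $s = $raw) =~ s/\s+//g;                 # 1. remove whitespace

    # 2. already in the house format: (3 digits) 3 digits - 4 digits
    if ($s =~ /^\((\d{3})\)(\d{3})-(\d{4})$/) {
        return "($1) $2-$3";
    }

    # 3. otherwise keep only the digits; exactly ten can be re-punctuated
    (my $digits = $s) =~ s/\D//g;
    if ($digits =~ /^(\d{3})(\d{3})(\d{4})$/) {
        return "($1) $2-$3";
    }
    return;                                     # wrong digit count: reject
}

for my $in ('(123) 456-7890', '123/456-7890', '123.456.7890', '456-7890') {
    my $out = normalize_us_phone($in);
    printf "%-16s => %s\n", $in, defined $out ? $out : 'rejected';
}
```

    Rejecting rather than guessing on a wrong digit count keeps bad data out of the database, but, as traveler says, the form should tell the user why.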

    Trinary

      OK, but the world is bigger than the US, so better get used to the format +CC -AC -number -ext, where the plus stands for the digits you have to dial for international calls (which you can omit on a national call), CC is the country code, AC the area code, number the number itself, and ext an optional extension.
      If you apply that scheme, you don't have to redefine your DB, modify your .ini sections or, worst case, rewrite your code every time you get people from other countries. The US and Canada schemes might be pretty similar, but they are not identical, so better to use an open format.
      You might also consider including something like the code you have to dial before the area code when making a national call, e.g. +CC -(n)AC -number -ext.
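      A format like +CC -(n)AC -number -ext is regular enough to take apart with one pattern. This is a minimal sketch, assuming each field is digits separated by a dash with optional spaces around it, exactly as written above; parse_intl_phone is a hypothetical name, and it does no per-country check of how long each part should be.

```perl
use strict;
use warnings;

# Hypothetical parser for "+CC -AC -number -ext" (extension optional),
# with an optional "(n)" national prefix before the area code.
# Returns a hash ref, or nothing if the string does not fit the format.
sub parse_intl_phone {
    my ($s) = @_;
    return unless $s =~ m{
        ^ \+ (\d{1,3})                # CC: country code
        \s* - \s* (?: \( (\d+) \) )?  # optional (n): national prefix
        (\d+)                         # AC: area code
        \s* - \s* (\d+)               # subscriber number
        (?: \s* - \s* (\d+) )?        # optional extension
        \z
    }x;
    return {
        country   => $1,
        national  => $2,              # undef when there is no (n) part
        area      => $3,
        number    => $4,
        extension => $5,              # undef when there is no extension
    };
}

my $p = parse_intl_phone('+49 -(0)89 -1234567 -12');
printf "country %s, area %s, number %s, ext %s\n",
    @$p{qw(country area number extension)};
```

      Storing the parts separately (or the canonical string plus the parsed parts) lets each country's display format be rebuilt later without touching the stored data.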

      Have a nice day
      All decision is left to your taste
      A little faster. To see how much (quite a bit) you might want to try Benchmark.pm:

      $lastname =~ s/^([a-z])/\u$1/;
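      A minimal Benchmark.pm comparison of the two approaches might look like this. The sample names and the sub names via_ucfirst and via_subst are made up for the sketch, both subs carry the same call overhead, and the numbers you get will depend on your perl and hardware.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Approach 1: test the first character, call ucfirst only when needed.
sub via_ucfirst {
    my ($lastname) = @_;
    $lastname = ucfirst($lastname) if $lastname =~ /^[a-z]/;
    return $lastname;
}

# Approach 2: one substitution; \u upper-cases the captured letter.
sub via_subst {
    my ($lastname) = @_;
    $lastname =~ s/^([a-z])/\u$1/;
    return $lastname;
}

my @names = qw(smith McKenzie macdonald O'Neil jones);

# Run each for at least one CPU second and print a rate comparison table.
cmpthese(-1, {
    test_then_ucfirst => sub { via_ucfirst($_) for @names },
    substitution      => sub { via_subst($_)   for @names },
});
```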


      Or, if you want a cool one that will init-cap every word in a string (without the eval overhead of calling ucfirst through s///e):
      $txt =~ s/\b(\w+)/\u\L$1/g;
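      To see what that substitution does to ordinary text and to the kind of names discussed in this thread, here is a small wrapper; init_cap is a hypothetical name. Because \L lowercases the rest of each word, McKenzie comes out as Mckenzie, which is the problem repson's Lingua::EN::NameCase suggestion addresses.

```perl
use strict;
use warnings;

# Init-cap every word: \L lower-cases the whole captured word,
# then \u upper-cases its first character.
sub init_cap {
    my ($txt) = @_;
    $txt =~ s/\b(\w+)/\u\L$1/g;
    return $txt;
}

print init_cap("the QUICK brown fox"), "\n";   # The Quick Brown Fox
print init_cap("mcKENZIE o'neil"), "\n";       # Mckenzie O'Neil
```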


      my @a=qw(random brilliant braindead); print $a[rand(@a)];
Re: A question of logic and syntax
by Albannach (Monsignor) on Apr 27, 2001 at 22:56 UTC
    Maybe I'm misreading this, but it sounds like you want to check the first character and, if it isn't uppercase, make it uppercase. Though you list special cases, I can't see a case in which you would not make the first character uppercase, so why check it at all? Just use ucfirst() (which leaves all the trailing characters unchanged) and be done with it, saving yourself the cycles needed for the test. Of course, there are plenty of last names that start with lowercase letters, e.g. von Neumann, so this is always going to be only a filter, not a test.
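    A quick check of the point that ucfirst changes only the first character: internal capitals survive and an already-capitalized name comes back unchanged. The names are sample data; the last one shows the caveat about names that legitimately start in lowercase.

```perl
use strict;
use warnings;

# ucfirst only touches the first character of the string.
for my $name ('mcKenzie', 'McKenzie', 'macdonald', 'von Neumann') {
    printf "%-12s => %s\n", $name, ucfirst $name;
}
```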

    --
    I'd like to be able to assign to an luser

Re: A question of logic and syntax
by princepawn (Parson) on Apr 27, 2001 at 23:40 UTC
    If you come up with some new regular expressions that will see heavy usage by Perl programmers, consider submitting them to the Regexp::Common distribution.
Re: A question of logic and syntax
by jeroenes (Priest) on Apr 27, 2001 at 22:24 UTC
    Funny node, I must say.

    Did you try to find an answer yourself? Does not really look like it. But I'm in a good mood, and killing time until my wife comes home....

    To answer your first question: substr.
    Nice, isn't it?

    print "'$str' starts with lowercase!\n" if substr($str, 0, 1) =~ /[a-z]/;
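    Since the question was how to test the first character without changing it, here is a small comparison of the substr approach, an anchored match like Trinary's, and comparing the string with ucfirst of itself. The sub names are invented for this sketch and the inputs assume plain ASCII names; the ucfirst comparison relies on Perl's case mapping rather than an explicit [a-z] class, so it can behave differently for non-ASCII letters.

```perl
use strict;
use warnings;

# Three ways to ask "does this name start with a lowercase letter?"
# None of them modifies the string being tested.
sub starts_lower_substr  { substr($_[0], 0, 1) =~ /[a-z]/ }  # look at the first char only
sub starts_lower_anchor  { $_[0] =~ /^[a-z]/ }               # anchor the pattern instead
sub starts_lower_ucfirst { ucfirst($_[0]) ne $_[0] }         # would ucfirst change it?

for my $name ('smith', 'Smith', 'McKenzie', 'von Neumann') {
    printf "%-13s substr:%-4s anchor:%-4s ucfirst:%s\n", "'$name'",
        starts_lower_substr($name)  ? 'yes' : 'no',
        starts_lower_anchor($name)  ? 'yes' : 'no',
        starts_lower_ucfirst($name) ? 'yes' : 'no';
}
```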

    For the second: where do you live? What kind of phone numbers will you accept? International?

    print "Only -+()0-9 allowed!\n" if $str =~ /[^-+()0-9]/;

    /me hopes this helps you learn....

    Jeroen
    "We are not alone"(FZ)

      Thanks for the wit :)

      I was stuck in regex land, looking for a pattern to match the first uppercase char.

      I must also clarify the telephone number question: I'm looking for a "rulebook" for international numbers so I can compensate for different locations. The North American number format is fairly straightforward, but European countries and others don't seem to conform; e.g. in the UK an area code could be one or four digits, depending on location.
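      One way to cope with variable-length area codes is a longest-prefix lookup against a table of known codes. This is only a sketch: %uk_area_codes holds just four real UK codes for illustration (a usable version would need the complete national numbering plan), split_uk_number is a hypothetical name, and the five-digit maximum is an assumption.

```perl
use strict;
use warnings;

# Illustrative subset only; a real "rulebook" needs the full numbering plan.
# Codes are stored without the leading 0 dialled on national calls.
my %uk_area_codes = (
    '20'  => 'London',
    '121' => 'Birmingham',
    '131' => 'Edinburgh',
    '161' => 'Manchester',
);

# Split UK national digits (leading 0 already removed) into
# (area code, local number), trying the longest candidate prefix first.
sub split_uk_number {
    my ($digits) = @_;
    for my $len (reverse 1 .. 5) {              # 5 is an assumed maximum length
        next if $len >= length $digits;         # leave at least one local digit
        my $prefix = substr($digits, 0, $len);
        return ($prefix, substr($digits, $len)) if exists $uk_area_codes{$prefix};
    }
    return;                                     # no known area code matched
}

for my $n ('2079460018', '1214960000', '9999999999') {
    my ($area, $local) = split_uk_number($n);
    my $msg = defined $area
        ? "$n: area $area ($uk_area_codes{$area}), local $local\n"
        : "$n: unknown area code\n";
    print $msg;
}
```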

      And yes, I'll always be learning... this post was just an "I'm at a loss and need to ask a fellow Perl programmer" moment. I'll admit the answer was simple, and thank you for yours.