Common Regex's for your coding pleasure.
$NAME_REGEX = '[a-zA-Z]+'; $ADDRESS_REGEX = '[.-\w]+'; $CITY_REGEX = '\w+'; $PHONE_REGEX = '^1?[-\s\.\/]?\s*\(?(\d\d\d)\)?[-\s\.\/]?\s*(\d\d\d) +?[-\s\.\/]?(\d\d\d\d)\s*$'; $ZIP_REGEX = '^\d{5}(([-\s])?\d{4})?\s*$'; $DOB_REGEX = '^\d?\d[-\s\.\/]\s*\d?\d[-\s\.\/]\d\d\d?\d?s*$'; $EMAIL_REGEX = '[-.\w]+\@[-.\w]+'; $COMMENT_REGEX = '\w+'; $SSN_REGEX = '^\d\d\d([\s\.\/-])?\d\d([\s\.\/-]>?\d\d\d\d\s*$'; $HTTP_CLEAN_HEADER_REGEX =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex( +$1))/eg; #i.e., get rid of %20 et. al nauseaum from post headers.

Replies are listed 'Best First'.
RE: Common Regex's
by merlyn (Sage) on Aug 07, 2000 at 19:27 UTC
    The $EMAIL_REGEX is very wrong. The right one (for some version of "right") is over 6000 characters. Please don't use the one given here for ANYTHING.

    -- Randal L. Schwartz, Perl hacker

RE: Common Regex's
by KM (Priest) on Aug 07, 2000 at 19:53 UTC
    You can find the script which generates the pattern in Mastering Regular Expressions chapter 7 or here. Just add a print $mailbox at the end and redirect to a new file. If there is anything out there newer than this, let us know.

    Cheers,
    KM

      Or you can just use this. This came from the same book as KM mentioned above.

      [\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\ +xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\ xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*(?:(?:[^(\040) +<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^( \040)<>@,;:".\\\[\]\000-\037\x80-\xff])|"[^\\\x80-\xff\n\015"]*(?:\\[^ +\x80-\xff][^\\\x80-\xff\n\015"]*)*")[\04 0\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\ +n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\ n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*(?:\.[\040\t]*(?:\ +([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\ xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]* +)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t] *)*(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\ +[\]\000-\037\x80-\xff])|"[^\\\x80-\xff\n \015"]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015"]*)*")[\040\t]*(?:\([^\\\x +80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\( [^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^ +\\\x80-\xff\n\015()]*)*\)[\040\t]*)*)*@[ \040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\x +ff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\x ff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*(?:[^(\040)<>@, +;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040 )<>@,;:".\\\[\]\000-\037\x80-\xff])|\[(?:[^\\\x80-\xff\n\015\[\]]|\\[^ +\x80-\xff])*\])[\040\t]*(?:\([^\\\x80-\x ff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80- +\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x8 0-\xff\n\015()]*)*\)[\040\t]*)*(?:\.[\040\t]*(?:\([^\\\x80-\xff\n\015( +)]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\ n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\ +015()]*)*\)[\040\t]*)*(?:[^(\040)<>@,;:" .\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff +])|\[(?:[^\\\x80-\xff\n\015\[\]]|\\[^\x8 0-\xff])*\])[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]| +\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xf f][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*)* +|(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80- \xff]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff])|"[^\\\x80-\xff\n\0 +15"]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\01 5"]*)*")[^()<>@,;:".\\\[\]\x80-\xff\000-\010\012-\037]*(?:(?:\([^\\\x8 +0-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([ ^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\ +\\x80-\xff\n\015()]*)*\)|"[^\\\x80-\xff\ n\015"]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015"]*)*")[^()<>@,;:".\\\[\]\ +x80-\xff\000-\010\012-\037]*)*<[\040\t]* (?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015 +()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015 ()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*(?:@[\040\t]*(?:\([^\\\ +x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\ ([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[ +^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*(?: [^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\00 +0-\037\x80-\xff])|\[(?:[^\\\x80-\xff\n\0 15\[\]]|\\[^\x80-\xff])*\])[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?: +\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]* (?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)* +\)[\040\t]*)*(?:\.[\040\t]*(?:\([^\\\x80 -\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x +80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\ \x80-\xff\n\015()]*)*\)[\040\t]*)*(?:[^(\040)<>@,;:".\\\[\]\000-\037\x +80-\xff]+(?![^(\040)<>@,;:".\\\[\]\000-\ 037\x80-\xff])|\[(?:[^\\\x80-\xff\n\015\[\]]|\\[^\x80-\xff])*\])[\040\ +t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[ ^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\ +015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[ \040\t]*)*)*(?:,[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\x +ff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80 -\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]* +)*@[\040\t]*(?:\([^\\\x80-\xff\n\015()]* (?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x8 +0-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015 ()]*)*\)[\040\t]*)*(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^( +\040)<>@,;:".\\\[\]\000-\037\x80-\xff])| \[(?:[^\\\x80-\xff\n\015\[\]]|\\[^\x80-\xff])*\])[\040\t]*(?:\([^\\\x8 +0-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([ ^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\ +\\x80-\xff\n\015()]*)*\)[\040\t]*)*(?:\. [\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\ +xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\ xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*(?:[^(\040)<>@ +,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\04 0)<>@,;:".\\\[\]\000-\037\x80-\xff])|\[(?:[^\\\x80-\xff\n\015\[\]]|\\[ +^\x80-\xff])*\])[\040\t]*(?:\([^\\\x80-\ xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80 +-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x 80-\xff\n\015()]*)*\)[\040\t]*)*)*)*:[\040\t]*(?:\([^\\\x80-\xff\n\015 +()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff \n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n +\015()]*)*\)[\040\t]*)*)?(?:[^(\040)<>@, ;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\ +xff])|"[^\\\x80-\xff\n\015"]*(?:\\[^\x80 -\xff][^\\\x80-\xff\n\015"]*)*")[\040\t]*(?:\([^\\\x80-\xff\n\015()]*( +?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\01 5()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015( +)]*)*\)[\040\t]*)*(?:\.[\040\t]*(?:\([^\ \\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\ +\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\) )[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*(?:[^(\040)<>@,;:".\\\[\]\000-\ +037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\ 000-\037\x80-\xff])|"[^\\\x80-\xff\n\015"]*(?:\\[^\x80-\xff][^\\\x80-\ +xff\n\015"]*)*")[\040\t]*(?:\([^\\\x80-\ xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80 +-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x 80-\xff\n\015()]*)*\)[\040\t]*)*)*@[\040\t]*(?:\([^\\\x80-\xff\n\015() +]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n \015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\0 +15()]*)*\)[\040\t]*)*(?:[^(\040)<>@,;:". \\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff] +)|\[(?:[^\\\x80-\xff\n\015\[\]]|\\[^\x80 -\xff])*\])[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\ +([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff ][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*(?: +\.[\040\t]*(?:\([^\\\x80-\xff\n\015()]*( ?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80 +-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015( )]*)*\)[\040\t]*)*(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\ +040)<>@,;:".\\\[\]\000-\037\x80-\xff])|\ [(?:[^\\\x80-\xff\n\015\[\]]|\\[^\x80-\xff])*\])[\040\t]*(?:\([^\\\x80 +-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^ \\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\ +\x80-\xff\n\015()]*)*\)[\040\t]*)*)*>)

      LeGo <hr