Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Please clarify what this regular expression is doing:
/^[\w.-]+@myinfo.com/
It is looking for a word at the start, then a '.', but I'm not sure what the minus '-' is doing. And does the '+' mean zero or more matches? Can someone clarify all this?

Re: Req expression translation
by dbwiz (Curate) on Dec 12, 2003 at 13:01 UTC

    It is supposed to catch an email address at "myinfo.com". However, it will also catch horrible things like "....@myinfozcom", because the unescaped dot matches any character except a newline. There is also no '$' anchor, so anything may follow "com".
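
    A quick way to check this is to run the pattern against a few strings. This is only a minimal sketch: the sample addresses are made up for illustration, and the '@' is escaped so that Perl does not try to interpolate an array.

    ```perl
    use strict;
    use warnings;

    # The pattern from the question, with only the '@' escaped.
    my $re = qr/^[\w.-]+\@myinfo.com/;

    # Hypothetical test strings, not taken from any real data.
    my @samples = (
        'jane.doe@myinfo.com',     # the intended kind of match
        '....@myinfozcom',         # matches: the unescaped '.' accepts 'z'
        'bob@myinfo.community',    # matches: nothing anchors the end
        'bob@example.com',         # does not match
    );

    # Record 1 for a match and 0 otherwise, then print a small report.
    my %matched = map { $_ => ( $_ =~ $re ? 1 : 0 ) } @samples;
    printf "%-24s %s\n", $_, ( $matched{$_} ? 'match' : 'no match' )
        for @samples;
    ```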

    perl -MYAPE::Regex::Explain -e 'print YAPE::Regex::Explain->new(qr/^[\w.-]+\@myinfo.com/)->explain'

    The regular expression:

    (?-imsx:^[\w.-]+\@myinfo.com)

    matches as follows:

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?-imsx:                 group, but do not capture (case-sensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n) (matching whitespace and #
                             normally):
    ----------------------------------------------------------------------
      ^                        the beginning of the string
    ----------------------------------------------------------------------
      [\w.-]+                  any character of: word characters (a-z,
                               A-Z, 0-9, _), '.', '-' (1 or more times
                               (matching the most amount possible))
    ----------------------------------------------------------------------
      \@                       '@'
    ----------------------------------------------------------------------
      myinfo                   'myinfo'
    ----------------------------------------------------------------------
      .                        any character except \n
    ----------------------------------------------------------------------
      com                      'com'
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------

    If you need to validate an email address, you are better off with Email::Valid.

Re: Req expression translation
by Taulmarill (Deacon) on Dec 12, 2003 at 13:02 UTC
    This regex is searching for word characters, dots, and hyphens in any order, one or more times, at the beginning of the string (look at the []), followed by "@myinfo.com". I would write that part as "\@myinfo\.com", or keep the domain in a variable and wrap it in \Q...\E, to avoid strange behavior from the '@' and the '.'.
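
    As a rough sketch of the \Q...\E approach: the variable name $domain and the sample addresses are assumptions made for this example. Keeping the domain in a single-quoted string sidesteps array interpolation, and \Q...\E applies quotemeta to the interpolated value.

    ```perl
    use strict;
    use warnings;

    # Hypothetical variable holding the literal domain part.
    # Single quotes keep '@myinfo' from being treated as an array.
    my $domain = '@myinfo.com';

    # \Q...\E quotes the interpolated value, so both '@' and '.' are
    # matched literally. The trailing '$' anchors the end of the string.
    my $re = qr/^[\w.-]+\Q$domain\E$/;

    my $ok  = 'jane@myinfo.com' =~ $re ? 1 : 0;
    my $bad = 'jane@myinfozcom' =~ $re ? 1 : 0;

    print "ok=$ok bad=$bad\n";    # prints: ok=1 bad=0
    ```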
Re: Req expression translation
by vacant (Pilgrim) on Dec 12, 2003 at 13:05 UTC
    It looks like there are some errors in that re. The caret "^" anchors the match to the beginning of the string. The characters inside the square brackets are a character class, which in this case matches any word character "\w", a dot, or a hyphen. A hyphen is literal when it is the first or last character in the class, so its position here is fine. The dot does not need to be escaped inside a character class, where it only ever matches a literal dot. The "+" following the character class means "match this class one or more times". The remaining characters would match "@myinfo.com", but the dot must be escaped if it is to match only a literal dot.

    update - The "@" must also be escaped, lest perl try to interpolate "@myinfo" as an array.
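
    The behavior of the dot and the hyphen inside and outside a character class can be checked with a short script. This is only a sketch, and the sample strings are invented for the purpose.

    ```perl
    use strict;
    use warnings;

    # Inside [...], '.' is a literal dot, and a '-' placed first or last
    # is a literal hyphen rather than a range operator.
    my $class = qr/^[\w.-]+$/;

    my %result = map { $_ => ( $_ =~ $class ? 1 : 0 ) } ( 'a.b-c', 'a b', 'a#b' );

    # Outside a class, an unescaped '.' matches any character except newline.
    my $loose = 'myinfoXcom' =~ /^myinfo.com$/  ? 1 : 0;
    my $exact = 'myinfoXcom' =~ /^myinfo\.com$/ ? 1 : 0;

    print "$_ => $result{$_}\n" for sort keys %result;
    print "loose=$loose exact=$exact\n";
    ```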

      Thanks!!
Re: Req expression translation
by davido (Cardinal) on Dec 12, 2003 at 16:54 UTC
    It's essentially a broken attempt to match an email address ending in "@myinfo.com".

    Let's start with the "@myinfo.com/" part. The dot is going to match any character, so it could, in fact, match "@myinfo9com". At the very least, the dot and the @ should be escaped, like this: "\@myinfo\.com/"

    The other problem is the [\w.-]+ character class, which will let in any amount of dots or hyphens or word characters, in any order. Imagine an email address like this: "---...@myinfo.com". This RE will allow that.

    The RE would be more effective like this:

    /^(?:[\w\d]+[.-]?)+\@myinfo\.com$/

    While that might still reject a potentially valid address, or accept a potentially invalid address, at least it's limited to one server, and will be acceptable 99% of the time.
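
    To see what the revised pattern accepts and rejects, here is a small sketch that runs it over a few made-up addresses (the samples are assumptions for illustration only).

    ```perl
    use strict;
    use warnings;

    # The revised pattern, unchanged.
    my $re = qr/^(?:[\w\d]+[.-]?)+\@myinfo\.com$/;

    # Hypothetical sample addresses.
    my @samples = (
        'jane.doe@myinfo.com',     # accepted
        'j-d@myinfo.com',          # accepted
        '---...@myinfo.com',       # rejected: must start with a word character
        'jane..doe@myinfo.com',    # rejected: two separators in a row
        'jane.@myinfo.com',        # accepted: a trailing separator is allowed
        'jane@myinfo9com',         # rejected: the dot is now literal
    );

    my %accepted = map { $_ => ( $_ =~ $re ? 1 : 0 ) } @samples;
    printf "%-24s %s\n", $_, ( $accepted{$_} ? 'accepted' : 'rejected' )
        for @samples;
    ```

    Two things worth noticing: \d is redundant next to \w, since \w already includes digits, and the nested quantifiers may backtrack heavily on long inputs that fail to match.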


    Dave

      An even better way could be:
      use Email::Valid;

      unless ( $thing_to_check =~ /\@myinfo\.com/
          && Email::Valid->address($thing_to_check) )
      {
          print "'$thing_to_check' is not valid!\n";
      }

      ------
      We are the carpenters and bricklayers of the Information Age.

      Please remember that I'm crufty and crochety. All opinions are purely mine and all code is untested, unless otherwise specified.

Re: Req expression translation
by dragonchild (Archbishop) on Dec 12, 2003 at 13:37 UTC
    [] indicate a character class. For example, /[24680]/ matches any string containing an even digit, and /^[24680]$/ matches a single even digit but not an odd one. '+' is one or more matches. '*' is zero or more. Read perlre for more info.
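
    The difference between '+' and '*' shows up most clearly on the empty string. A minimal sketch, with input strings chosen only for illustration:

    ```perl
    use strict;
    use warnings;

    my @inputs = ( '', '2', '248', '135' );
    my ( %plus, %star );

    for my $s (@inputs) {
        $plus{$s} = $s =~ /^[24680]+$/ ? 1 : 0;    # one or more even digits
        $star{$s} = $s =~ /^[24680]*$/ ? 1 : 0;    # zero or more even digits
        printf "%-6s plus=%d star=%d\n", "<$s>", $plus{$s}, $star{$s};
    }

    # Without anchors, the class only needs to match somewhere in the string,
    # so an odd number that contains an even digit still matches.
    my $contains = '21' =~ /[24680]/ ? 1 : 0;
    print "contains=$contains\n";
    ```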


Re: Req expression translation
by TomDLux (Vicar) on Dec 12, 2003 at 22:28 UTC

    The Regular Expression book has a regex to validate email addresses. The only reason it fits on a single page is that they used a small font in printing it.

    The question is not whether you will need to compromise, but how much you will compromise.

    --
    TTTATCGGTCGTTATATAGATGTTTGCA