in reply to New Module Consideration?

Use a built-in check for valid email

I quote from perlfaq9:

Without sending mail to the address and seeing whether there's a human on the other end to answer you, you cannot determine whether a mail address is valid. Even if you apply the mail header standard, you can have problems, because there are deliverable addresses that aren't RFC-822 (the mail header standard) compliant, and addresses that aren't deliverable which are compliant.

Many are tempted to try to eliminate many frequently-invalid mail addresses with a simple regex, such as /^[\w.-]+\@(?:[\w-]+\.)+\w+$/. It's a very bad idea. However, this also throws out many valid ones, and says nothing about potential deliverability, so it is not suggested. Instead, see http://www.cpan.org/authors/Tom_Christiansen/scripts/ckaddr.gz, which actually checks against the full RFC spec (except for nested comments), looks for addresses you may not wish to accept mail to (say, Bill Clinton or your postmaster), and then makes sure that the hostname given can be looked up in the DNS MX records. It's not fast, but it works for what it tries to do.

The RFC compliance test is nice, but it allows more than most people want it to. What kind of test does your built-in do?
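
To illustrate the FAQ's point, here is a minimal sketch that runs the simple regex quoted above over a few addresses. The sample addresses and the helper name faq_regex_accepts are my own assumptions for the example; they do not come from perlfaq9.

```perl
use strict;
use warnings;

# The "simple regex" quoted from perlfaq9 above.
my $simple = qr/^[\w.-]+\@(?:[\w-]+\.)+\w+$/;

# Hypothetical helper: true if the simple regex accepts the address.
sub faq_regex_accepts {
    my ($addr) = @_;
    return $addr =~ $simple ? 1 : 0;
}

# Hypothetical sample addresses, chosen for illustration only.
my @samples = (
    'fred@example.com',              # plain address
    'fred+perl@example.com',         # '+' is legal in a local part
    q{o'reilly@example.com},         # so is an apostrophe
    '"Fred Bloggs"@example.com',     # quoted local part with a space
    'nobody@no-such-host.invalid',   # well-formed, but .invalid is a reserved TLD
);

for my $addr (@samples) {
    printf "%-30s %s\n", $addr,
        faq_regex_accepts($addr) ? 'accepted' : 'rejected';
}
```

The three middle addresses are syntactically legal yet get rejected, while the last one is accepted even though mail to it can never be delivered. That is both halves of the FAQ's complaint.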

- Yes, I reinvent wheels.
- Spam: Visit eurotraQ.

Re^2: New Module Consideration?
by Aristotle (Chancellor) on Jan 01, 2003 at 02:39 UTC
      Do you, by any chance, know of a way to get Email::Valid to work on Win32? I have been unable to locate a ppm or use CPAN to install it myself. (For that matter, I can't seem to get Net::DNS to install either.)



      My code doesn't have bugs, it just develops random features.

      Flame ~ Lead Programmer: GMS | GMS

        I've just used ppm to install Net::DNS on WinXP with no problem. I downloaded the source for Email::Valid as Valid.pm into the directory C:\Perl\site\lib\Mail, then tried a simple test of valid/invalid emails, and it worked.
        I couldn't get a connection for the 'check this is a valid host' test, though. Probably I'm not set up correctly, or it's a firewall or something.
        I think you have to run netlibcfg.pl (in c:\perl\bin or wherever) to set up the connections. HTH
        poj
        Sorry, I have no idea. :-/ I haven't used a Win32 Perl in one and a half years (can't say I miss it either).

        Makeshifts last the longest.

      Also: RFC::RFC822::Address.

      Abigail

Re^2: New Module Consideration?
by diotalevi (Canon) on Jan 01, 2003 at 15:45 UTC

    Nested comments happen in practice, so if that code doesn't handle them then it's not powerful enough for real-world use. Does anyone know of some code that actually handles addresses fully?

    Update: Or maybe that's Aristotle's Email::Valid. I am so tired this morning.


    Fun Fun Fun in the Fluffy Chair

      Umm, can you clarify what you meant there? What nested comments?



      My code doesn't have bugs, it just develops random features.

      Flame ~ Lead Programmer: GMS | GMS

        Comments as a structure may be inserted anywhere within an address, though I only ever see them on the ends. Comments are delimited by matching pairs of parentheses. So (this (is) a) nested comment. Here is a sanitized version of my e-mail address test set. [Added inline: It does not cover quoted domains, internal comments or non-ASCII 8-bit characters. A full validator should probably at least allow for Unicode and ISO-8859-?. I've never seen quoted domains or internal comments, so that's likely just something that is allowed but that no one uses these days.]

        "Cardamom" cardamom@spice.com: This address is not RFC822 compliant. The address portion should either be enclosed in <> angle brackets, or the double-quote construct should be replaced with a (Cardamom) comment structure. A validator should still be able to extract the machine-readable address.

        "Ginger" <ginger@spice.com>: This is the most common format and correctly delimits the machine-readable portion from everything else in the field.

        (Lemon Peel) lpeel@spice.com: This is also correct and occurs in practice. In this case the entire string is taken to be the machine-readable portion after the comment construct is removed.

        (Orange Zest) <ozest@spice.com>: Pretty normal: a comment and a machine-readable portion.

        "Red (hot!) pepper" rhpepper@spice.com: Broken and not RFC822 compliant. Your validator should distinguish the extraneous (but not commented) text from the machine-parsable address.

        (Black (and white) pepper) <bpepper@spice.com>: A normal address.

        "Fish Oil" (foil@yahoo.com) <foil@spice.com>: Again normal. The machine-parsable portion is explicitly marked, so everything else can just be ignored. This means foil@yahoo.com is not the address and must be correctly distinguished.

        Bug Blatter (beast@trall.com) <gbeans@spice.com>: Ditto. This is an extension of the previous example.

        "Bug Blatter"@trall.com: This is tricky for some validators to handle, because in this case the entire machine-readable portion includes the double-quoted region with the space. This is a great demonstration that you can't just split on white space and look for words with @ symbols.

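As a rough sketch of the comment handling described above (not a full RFC822 parser), the following strips comments by counting parenthesis depth, leaves parentheses inside double quotes alone, and then prefers whatever sits inside <> angle brackets. The names strip_comments and machine_part are hypothetical, made up for this example. It deliberately does not try to rescue the two broken entries in the list ("Cardamom" and "Red (hot!) pepper"), it would be fooled by a < inside a quoted string, and an unbalanced ( swallows the rest of the field.

```perl
use strict;
use warnings;

# Hypothetical helper: remove (possibly nested) comments by tracking
# parenthesis depth. Parentheses inside a double-quoted string are kept.
sub strip_comments {
    my ($field) = @_;
    my ($out, $depth, $in_quote) = ('', 0, 0);
    my @chars = split //, $field;
    while (@chars) {
        my $c = shift @chars;
        if ($c eq '\\' && @chars) {
            # A backslash escapes the next character; keep the pair
            # only when we are outside a comment.
            my $next = shift @chars;
            $out .= $c . $next unless $depth;
        }
        elsif ($in_quote) {
            $in_quote = 0 if $c eq '"';
            $out .= $c;
        }
        elsif ($depth) {
            # Inside a comment: only track nesting, emit nothing.
            $depth++ if $c eq '(';
            $depth-- if $c eq ')';
        }
        elsif ($c eq '(') { $depth = 1 }
        elsif ($c eq '"') { $in_quote = 1; $out .= $c }
        else              { $out .= $c }
    }
    return $out;
}

# Hypothetical helper: return the machine-readable part of a field.
sub machine_part {
    my ($field) = @_;
    my $rest = strip_comments($field);
    # Prefer an explicit <address> if one is present.
    return $1 if $rest =~ /<([^<>]*)>/;
    # Otherwise the whole remainder is taken as the address.
    $rest =~ s/^\s+|\s+$//g;
    return $rest;
}

for my $field (
    '(Lemon Peel) lpeel@spice.com',
    '(Black (and white) pepper) <bpepper@spice.com>',
    '"Fish Oil" (foil@yahoo.com) <foil@spice.com>',
    '"Bug Blatter"@trall.com',
) {
    printf "%-48s => %s\n", $field, machine_part($field);
}
```

A depth counter is used here rather than a single regex because plain (non-recursive) regexes cannot match arbitrarily nested parentheses.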

        Fun Fun Fun in the Fluffy Chair

Re: Re: New Module Consideration?
by Flame (Deacon) on Dec 31, 2002 at 23:32 UTC
    I would be considering several options, most likely a simple compliance check, but since it has not yet been written, I can't say for sure. But since I'm asking if I should at all, I am, of course, open to suggestions as to how to do it.



    My code doesn't have bugs, it just develops random features.

    Flame ~ Lead Programmer: GMS | GMS