in reply to Extracting email addresses

There are several problems with your code, for example that you search for only one email address per line, and that your definition of an email address is quite questionable.

You can use CPAN modules like Email::Find, which do most of the work for you, and probably don't suffer from these problems.

Replies are listed 'Best First'.
Re^2: Extracting email addresses
by JavaFan (Canon) on Dec 25, 2011 at 09:34 UTC
    The OP actually splits each line on whitespace, and tests each chunk.

    Unfortunally, email addresses can contain whitespace. For instance <foo @ example.com> is a legal way of formatting an address - and so are <"bar baz"@example.com> and <quux(Yes, that is me!)@example.com>. Of course, the email addresses of the OPs data may be whitespace free.

    Anyway, the Perl tarbal actually comes with a regular expressions to match email addresses. It's found in the test suite (t/re/reg_email.t), and requires 5.10:

    my $email = qr { (?(DEFINE) (?<address> (?&mailbox) | (?&group)) (?<mailbox> (?&name_addr) | (?&addr_spec)) (?<name_addr> (?&display_name)? (?&angle_addr)) (?<angle_addr> (?&CFWS)? < (?&addr_spec) > (?&CFWS)?) (?<group> (?&display_name) : (?:(?&mailbox_list) | (?& +CFWS))? ; (?&CFWS)?) (?<display_name> (?&phrase)) (?<mailbox_list> (?&mailbox) (?: , (?&mailbox))*) (?<addr_spec> (?&local_part) \@ (?&domain)) (?<local_part> (?&dot_atom) | (?&quoted_string)) (?<domain> (?&dot_atom) | (?&domain_literal)) (?<domain_literal> (?&CFWS)? \[ (?: (?&FWS)? (?&dcontent))* (?& +FWS)? \] (?&CFWS)?) (?<dcontent> (?&dtext) | (?&quoted_pair)) (?<dtext> (?&NO_WS_CTL) | [\x21-\x5a\x5e-\x7e]) (?<atext> (?&ALPHA) | (?&DIGIT) | [!#\$%&'*+-/=?^_`{|} +~]) (?<atom> (?&CFWS)? (?&atext)+ (?&CFWS)?) (?<dot_atom> (?&CFWS)? (?&dot_atom_text) (?&CFWS)?) (?<dot_atom_text> (?&atext)+ (?: \. (?&atext)+)*) (?<text> [\x01-\x09\x0b\x0c\x0e-\x7f]) (?<quoted_pair> \\ (?&text)) (?<qtext> (?&NO_WS_CTL) | [\x21\x23-\x5b\x5d-\x7e]) (?<qcontent> (?&qtext) | (?&quoted_pair)) (?<quoted_string> (?&CFWS)? (?&DQUOTE) (?:(?&FWS)? (?&qcontent +))* (?&FWS)? (?&DQUOTE) (?&CFWS)?) (?<word> (?&atom) | (?&quoted_string)) (?<phrase> (?&word)+) # Folding white space (?<FWS> (?: (?&WSP)* (?&CRLF))? (?&WSP)+) (?<ctext> (?&NO_WS_CTL) | [\x21-\x27\x2a-\x5b\x5d-\x7e +]) (?<ccontent> (?&ctext) | (?&quoted_pair) | (?&comment)) (?<comment> \( (?: (?&FWS)? (?&ccontent))* (?&FWS)? \) ) (?<CFWS> (?: (?&FWS)? (?&comment))* (?: (?:(?&FWS)? (?&comment)) | (?&FWS))) # No whitespace control (?<NO_WS_CTL> [\x01-\x08\x0b\x0c\x0e-\x1f\x7f]) (?<ALPHA> [A-Za-z]) (?<DIGIT> [0-9]) (?<CRLF> \x0d \x0a) (?<DQUOTE> ") (?<WSP> [\x20\x09]) ) (?&address) }x;