PerlMonks  

extract email addresses

by johnajb (Novice)
on Feb 18, 2005 at 23:57 UTC ( [id://432574]=perlquestion )

johnajb has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I am trying to extract all email addresses from a text file (a ripped-out mail header). What is the best module or way to do that? An example would help! Thanks.

Re: extract email addresses
by esskar (Deacon) on Feb 19, 2005 at 00:13 UTC
    I like Email::Find
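    use strict;
    use warnings;
    use Email::Find;

    my $text = do { local $/; <> };   # slurp the header from a file or STDIN

    # new object oriented interface
    my $finder    = Email::Find->new(\&callback);
    my $num_found = $finder->find(\$text);

    # good old functional style (alternative to the two lines above)
    # $num_found = find_emails($text, \&callback);

    sub callback {
        my ($addrobj, $addrstr) = @_;
        print "$addrstr\n";
        return $addrstr;   # the return value replaces the match in $text
    }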
    Check the Email::Find pod for more detailed help!
Re: extract email addresses
by Roy Johnson (Monsignor) on Feb 19, 2005 at 00:11 UTC
    The Email::Address module should extract them.
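    A minimal sketch of how Email::Address could be used for this, assuming the module is installed from CPAN (it is not core Perl). The header text below is a made-up sample:

```perl
use strict;
use warnings;
use Email::Address;   # CPAN module, assumed to be installed

# Hypothetical sample header; in practice, read this from the file.
my $header = <<'END';
From: John Doe <john.doe@example.com>
To: alice@example.org, "Bob Smith" <bob@example.net>
END

# parse() returns a list of Email::Address objects;
# address() gives just the user@host part of each one.
my @addresses = map { $_->address } Email::Address->parse($header);
print "$_\n" for @addresses;
```

    Each object also carries the display name, available through its phrase() method, if you need the "John Doe" part as well.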

    Caution: Contents may have been coded under pressure.
Re: extract email addresses
by sh1tn (Priest) on Feb 19, 2005 at 00:23 UTC
    use strict;
    use warnings;

    my $type = shift || die "usage: $0 filetype [directory]\n";
    my $dir  = shift || '.';

    # roughly: word chars/hyphens, '@', then a dotted domain
    my $mail = qr{\W*(\.*(?:\w+|-)+\.*\@\.*(?:\w+|-)+(?:\.\w+)+)\W*};

    my @files  = glob("$dir/*.$type");
    my %mails;
    my $s_time = time;
    for my $file (@files) {
        open my $fh, '<', $file or die "can't open $file: $!\n";
        while (my $line = <$fh>) {
            # while + //g picks up every address on the line, not just the first
            $mails{$1} = 1 while $line =~ /$mail/g;
        }
        close $fh;
    }
    my $e_time = time;

    if (%mails) {
        print "Total time: ", $e_time - $s_time, "\n";
        print "Total email addresses: ", scalar keys %mails, "\n\n";
        print "$_\n" for sort keys %mails;
    }
    else {
        print "No email address found\n";
    }


      All email addresses will be formatted like so:
      <smtp:"emailaddress">
      The address could be something.something.something@something.something.something.com,
      but it will always be inside the angle brackets with smtp: in it.
        my $text = '<smtp:something.something.something@something.something.something.com>';
        # stop at quotes/brackets, and skip the optional quotes around the address
        my @addresses = $text =~ m!<smtp:"?([^"<>\s]+\@[^"<>\s]+)"?>!g;
        print "$_\n" foreach @addresses;
        But I still prefer my Email::Find solution; that way you will be safe! (Edit: misspelling fixed.)
        It does not matter; this regular expression handles both forms:
        my $mail_reg = qr{\W*(\.*(?:\w+|-)+(?:\.\w+|-)*\@\.*(?:\w+|-)+(?:\.\w+)+)\W*};
        my $mail_1 = '<smtp:"email.address@something.something.something.com">';
        my $mail_2 = '<smtp:emailaddress@something.something.something.com>';
        $mail_1 =~ m#$mail_reg# and print $1, $/;
        $mail_2 =~ m#$mail_reg# and print $1, $/;
        # which outputs:
        # email.address@something.something.something.com
        # emailaddress@something.something.something.com


Re: extract email addresses
by TedYoung (Deacon) on Feb 19, 2005 at 00:10 UTC

    Well, you could scan over the file with a regex:

    use strict;
    use warnings;

    open my $fh, '<', 'file' or die "Can't open file: $!";
    my $text = do { local $/; <$fh> };   # undef $/ (not $\) enables slurp mode
    close $fh;

    print "$1\n" while $text =~ /([\w.-]+\@[\w.-]+)/g;

    But there are many ways to do this.
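    One more variant, sketched with core Perl only: the same regex, but collecting every match in list context and dropping repeats, since an address often shows up in several header fields. The sample header is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical sample header; in practice, slurp this from the file.
my $header = <<'END';
From: alice@example.com
To: bob@example.org, alice@example.com
Cc: carol.smith@mail.example.net
END

# //g in list context returns every capture; %seen keeps only the first
# occurrence of each address, in the order they appear.
my %seen;
my @addresses = grep { !$seen{$_}++ } $header =~ /([\w.-]+\@[\w.-]+)/g;

print "$_\n" for @addresses;
```

    The pattern is still loose: a full stop directly after an address, for example, would be captured as part of it.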

    Update: Fixed my most glaring error, a 'use strings;' that should have been 'use strict;'. -- Ted, sitting here with egg on his face! :-)

    Ted Young

    ($$<<$$=>$$<=>$$<=$$>>$$) always returns 1. :-)
      Which strings module should I use?

Node Type: perlquestion [id://432574]
Approved by Tanktalus