PerlMonks  

pattern matching and sendmail issues

by csorensen (Beadle)
on Jun 29, 2000 at 01:31 UTC (perlquestion, id://20265)

csorensen has asked for the wisdom of the Perl Monks concerning the following question:

Here's the problem (with all code, minus the comments). I have an HTML document with a few thousand email addresses in it. I need to extract these addresses from the document and send an email to each address (not send one email and cc everyone).

Two issues:

1) I need a better pattern for email addresses, but I'm very weak on regular expression syntax. Is there a good place to learn more about regex?

2) The second script would run MUCH faster if I could put the open and close commands to sendmail outside the loop and just send an email to each address. I don't know how to send an EOF to sendmail, though. Whenever I move the open and close outside the loop, sendmail puts the message for every address into ONE message and sends that one message to the first address. Very discouraging.

Any ideas? Please.

script 1 - get the addresses
open ADDLIST, "addlist" or die "can't open file: $!";
@names = <ADDLIST>;
open NEWLIST, ">>emailist" or die "can't open file: $!";
foreach (@names) {
    if ( $_ =~ /([^\s\@]{1,}\@[^\s\@]{1,})/) {
        print NEWLIST $_;
    }
}
script 2 - send the mail
$sendmail = "/usr/lib/sendmail -t";
open ADDRESS, "address.txt" or die "can't open file: $!";
@mail_to = <ADDRESS>;
open BODY, "message.txt" or die "can't open file: $!";
$content = <BODY>;
foreach (@mail_to) {
    open(SENDMAIL, "|$sendmail") or die "Cannot open $sendmail: $!";
    print SENDMAIL "To: $_ \n";
    print SENDMAIL "From: csorensen\@uptimeresources.net \n";
    print SENDMAIL "Subject: South African tourism survey \n";
    print SENDMAIL "Content-type: text/plain \n\n";
    print SENDMAIL $content;
    close(SENDMAIL);
}

Replies are listed 'Best First'.
Re: pattern matching and sendmail issues
by lhoward (Vicar) on Jun 29, 2000 at 01:53 UTC
    A simple regular-expression extractor for Internet e-mail addresses (not fully RFC compliant, but it will handle 99% of the addresses you see out there) is:
    while ($data =~ /([\w.-]+\@(?:[\w-]+\.)+\w+)/gcs) {
        # e-mail address in $1
    }

    If I were you, I'd consider using the Mail::Bulkmail module to do the sending. It is designed for doing mass mailings like the one you describe.
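As a rough sketch of the regex extractor described above, the loop can be wrapped in a small function that also drops duplicates. The helper name `extract_addresses` and the sample HTML are made up for illustration, and the domain part is written as `(?:[\w-]+\.)+\w+` on the assumption that each domain label is one or more word characters or hyphens. Like the pattern it is based on, this is not RFC compliant.

```perl
use strict;
use warnings;

# Hypothetical helper: return the unique addresses found in a string,
# in order of first appearance (duplicates compared case-insensitively).
sub extract_addresses {
    my ($data) = @_;
    my (%seen, @found);
    while ($data =~ /([\w.-]+\@(?:[\w-]+\.)+\w+)/g) {
        push @found, $1 unless $seen{lc $1}++;
    }
    return @found;
}

# Made-up sample input standing in for the real HTML document.
my $html = <<'HTML';
<td><a href="mailto:Jane.Doe@example.com">Jane.Doe@example.com</a></td>
<td>contact: bob@mail.example.org, phone 555-0100</td>
HTML

print "$_\n" for extract_addresses($html);
```

Running the pattern over the whole document at once, instead of line by line, also copes with lines that hold more than one address.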

Re: pattern matching and sendmail issues
by btrott (Parson) on Jun 29, 2000 at 01:49 UTC
    Matching email addresses is difficult. But you're not actually trying to validate them, so you can probably afford to just "do your best", as it were :). This is the regexp used in Pod::Html for matching email addresses; it's not going to catch everything, and it's probably going to wrongly match some strings that aren't addresses. But it may help.
    if ($word =~ /[\w.-]+\@\w+\.\w/) {
        # looks like an e-mail address
    }
    This is used on an individual "word", where a word is obtained by splitting a string on /\s+/. So that's one example. If you look around a bit more, you can probably find others.

    For part 2 (sending the email)--if you're sending the same content to each of the addresses, then you could perhaps use Bcc to write all of the addresses to the message.

    for my $addr (@mail_to) {
        print SENDMAIL "Bcc: $addr\n";
    }
    print SENDMAIL "From: csorensen\@uptimeresources.net \n";
    print SENDMAIL "Subject: South African tourism survey \n";
    print SENDMAIL "Content-type: text/plain \n\n";
    print SENDMAIL $content;
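A minimal sketch of the Bcc approach above, sent in batches. Each `sendmail -t` process reads a single message from its standard input until end of file, and closing the pipe is what delivers that end of file. That is why printing several sets of headers to one open pipe produces one long message. The helper name `write_bcc_message`, the batch size, and the sample addresses are assumptions for illustration, and the message is written to an in-memory buffer instead of a real sendmail pipe so the sketch runs anywhere.

```perl
use strict;
use warnings;

# Hypothetical helper: write one complete message to an already-open
# filehandle, with every recipient on its own Bcc: line.
sub write_bcc_message {
    my ($fh, $from, $subject, $body, @recipients) = @_;
    print {$fh} "From: $from\n";
    print {$fh} "Bcc: $_\n" for @recipients;
    print {$fh} "Subject: $subject\n";
    print {$fh} "Content-type: text/plain\n\n";
    print {$fh} $body;
}

# Addresses read from a file still carry their newline; strip it.
my @mail_to = ("alice\@example.com\n", "bob\@example.org\n", "carol\@example.net\n");
chomp @mail_to;

my $batch_size = 2;    # assumption: tune to what your mail server accepts
while (my @batch = splice(@mail_to, 0, $batch_size)) {
    # With real sendmail this open would be something like:
    #   open(my $fh, '|-', '/usr/lib/sendmail -t') or die "Cannot run sendmail: $!";
    my $buffer = '';
    open(my $fh, '>', \$buffer) or die "Cannot open buffer: $!";
    write_bcc_message($fh, 'csorensen@uptimeresources.net',
        'South African tourism survey', "Survey text goes here.\n", @batch);
    close($fh) or die "close failed: $!";    # on a pipe, close signals EOF
    print $buffer, "----\n";
}
```

One open and one close per batch keeps the number of sendmail processes small while still ending every message properly. Note also that `$content = <BODY>` in scalar context reads only the first line of message.txt; `my $content = do { local $/; <BODY> };` reads the whole file.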
Re: pattern matching and sendmail issues
by t0mas (Priest) on Jun 29, 2000 at 10:02 UTC
    THE Internet email address matching regexp is found here. It was written by Jeffrey E. F. Friedl, who also wrote the book Mastering Regular Expressions, which is a good place to learn more about regex.

    /brother t0mas
      I picked up Mastering Regular Expressions on my way to work today. Thanks for the link to the regex!
Re: pattern matching and sendmail issues
by chromatic (Archbishop) on Jun 29, 2000 at 04:51 UTC
    It depends on the structure of the HTML file, but how about using a module like HTML::Parse or HTML::TokeParser to chop up the data file and return the addresses to you? You're more likely to go mad trying to write a regex to handle all of the possibilities.
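A small sketch of the parser-based approach suggested above, assuming the addresses appear as `mailto:` links in the HTML (the original file's structure is not shown, so that is a guess). HTML::TokeParser is a CPAN module, not part of core Perl, and the helper name `mailto_addresses` is hypothetical.

```perl
use strict;
use warnings;
use HTML::TokeParser;    # CPAN module (HTML-Parser distribution), not core Perl

# Hypothetical helper: collect addresses from <a href="mailto:..."> links,
# dropping any "?subject=..." suffix and case-insensitive duplicates.
sub mailto_addresses {
    my ($html) = @_;
    my $parser = HTML::TokeParser->new(\$html)
        or die "Cannot parse HTML";
    my (%seen, @found);
    while (my $tag = $parser->get_tag('a')) {
        my $href = $tag->[1]{href};    # attribute hash is the second element
        next unless defined $href && $href =~ /^mailto:([^?]+)/i;
        push @found, $1 unless $seen{lc $1}++;
    }
    return @found;
}

# Made-up sample input standing in for the real HTML document.
my $html = '<p><a href="mailto:jane@example.com">Jane</a> <a href="/about">About</a></p>';
print "$_\n" for mailto_addresses($html);
```

If the addresses appear as plain text instead of links, the parser can still be used to pull out the text portions, with a regex applied only to those.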
Re: pattern matching and sendmail issues
by Anonymous Monk on Jun 29, 2000 at 01:38 UTC
    With all due respect, are you asking for advice on spamming? It kind of looks that way...
      No. The South African tourism board has sent me a list of companies that participated in a trade show earlier this year. They want me to send a survey to all the participants to see what they thought of the show. The problem is that they sent me WAY too much information in this file. I just want to extract the email addresses from it.
