Fian has asked for the wisdom of the Perl Monks concerning the following question:

Hi Guys
I know that somewhere in here, as I was browsing one day, I came across
a problem that someone had with setting up a regular expression to match e-mail addresses.
Does anyone remember seeing this and do you have any idea where it was/is?
The problem I'm having is that I want to strip all the mail addresses from a file
(and there are loads of them) but they don't all follow the same format i.e.

xyz.xyz@xyz.com

I also get

xyz@xyz.com

Thanks
Fian....

Replies are listed 'Best First'.
Re: pattern match e-mail addresses
by arturo (Vicar) on Mar 22, 2001 at 20:52 UTC

    If you read those threads closely, then you saw a whole bunch of "don't do this with a regular expression" type posts. There is a module on CPAN called Email::Valid which does a pretty good job. Also, since you know the nodes are out there and know how to specify the content you want, you might try using the handy "Search" box that crops up at the top of every page or, failing that, Super Search (look underneath the chatterbox for the link).

    Let's try, at random, the following phrase: "match email address" (follow that link and you'll get what you would have got had you typed that phrase into the search box).

    HTH! (I hope I don't sound like I'm lecturing; I really am doing what I think will help you most on this question and in the future.)

    handing out fishing poles since yesterday

Re: pattern match e-mail addresses
by Masem (Monsignor) on Mar 22, 2001 at 20:51 UTC
Re: pattern match e-mail addresses
by McD (Chaplain) on Mar 22, 2001 at 22:13 UTC
    I think the question is subtly different from what we've all been answering. Fian didn't ask how to validate an email address, but rather how to match an email address.

    In other words, given a file, print everything that looks like an email address. That's a subtly different question from "is this a valid mail address", or even from "Here's a line with a mail address in it, please parse out the relevant bits."

    Embedded newlines could wreak even more havoc.

    A lot depends on the dataset, and how you define what addresses you're looking for. For example, barewords are syntactically invalid mail addresses, but if I open an xterm and type "mail postmaster" it will probably get delivered.
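
    To make the "match, don't validate" distinction concrete, here is a minimal sketch that uses only core Perl. The pattern is deliberately loose and is my own assumption about what "looks like an email address" means (a local part, an @, and a dotted domain); it is not the RFC 822 grammar, and it will miss barewords such as "postmaster". The name extract_addresses is hypothetical, made up for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Deliberately loose pattern: local part, "@", dotted domain name.
# This is NOT an RFC 822 validator; it only finds address-like tokens.
my $ADDRESS_LIKE = qr/[\w.+-]+\@(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}/;

# Return every address-like token found in a string, in order of first
# appearance, with case-insensitive duplicates removed.
sub extract_addresses {
    my ($text) = @_;
    my %seen;
    return grep { !$seen{lc $_}++ } $text =~ /($ADDRESS_LIKE)/g;
}

# Sample input mirroring the two formats from the original question.
my $sample = "Contact xyz.xyz\@xyz.com or xyz\@xyz.com, not 'postmaster'.";
print "$_\n" for extract_addresses($sample);
```

    To run this over a file you could call extract_addresses on each line read from <>, but that would miss an address split across a newline, and duplicates would only be removed within each line.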

    This is a stickier problem than it appears. You might try using the 822-valid beast from Mastering Regular Expressions, which you can find here.

    And please drop a note to tell us you're on the side of good, and aren't trying to scrape email addresses off a website or something.

    Peace,
    -McD

      There is a CPAN module that will match email addresses inside a string; Email::Find. Here's an example using this module:

      #!/usr/bin/perl -w
      use strict;
      use Email::Find qw(find_emails);
      use vars qw(@MATCHES);

      my $text = 'user@somewhere.com not an email address me@home.com';
      find_emails($text, \&callback);
      print join "\n", @MATCHES;

      sub callback {
          my $email    = shift;
          my $original = shift;
          push @MATCHES, $email->format;
          return $original;
      }

      Warning: the find_emails() routine will modify the original text. Make sure that you always return the original email text (as shown above), so that your input text does not mysteriously change.

      :-)
      It never even crossed my mind that you might think I was up to no good...
      This is part of a project I'm doing at work to take a load of registrations from a web form and parse the data into a comma-delimited file for someone else to then read into an Access database...
        First of all, thank you for all the help.

        I now have a lot of stuff to keep me busy for quite a while...
        This is THE best site I have found for all things Perly,

        Keep 'er Lit....

        Fian (Belfast, N.Ire.)

Re (PotPieMan): pattern match e-mail addresses
by PotPieMan (Hermit) on Mar 22, 2001 at 20:54 UTC