mkahn has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks, here's yet another simple task that is confounding me. I sent out an emailer to a list of clients. It had been a while since the owner had used the list, so not surprisingly, there were a lot of bounces. I congealed the bounced addresses from the address book into a file, bad_emails, and wrote a little ditty to turn the paragraph-style collage into a nice neat column. I will also have to remove duplicates, once I get the (expletive) thing to match more than one email address per line.

The next task is to compare the list to the original list and remove the matches: raw list - bad list = good list.

But before I can do that, I need to fix my regex, as it seems to only match the first one in each line. I obviously don't have the grasp of /g I thought I did.

push (@list, "$1\n") and $count++ if /\b([\w]+@[\w]+\.[\w]{3})\b/g;
Why am I only getting 1 match per line? Here's a sample line from the data file:

wings@yahoo.com <wings@yahoo.com>; wine@hotmail.com <wine@hotmail.com>;
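
For the later "raw list - bad list = good list" step, one common approach is a hash lookup. The following is only a minimal sketch: the two lists are hypothetical in-memory stand-ins for the real files (ale@example.com is made up), and it compares addresses as exact strings.

```perl
use strict;
use warnings;

# Hypothetical data; in practice both lists would be read from files.
my @raw = qw(wings@yahoo.com wine@hotmail.com ale@example.com wine@hotmail.com);
my @bad = qw(wine@hotmail.com);

# Mark every bounced address so it can be looked up in constant time.
my %is_bad;
$is_bad{$_} = 1 for @bad;

# Keep an address only if it did not bounce and has not been seen yet,
# which removes the duplicates in the same pass.
my %seen;
my @good = grep { !$is_bad{$_} && !$seen{$_}++ } @raw;

print "$_\n" for @good;
```

The order of the raw list is preserved, so the good list can be written straight back out.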

Replies are listed 'Best First'.
Re: scripty email address regex stuff
by chromatic (Archbishop) on Oct 15, 2003 at 20:59 UTC

    I think you mean while and not if, since there's a loop in the former and not in the latter. You might have more luck with the Mail::Address module, though.
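
A minimal sketch of the difference, using the sample line from the question with the original regex left unchanged. With `if`, the match is tested once per line; with `while`, a scalar-context /g match resumes where the previous one ended. The variable names are only for illustration.

```perl
use strict;
use warnings;

my $line = 'wings@yahoo.com <wings@yahoo.com>; wine@hotmail.com <wine@hotmail.com>;';
my @list;
my $count = 0;

# Each pass through the loop picks up after the previous match,
# so every address on the line is visited, not just the first.
while ($line =~ /\b([\w]+@[\w]+\.[\w]{3})\b/g) {
    push @list, $1;
    $count++;
}

print "$_\n" for @list;
print "total: $count\n";   # 4 for this line, duplicates included
```

Each address appears twice in the sample line (once bare, once in angle brackets), so duplicates still have to be removed afterwards.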

      Thanks, chromatic. You were right on both counts. Mail::Address is pretty easy to use, though the output is not coming out the way I'd hoped. Am I doing this right?
      #!/usr/local/bin/perl
      use Mail::Address;

      my $file = shift;
      # $! holds the system error from a failed open ($@ is for eval errors)
      open (FILE, $file) or die "Couldn't open file: $!";
      my @contents = <FILE>;
      my $count = 0;
      foreach my $line (@contents) {
          my @addrs = Mail::Address->parse($line);
          foreach my $addr (@addrs) {
              print $count++, $addr->address, "\n";
          }
      }

      I'm looking for just the address (and the count), but names, commas, and semicolons are coming out as individual results.

Re: scripty email address regex stuff
by Abigail-II (Bishop) on Oct 15, 2003 at 21:30 UTC
    I typically create dedicated addresses when I sign up to something of the form $name$@abigail.nl. Perfectly legal address and it seldom leads to problems.

    But your regex doesn't match it.
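
    A quick check of that claim, as a sketch only. The local part "name" is a hypothetical stand-in for $name. Two things in the pattern get in the way: `$` is not a \w character, and `{3}` demands a three-letter top-level domain, which .nl is not.

```perl
use strict;
use warnings;

# The regex from the question, unchanged.
my $re = qr/\b([\w]+@[\w]+\.[\w]{3})\b/;

my %matched;
for my $addr ('wings@yahoo.com', 'name$@abigail.nl', 'name@abigail.nl') {
    $matched{$addr} = ($addr =~ $re) ? 1 : 0;
    printf "%-20s %s\n", $addr, $matched{$addr} ? 'match' : 'no match';
}
```

    Only the first address matches; the third fails on the two-letter domain alone, even without the `$`.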

    Abigail

Re: scripty email address regex stuff
by Beechbone (Friar) on Oct 16, 2003 at 11:38 UTC
    Do you want to parse the addresses or just extract them from the file? In the second case, why not just:
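    local $/ = undef;          # slurp mode: read the whole file at once
    my $raw = <FILE>;
    $raw =~ s/[\r\n]//g;       # join everything into one line
    $raw =~ s/;\s+/;/g;        # drop whitespace after each separator, keep the ';'
    my @list = split /;/, $raw;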
    and maybe:
    foreach (@list) {
        s/^[^<>]+<([^<>]+)>$/$1/;
        # or: s/^[^<>]+<([^<>\s]+@[^<>\s]+)>$/$1/;
    }
    but the second regex also throws away some legal addresses like "user!host1!host2!host3"
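
    A self-contained variation on the same idea, run against the sample line from the question rather than a file. It is a sketch, not the code above verbatim: it splits on `;` plus any trailing whitespace in one step, then keeps only what sits inside the angle brackets.

```perl
use strict;
use warnings;

my $raw = 'wings@yahoo.com <wings@yahoo.com>; wine@hotmail.com <wine@hotmail.com>;';

# Split on each ';' together with any whitespace that follows it.
# split drops the empty trailing field left by the final ';'.
my @list = split /;\s*/, $raw;

# Reduce "name <address>" entries to just the address.
s/^[^<>]+<([^<>]+)>$/$1/ for @list;

print "$_\n" for @list;
```

    Entries that do not have the "name <address>" shape are left untouched by the substitution.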

    Search, Ask, Know
      Did you actually try this?
        Sed and uniq were really helpful for creating the final list