I often use plussed addresses (andrew+spam@afresh1.com, for example), and I wanted to get an idea of which addresses have received mail. Several domains deliver to this mailbox, all of them ASCII, hence the [-a-z0-9\.]+ in the regex.
This is just throwaway code that looks through my email (in Maildir format), but I wanted some monk wisdom on what I hadn't thought of that could make it faster. My assumption is that faster disks would be the biggest improvement, but what could I do for free?
#!/usr/bin/perl
use strict;
use warnings;

use File::Find;
use YAML::Syck;

my %addresses;

find(sub {
    return unless -f $_;

    open my $fh, '<', $_ or die;
    while (<$fh>) {
        last if $_ eq "\n"; # only scan headers
        $_ = lc $_;
        if (/\b(andrew\+[^\@\s]+\@[-a-z0-9\.]+)\b/xms) {
            my $addr = $1;

            # some addresses are in mailing list bounce format
            if ($addr =~ s/[=\#\%](?:3d)?/@/xms) {
                $addr =~ s/\@[^@]+$//xms;
            }

            $addresses{$addr}++;
        }
    }
    close $fh;
}, glob($ENV{HOME} . '/Maildir/.misc*'));

print Dump \%addresses;
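To show what the matching and bounce-format cleanup do to a single header line, here is a minimal sketch with that step pulled out into a function. The regexes are the ones from the script above; the helper name extract_plussed and the sample header lines are made up for illustration and do not come from real mail.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): returns the
# normalized plussed address found in one header line, or nothing if
# the line contains no such address.
sub extract_plussed {
    my ($line) = @_;
    $line = lc $line;

    return unless $line =~ /\b(andrew\+[^\@\s]+\@[-a-z0-9\.]+)\b/xms;
    my $addr = $1;

    # Bounce-style addresses encode the original "@" as "=", "#" or "%"
    # (optionally followed by "3d"). Turn the first one back into "@",
    # then drop the trailing "@domain" of the bouncing list host.
    if ($addr =~ s/[=\#\%](?:3d)?/@/xms) {
        $addr =~ s/\@[^@]+$//xms;
    }

    return $addr;
}

# Made-up sample header lines, for illustration only.
my @samples = (
    'To: andrew+spam@afresh1.com',
    'Return-Path: <list-bounces+andrew+list=afresh1.com@example.org>',
    'Subject: no address here',
);

for my $line (@samples) {
    my $addr = extract_plussed($line);
    print defined $addr ? "$addr\n" : "(no match)\n";
}
```

Having the step in its own function also makes it easy to time the regex separately from the file I/O, which helps check the assumption that the disk is the bottleneck.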