CColin has asked for the wisdom of the Perl Monks concerning the following question:

Hi all, I'm trying to solve what sounds like a simple problem: find all unique email addresses in my Google Apps Gmail account. However, I have tried various Gmail modules on CPAN without much success, and that's purely to download the corpus of messages. Once I have the corpus, I'm more hopeful about parsing out the unique addresses. Before I start again from scratch, is there a well-used Gmail module out there that I am missing?

Replies are listed 'Best First'.
Re: correct gmail module?
by CSJewell (Beadle) on Apr 24, 2009 at 04:54 UTC
    Considering that Gmail is an IMAP provider, I'd use whatever IMAP module is the best for your purposes, rather than trying to screen-scrape Gmail. I'll reply with a quickie Net-IMAP script that should point you in the right direction in just a bit.
      Try Mail-IMAPClient and IO::Socket::SSL instead (sorry about bad module choice earlier):
      use strict;
      use warnings;
      require IO::Socket::SSL;
      require Mail::IMAPClient;

      my $host = 'imap.gmail.com';
      my $id   = 'you@gmail_labs.com';
      my $pass = 'your-password';    # placeholder: supply your own

      # Gmail does not provide a non-SSL IMAP server.
      my $socket = IO::Socket::SSL->new("${host}:993")
          or die "Cannot open SSL socket to $host: $@";

      my $imap = Mail::IMAPClient->new(
          Server   => $host,
          User     => $id,
          Password => $pass,
          Socket   => $socket,
      ) or die "Cannot connect to $host as $id: $@";

      # messages() searches the currently selected folder.
      $imap->select('INBOX')
          or die "Cannot select INBOX: $@";

      my %address_list;
      my @messages = $imap->messages();
      foreach my $message (@messages) {
          my $envelope = $imap->get_envelope($message);

          my $from = $envelope->from();
          foreach my $from_address (@{$from}) {
              $address_list{$from_address} = 1;
          }

          my $to = $envelope->to();
          foreach my $to_address (@{$to}) {
              $address_list{$to_address} = 1;
          }

          my $cc = $envelope->cc();
          foreach my $cc_address (@{$cc}) {
              $address_list{$cc_address} = 1;
          }

          my $replyto = $envelope->replyto();
          foreach my $replyto_address (@{$replyto}) {
              $address_list{$replyto_address} = 1;
          }

          my $sender = $envelope->sender();
          foreach my $sender_address (@{$sender}) {
              $address_list{$sender_address} = 1;
          }
      }
      $imap->logout();

      my @address_list = keys %address_list;
      ...

      Note that this will only catch messages in the root IMAP folder ATM, although it can easily be extended. But then, does Gmail have real "folders" as such?
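Gmail presents its labels as IMAP folders, so extending the script means walking the folder list. The following is only a sketch of that idea, not a tested Gmail client: `collect_all_folders` is a hypothetical name, `$imap` is assumed to be an already connected and logged-in Mail::IMAPClient object, and the `*_addresses` envelope methods are assumed to return plain strings. A message carrying several labels shows up in several folders, which the hash takes care of.

```perl
use strict;
use warnings;

# Sketch: visit every folder the server lists and collect the envelope
# address strings of each message. $imap is assumed to be a connected,
# logged-in Mail::IMAPClient object (hypothetical helper name).
sub collect_all_folders {
    my ($imap) = @_;
    my %address_list;

    for my $folder ($imap->folders) {
        # Some listed entries cannot be selected; skip those.
        $imap->select($folder) or next;

        for my $message ($imap->messages) {
            my $envelope = $imap->get_envelope($message) or next;

            # Using the hash as a set removes duplicates across folders.
            $address_list{$_} = 1
                for grep { defined $_ } $envelope->from_addresses,
                                        $envelope->to_addresses,
                                        $envelope->cc_addresses;
        }
    }
    return keys %address_list;
}
```

Called as `my @all = collect_all_folders($imap);` after the login step, this would replace the single-folder loop. On a large account it may be slow, since it fetches one envelope per message.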

        As I said, this was a quickie. It's $envelope->to_addresses you'll need to use, rather than $envelope->to, and so on. (The second returns parsed objects; the first returns e-mail addresses as strings.)
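Once the addresses arrive as strings, reducing them to a unique list is plain hash work. Here is a minimal sketch of that step, assuming the strings may be either bare (`user@host`) or carry a display name (`Name <user@host>`); `unique_addresses` is a hypothetical helper, and the regexes are deliberately simple rather than a full RFC parser. A dedicated CPAN parser such as Email::Address would be more robust.

```perl
use strict;
use warnings;

# Hypothetical helper: reduce address strings to unique, lowercased
# bare addresses. Entries without an "@" are skipped.
sub unique_addresses {
    my @raw = @_;
    my %seen;

    for my $entry (@raw) {
        next unless defined $entry;

        # Prefer the part inside angle brackets ...
        my ($addr) = $entry =~ /<([^<>\s]+\@[^<>\s]+)>/;
        # ... otherwise take the first token containing an "@".
        ($addr) = $entry =~ /([^\s<>"(),;]+\@[^\s<>"(),;]+)/
            unless defined $addr;
        next unless defined $addr;

        $seen{ lc $addr } = 1;
    }
    return sort keys %seen;
}

my @unique = unique_addresses(
    'Alice Example <Alice@Example.com>',
    'alice@example.com',
    'bob@example.org',
);
print "$_\n" for @unique;
```

Lowercasing the whole address treats `Alice@Example.com` and `alice@example.com` as the same mailbox, which is a practical simplification rather than something the mail standards guarantee for the local part.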
        Thanks - looks nice; it was the modules with gmail in the name that I was having trouble with.
Re: correct gmail module?
by ELISHEVA (Prior) on Apr 24, 2009 at 04:43 UTC

    Maybe you could tell us what you have tried?

    • Which specific CPAN modules did you try?
    • What specific problems did you see?
      • Did the modules download? Did they install?
      • Could you access them in your scripts? Are they on your include path? (if not, check the value of @INC)
      • If so, did the script connect to Google's website?
    • What specific error messages did you see at the point where things stopped working?
    • If you were able to install the module and use the module in your code, what does your code look like? Could you post a sample?
    • Finally, what communication protocols do the modules you've tried rely on? Are you sure your system's firewall is set up to allow outgoing and incoming messages using that protocol?

    Best, beth
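
The installation and include-path checks from the list above can be done with a few lines of Perl. This is just a sketch: `module_available` is a hypothetical helper, and the two module names are the ones suggested elsewhere in this thread, so substitute whichever modules you actually tried.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the named module can be loaded from @INC.
sub module_available {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # Mail::IMAPClient -> Mail/IMAPClient.pm
    return eval { require $file; 1 } ? 1 : 0;
}

for my $module (qw(Mail::IMAPClient IO::Socket::SSL)) {
    printf "%-20s %s\n", $module,
        module_available($module) ? 'found' : 'NOT found';
}

# Show where perl is looking for modules.
print "Search path:\n";
print "  $_\n" for @INC;
```

If a module is reported as not found even though it was installed, the install location is probably missing from the printed search path.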