alfonso has asked for the wisdom of the Perl Monks concerning the following question:

Hi there,

I couldn't find any similar question.

I want to download all my Gmail emails (sent + inbox) and parse each one as it is downloaded, without storing it (I don't have enough space), then do some analysis on the headers + body (not attachments).

I simply want to scrape all email addresses in each email. For each address, I want to keep track of the most recent date I had contact with it and count how many times it appears (to easily find out which addresses I contact most often). Maybe two separate counters for inbox and sent.

Bonus: being able to save the string next to the email when present. e.g. xdg@yahoo.com is actually "John Doe"

I would appreciate some high level pointers to appropriate libraries.

Please note that the key is that I can't store so many emails (not enough disk space), so the process needs to be done without storage. Load each email and then flush it from memory.

I coded in Perl around 10 years ago, so I'm quite rusty, but I hope I'll still manage.

thanks,
alfonso

  • Comment on Dump all my gmail emails to parse them without storing them

Replies are listed 'Best First'.
Re: Dump all my gmail emails to parse them without storing them
by Corion (Patriarch) on Dec 01, 2014 at 12:37 UTC

    The easiest approach, in my opinion, would be to access Gmail via its IMAP interface and use (say) Net::IMAP::Client (or Mail::IMAPClient) to extract the information you want. With Net::IMAP::Client, you can then fetch the relevant information via IMAP:

    # fetch message summaries (actually, a lot more)
    my $summaries = $imap->get_summaries([ @msg_ids ]);

    foreach (@$summaries) {
        print $_->uid, $_->subject, $_->date, $_->rfc822_size;
        print join(', ', @{$_->from}); # etc.
    }
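
    For the bookkeeping itself (a count per folder, the most recent contact date, and the display name when there is one), a single hash keyed by address is probably enough, and nothing else has to stay in memory once a message has been processed. Below is a minimal sketch using only core Perl. The names parse_addresses, record_contact and %stats are made up for illustration, the regex is deliberately loose rather than a full RFC 5322 parser (a CPAN module such as Email::Address would be more robust), and it assumes you pass in the raw From/To/Cc header strings plus a date you have already converted to epoch seconds:

```perl
use strict;
use warnings;

# %stats maps a lowercased address to
#   { name => 'John Doe', last => <epoch>, count => { inbox => n, sent => n } }
my %stats;

# Loose pattern: either  Name <addr>  (name optionally quoted) or a bare addr.
my $ADDRESS_RE = qr{
    (?: "([^"]*)" | ([^<>,"]*?) ) \s* < ([^<>\s]+\@[^<>\s]+) >
  | ([^\s<>,"]+\@[^\s<>,"]+)
}x;

# Split one header value into [name, address] pairs.
sub parse_addresses {
    my ($header) = @_;
    my @pairs;
    while ($header =~ /$ADDRESS_RE/g) {
        my $name = defined $1 ? $1 : defined $2 ? $2 : '';
        my $addr = defined $3 ? $3 : $4;
        $name =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
        push @pairs, [ $name, $addr ];
    }
    return @pairs;
}

# Update the tally for every address found in the given header strings.
# $folder is a free-form label such as 'inbox' or 'sent';
# $epoch is the message date in epoch seconds (conversion is up to the caller).
sub record_contact {
    my ($stats, $folder, $epoch, @headers) = @_;
    for my $header (grep { defined($_) && length($_) } @headers) {
        for my $pair (parse_addresses($header)) {
            my ($name, $addr) = @$pair;
            my $entry = $stats->{ lc $addr }
                ||= { name => '', last => 0, count => {} };
            $entry->{count}{$folder}++;
            $entry->{last} = $epoch if $epoch > $entry->{last};
            # Keep the first non-empty display name we see.
            $entry->{name} = $name
                if length $name && !length $entry->{name};
        }
    }
    return;
}

# Example with made-up data: one received message, one sent message.
record_contact(\%stats, 'inbox', 1_417_437_420,
    '"John Doe" <xdg@yahoo.com>, jane@example.org');
record_contact(\%stats, 'sent', 1_417_440_000, 'xdg@yahoo.com');

for my $addr (sort keys %stats) {
    my $e = $stats{$addr};
    printf "%s (%s): inbox=%d sent=%d last=%d\n",
        $addr, $e->{name} || 'no name',
        $e->{count}{inbox} || 0, $e->{count}{sent} || 0, $e->{last};
}
```

    You would call record_contact once per message from inside the loop over the summaries and then let the summary go out of scope, so only the tally grows, not the mail itself.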

    I'm not sure where exactly your problem lies, as you haven't told us and haven't shown any code you've already written. Posting that would be a good next step.