It seems the attachment stripper either doesn't work, or I have coded it incorrectly. I am currently running a script without the attachment stripper. It has been running for an hour or so, and this message keeps appearing: "Complex regular subexpression recursion limit (32766) exceeded at /usr/share/perl5/Email/Address.pm line 108." This most likely has something to do with a regex being run over the attachments, hence the need to process email files without attachments.

I was looking through some small scripts I have here that look only for "From:", "To:", "Cc:" and "Bcc:". That led to using Email::Simple:

    #!/usr/bin/env perl
    #
    use strict;
    use warnings;
    use File::Find;
    use Email::Simple;
    use File::Slurp qw( read_file );

    my $directory = '/home/******/Mail/.family.directory/Browne, Bill & Martha';
    my $outfile = 'output2.txt';
    my @found_files;
    find( sub { push @found_files, $File::Find::name }, $directory );

    foreach(@found_files) {
        my $file = "$_";
        if (-f $file) {
            print $_,"\n";
            my $intext = File::Slurp::read_file( $file );
            my $mail = Email::Simple->new($intext);
            my $from_header = $mail->header("From");
            my $to_header = $mail->header("To");
            my $date_header = $mail->header("Date");
            my $cc_header = $mail->header("CC");
            my $bcc_header = $mail->header("BCC");
            my @emails = "";
            push @emails, ($from_header, $to_header);
            if( length $cc_header ) {
                push @emails, $cc_header;
            }
            if( length $bcc_header ) {
                push @emails, $bcc_header;
            }
            File::Slurp::write_file( $outfile, {append => 1 }, join("\n", @emails ) );
        }
    }

This took about 2 seconds to process all the 592 emails, and successfully output the names and emails to a file. Just a few observations:

My use of an array needs improving.

I'm unsure whether $mail->header("CC") will also read lines with "Cc" or "cc". The same goes for BCC.

Where a header holds a lot of addresses, I need to split it at every "," so that each address becomes a separate entry in the array. At present it is one large string of names and addresses, separated by commas. (I will need to be careful where a "," appears inside a name, though.) How do I do that?
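
One possible way to handle the last point, as a rough sketch rather than a tested solution: split each header value on commas, but treat anything inside double quotes as part of the name. The helper split_address_list below is hypothetical (it does not come from any module), the sample header values are made up, and it ignores less common forms such as group syntax or comments in parentheses. It uses only core Perl, so it avoids the Email::Address regex that produced the recursion warning.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not from any module): split a header value such as
# To: or Cc: into one entry per address, leaving commas inside
# double-quoted names alone.
sub split_address_list {
    my ($list) = @_;
    return () unless defined $list && length $list;

    my @entries;
    # Each entry is a run of quoted strings and non-comma, non-quote text.
    while ( $list =~ /((?:"(?:[^"\\]|\\.)*"|[^,"]+)+)/g ) {
        my $entry = $1;
        $entry =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
        push @entries, $entry if length $entry;
    }
    return @entries;
}

# Made-up sample values standing in for $mail->header("To") and so on.
my $to_header = '"Browne, Bill" <bill@example.com>, Martha <martha@example.com>';
my $cc_header = undef;    # stands in for a header that is missing

# Build the array from the defined headers only, one address per element.
my @emails = map  { split_address_list($_) }
             grep { defined } ( $to_header, $cc_header );

print "$_\n" for @emails;
```

Starting from an empty array and filtering with grep { defined } also avoids the empty first element and the undefined-value warnings that the original array handling can produce. On the CC/Cc question: as far as I can tell, Email::Simple looks header names up case-insensitively, but checking that against one known message would be a sensible precaution.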


In reply to Re^7: extracting name & email address by peterr
in thread extracting name & email address by peterr
