
It seems the attachment stripper either doesn't work or I have coded it incorrectly. I am currently running a script without the attachment stripper. It has been running for an hour or so, and this message appears constantly: "Complex regular subexpression recursion limit (32766) exceeded at /usr/share/perl5/Email/Address.pm line 108." This most likely comes from a regex being run over the attachments, hence the need to process the email files without attachments.

I was looking through some small scripts I have here that look only for "From:", "To:", "Cc:" and "Bcc:". That led to using Email::Simple:

#!/usr/bin/env perl
#
use strict;
use warnings;
use File::Find;
use Email::Simple;
use File::Slurp qw( read_file );

my $directory = '/home/******/Mail/.family.directory/Browne, Bill & Martha';
my $outfile = 'output2.txt';

my @found_files;
find( sub { push @found_files, $File::Find::name }, $directory );

foreach(@found_files) {
  my $file = "$_";
      
  if (-f $file) {
    print $_,"\n";
    my $intext = File::Slurp::read_file( $file );
    
    my $mail            = Email::Simple->new($intext);
    my $from_header     = $mail->header("From");
    my $to_header       = $mail->header("To");
    my $date_header     = $mail->header("Date");
    my $cc_header       = $mail->header("CC");
    my $bcc_header      = $mail->header("BCC");

    # Keep only the headers that are actually present in this message.
    my @emails = grep { defined($_) && length($_) }
      ( $from_header, $to_header, $cc_header, $bcc_header );

    # Append one header value per line; the trailing newline stops
    # entries from consecutive files running together.
    File::Slurp::write_file( $outfile, { append => 1 },
      map { "$_\n" } @emails );

  }
}

This took about 2 seconds to process all 592 emails, and it successfully wrote the names and email addresses to a file. Just a few observations:

My use of an array needs improving

I'm unsure whether $mail->header("CC") will also match header lines written as "Cc" or "cc". The same question applies to BCC.

Where a header holds a lot of addresses, I need to split it at each "," so that every address becomes a separate entry in the array. At present it is one large string of names and addresses separated by commas. (I will need to be careful where a "," appears inside a name, though.) How do I do that?
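One possible approach to the comma question, as a rough sketch only: walk the header string character by character and split only on commas that fall outside double quotes. The helper name split_addresses and the sample addresses are made up for illustration and are not part of the script above. It uses core Perl only, and it assumes that names containing a comma are wrapped in double quotes; it does not handle backslash-escaped quotes or commas inside parenthesised comments.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): split one header
# value into individual entries on commas that are outside double quotes.
sub split_addresses {
    my ($header) = @_;
    return () unless defined $header && length $header;

    my @parts     = ('');
    my $in_quotes = 0;

    for my $char ( split //, $header ) {
        if ( $char eq '"' ) {
            $in_quotes = !$in_quotes;    # toggle on every double quote
        }
        if ( $char eq ',' && !$in_quotes ) {
            push @parts, '';             # unquoted comma starts a new entry
            next;
        }
        $parts[-1] .= $char;
    }

    # Trim surrounding whitespace and drop empty entries.
    for my $part (@parts) {
        $part =~ s/^\s+//;
        $part =~ s/\s+$//;
    }
    return grep { length } @parts;
}

# Made-up sample value; the quoted name contains a comma.
my $to_header = '"Browne, Bill" <bill@example.com>, Martha <martha@example.com>';
print "$_\n" for split_addresses($to_header);
```

In the loop over the found files, each header value could be passed through such a helper before being pushed onto the array, so that the output file ends up with one address per line.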

In reply to Re^7: extracting name & email address by peterr
in thread extracting name & email address by peterr
