Re^3: extracting name & email address

The following code works up to a point: it writes the name and email address out to a file. The only real problem is the warning "Complex regular subexpression recursion limit (32766) exceeded at /usr/share/perl5/Email/Address.pm line 108."

#!/usr/bin/env perl
#
use strict;
use warnings;
use File::Find;
use File::Slurp qw( read_file );
use Email::Address;

my $directory = '/home/*****/Mail/.family.directory/Browne, Bill & Martha';
my $outfile = 'output.txt';

my @found_files;
find( sub { push @found_files, $File::Find::name }, $directory );

foreach(@found_files){
    my $file = "$_";
  
    if (-f $file)
    {
    print $_,"\n";
    my $intext = File::Slurp::read_file( $file );
    my @emails = Email::Address->parse( $intext );
    
    File::Slurp::write_file( $outfile, {append => 1 }, join("\n", @emails) );
    }
}

The file that triggers the warning has a large attachment. So do I somehow need to bypass any attachments in the slurp?
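One way to keep attachments away from the address parser is to stop reading at the blank line that ends the header block. The sketch below uses core Perl only. The helper name `address_header_values` and the choice of fields (From, To, Cc, Reply-To) are assumptions for illustration, not something from the thread, and it has not been run against the mailbox in question.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return the values of the
# address-bearing header fields of one message file. Reading stops at the
# first blank line, so the body and any attachments are never loaded.
sub address_header_values {
    my ($file) = @_;
    open my $fh, '<', $file or die "Can't open $file: $!";

    my @fields;
    while ( my $line = <$fh> ) {
        $line =~ s/\r?\n\z//;            # strip LF or CRLF line ending
        last if $line eq '';             # blank line: end of the headers
        if ( $line =~ /^[ \t]/ && @fields ) {
            $line =~ s/^[ \t]+/ /;       # folded continuation line:
            $fields[-1] .= $line;        # append it to the previous field
        }
        else {
            push @fields, $line;
        }
    }
    close $fh;

    # Keep only the fields assumed to carry addresses.
    my @values;
    for my $field (@fields) {
        next unless $field =~ /^(?:From|To|Cc|Reply-To):\s*(.*)/i;
        push @values, $1;
    }
    return @values;
}

# Print the selected header values for each file named on the command line.
for my $file (@ARGV) {
    print "$_\n" for address_header_values($file);
}
```

Each returned string could then be handed to `Email::Address->parse` in place of the whole slurped file. Since the warning showed up on a file with a large attachment, feeding the parser only these short header values seems likely to avoid it, though that is untested here.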

Re^4: extracting name & email address by peterr (Scribe) on Feb 24, 2015 at 03:24 UTC
Added an attachment 'stripper' module, but now there is no output ..

#!/usr/bin/env perl
#
use strict;
use warnings;
use File::Find;
use File::Slurp qw( read_file );
use Email::Address;
use Email::MIME::Attachment::Stripper;

my $directory = '/home/*****/Mail/.family.directory/Browne, Bill & Martha';
my $outfile = 'output.txt';

my @found_files;
find( sub { push @found_files, $File::Find::name }, $directory );

foreach(@found_files){
    my $file = "$_";

    if (-f $file)
    {
    print $_,"\n";
    my $intext = File::Slurp::read_file( $file );
    my $stripper = Email::MIME::Attachment::Stripper->new($intext);
    my @emails = Email::Address->parse( $stripper );

    File::Slurp::write_file( $outfile, {append => 1 }, join("\n", @emails) );
    }
}

One thing though, it processed all the files very quickly. LOL
Re^5: extracting name & email address by Anonymous Monk on Feb 24, 2015 at 03:33 UTC
Remember Re^3: extracting name & email address? The idea of writing code that way is that you can call `StripperMeAddys( 'oneGoodTestFile.mime' );` until you get StripperMeAddys working the way it should. Then you can focus on things like typos, copy/pasting from the docs ... things that are likely to work :) like `my $msg = $stripper->message;`
Re^6: extracting name & email address by peterr (Scribe) on Feb 24, 2015 at 04:43 UTC
..yes, I prefer the structure of the subroutines (having first discovered 'structured COBOL' years ago, lol). I will work on a small sub as you suggested, to see what the attachment stripper does for an email with numerous attachments.
Re^5: extracting name & email address by peterr (Scribe) on Feb 24, 2015 at 23:42 UTC
Have spent quite a bit of time trying to get the attachment stripper working. Here is the latest (test) code ..

#!/usr/bin/env perl
#
use strict;
use warnings;
use File::Slurp qw( read_file );
use Email::MIME::Attachment::Stripper;
use Data::Dumper;

my $path = '/home/***/Mail/.family.directory/Browne, Bill & Martha';
my $outfile = 'output.txt';

Main( @ARGV );
exit( 0 );

sub Main {
    my @files = RecursivePathSearch( $path );
    #for my $file ( @files ){
        #SomethingHere( $file );
        my $test_file = '/home/****/Mail/.family.directory/1280915907.6583.I9x0z:2,S';
        StripperMeAddys( $test_file );
    #}
}

sub RecursivePathSearch {
    my( $path ) = @_;
    use File::Find::Rule qw/ find rule/;
    return rule( file => not_name => [ '.pl', ], )->in( $path );
}

sub SomethingHere {
    my( $file ) = @_;
    use Path::Tiny qw/ path /;
    use Email::Address;
    my $stuff = path( $file )->slurp_raw;
    return Email::Address->parse( $stuff );
}

sub StripperMeAddys {
    my( $test_file ) = @_;
    my $intext = File::Slurp::read_file( $test_file );
    my $parsed = Email::MIME->new($intext);
    #print "Parsed content :\n". Dumper( $parsed) . "\n";
    my $parts = $parsed->parts;
    print "Number of email parts : $parts\n";
    my @parts = $parsed->parts;
    my $stripper;
    if ($parts > 1) {
        $stripper = Email::MIME::Attachment::Stripper->new($parts[1]);
    }
    else {
        $stripper = $parsed;
    }
    print "Stripper content :\n". Dumper( $stripper) . "\n";
    my @emails = Email::Address->parse( $stripper );

    File::Slurp::write_file( $outfile, {append => 1 }, join("\n", @emails) );
    return;
}

Nothing is being written to the output file. I get a message that there are 3 parts in the test file, which is correct. I want to process the first part only. When searching for other examples of people using this attachment stripper, most of the posts had problems with it. Does it really work? Is there alternative code to bypass attachments?
Re^6: extracting name & email address by peterr (Scribe) on Feb 25, 2015 at 00:07 UTC
Found some code that strips out the attachments and writes them as files. I tried it on 2 test files, one with 3 parts, another with about 15 parts. In both cases the script wrote out the attachments correctly. Here is that code with a 'print' added ..

use Email::MIME;
use Email::MIME::Attachment::Stripper;
use File::Slurp qw(slurp write_file);

my $infile = '/home/******/Mail/.family.directory/1280915907.6583.I9x0z:2,S';

my $m = Email::MIME->new( scalar slurp $infile );
my $s = Email::MIME::Attachment::Stripper->new( $m, 'force_filename' => 1 );

print $s->message;     #displays "Email::MIME=HASH(0xe0fca8)"

foreach my $attachment ( $s->attachments ) {
    write_file( $attachment->{filename}, { buf_ref => \$attachment->{payload} } )
        or die "Can't write $attachment->{filename}: $!\n";
}

It displays - Email::MIME=HASH(0xe0fca8)

How do I get the contents of part one into a variable so that I can then extract the email addresses please?
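The `Email::MIME=HASH(...)` output suggests that `message` hands back an Email::MIME object rather than text, so a method call is needed to get at a string. Here is a minimal sketch of one way to do that with Email::MIME alone, assuming the names and addresses wanted are in the From/To/Cc headers and the text wanted is the first text/plain part. The helpers `header_addresses` and `first_text_body` and the sample message are hypothetical, made up for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Email::MIME;
use Email::Address;

# Hypothetical helper: parse the From/To/Cc header values into
# Email::Address objects. The field list is an assumption.
sub header_addresses {
    my ($email) = @_;
    my @found;
    for my $field (qw( From To Cc )) {
        for my $value ( $email->header($field) ) {
            push @found, Email::Address->parse($value);
        }
    }
    return @found;
}

# Hypothetical helper: body of the first text/plain leaf part, or undef.
# Multipart containers are searched recursively; attachments are skipped.
sub first_text_body {
    my ($part) = @_;
    my @children = $part->subparts;
    if (@children) {
        for my $child (@children) {
            my $body = first_text_body($child);
            return $body if defined $body;
        }
        return undef;
    }
    my $type = $part->content_type || 'text/plain';
    return $type =~ m{^text/plain}i ? $part->body : undef;
}

# Made-up sample message: one text part plus one small attachment.
my $raw = join "\r\n",
    'From: Bill Browne <bill@example.com>',
    'To: Martha Browne <martha@example.com>',
    'Subject: test',
    'MIME-Version: 1.0',
    'Content-Type: multipart/mixed; boundary="XYZ"',
    '',
    '--XYZ',
    'Content-Type: text/plain; charset="us-ascii"',
    '',
    'Write to me at bill.alt@example.com',
    '--XYZ',
    'Content-Type: application/octet-stream',
    'Content-Transfer-Encoding: base64',
    'Content-Disposition: attachment; filename="big.bin"',
    '',
    'AAAA',
    '--XYZ--',
    '';

my $email = Email::MIME->new($raw);
print $_->format, "\n" for header_addresses($email);

my $text = first_text_body($email);
print $text if defined $text;
```

With a real file, `$raw` would be the slurped message, and `$text` is a plain string that could be passed to `Email::Address->parse` if addresses mentioned in the body are wanted too. Going through the headers first keeps the large attachment data out of the address regex altogether.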
Re^7: extracting name & email address by peterr (Scribe) on Feb 25, 2015 at 02:34 UTC
Re^8: extracting name & email address by peterr (Scribe) on Mar 04, 2015 at 00:21 UTC
Some notes below your chosen depth have not been shown here