eLore has asked for the wisdom of the Perl Monks concerning the following question:

Fellow monks: I needed to write an interface that slurps HTML-formatted IMAP4 emails and spits out syslog entries. I decided to use the modules included below. It seems to work great as long as there's only one message in the INBOX. With more than one, each log entry includes all the previous ones. Is anything in my code below not re-initializing properly, or is Email::Simple returning the current message body in addition to all the previous ones? I've included some of my own debugging lines. (FYI, it's been about 3 years since I last wrote any code.)

#!/usr/local/bin/perl
use Net::IMAP::Simple;
use Email::Simple;
use Data::Dumper;
use HTML::Stripper;
use Sys::Syslog;

my $user     = 'someuser';
my $password = 'somepass';
my $mailhost = 'somehost';

my $server = Net::IMAP::Simple->new($mailhost) || die "No server\n";
$server->login($user, $password) || die "Access denied\n";

my $number_of_messages = $server->select('INBOX');

foreach my $msg ( 1 .. $number_of_messages ) {
    #UPDATED STRIPPER INITIALIZATION LOCATION
    my $stripper = HTML::Stripper->new( skip_cdata => 1, strip_ws => 0 );
    my $email = Email::Simple->new( join '', @{ $server->get($msg) } );
    my $newline = "";
    $_ = $email->header('Subject');
    if (/^Site ID\:/) {    ### IF IT'S A "Site ID:" MSG...
        my $html_message = $email->body;
        my $txt_message  = $stripper->strip_html($html_message);
        my @email_array  = split /\cM/, $txt_message;
        foreach my $line (@email_array) {
            $line =~ s/\n//g;
            if ( my ($var, $value) = $line =~ /^\s*(.+\S+\s*): (.*)/ ) {
                $var   =~ s/\s/\_/g;    ## convert spaces in key to underscore
                $value =~ s/\s*$//g;    ## zap trailing spaces from value
                $newline = $newline.$var."=\"".$value."\" ";
            } ### end search for var, value
            $line = "";
            $_ = $newline;
        } ### end processing of each line
        my $line_length = length $newline;
        # print "LINE: $newline\n";
        # print "$line_length";
        ## syslog can only handle 1024 characters, so chop some
        ## (a bare match does not truncate; substr does)
        $newline = substr($newline, 0, 923);
        $newline =~ s/\s*$//g;    ## zap trailing spaces
        # my $new_length = length $newline;
        # print ":$new_length";
        $_ = $newline;
        ## "!$newline =~ ..." negates $newline before matching, so use !~
        if ($newline !~ /\"$/) {    ## If the trailing character is NOT a "
            $newline = $newline."\"";
        } ### END MSG LENGTH CHECK
        #openlog($0,'nowait,pid',local7);
        #syslog('local7|notice',, $newline);
        #syslog('local7|notice',,'flush');
        #closelog();
        print "LINE:$newline\n-----------------\n";
        $newline = "";
        # $last_newline = length $newline;
        if ($server->copy($msg, 'Processed')) {
            $server->delete($msg);
        }
    } ### END IF SITE
    # sleep 1;
} ### END FOREACH MSG
$server->quit();

Thanks!

Re: Email::Simple doing something i'm not expecting??
by benizi (Hermit) on Sep 04, 2004 at 19:15 UTC

    The problem is with HTML::Stripper. It's designed to accept multiple chunks of HTML to produce its final output. (The only hint in the docs to this effect is [emphasis mine]: "Now that we have our stripper object (wow!), we use it on one or more chunks of HTML using the strip_html() method").

    If you move the creation of the HTML::Stripper object into your $msg loop, it should work as intended.
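
    To illustrate why a shared object would cause this, here is a minimal sketch. It does not use HTML::Stripper itself: AccumulatingStripper is a hypothetical stand-in that only mimics the behavior described above (each call adds to what the object has already seen), and its crude tag-removing regex is purely for illustration. I have not checked how the real module stores its state.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a stateful stripper (NOT the real HTML::Stripper):
# every call appends its input to an internal buffer and returns the whole buffer.
package AccumulatingStripper;

sub new {
    my ($class) = @_;
    return bless { seen => '' }, $class;
}

sub strip_html {
    my ($self, $html) = @_;
    (my $text = $html) =~ s/<[^>]*>//g;    # crude tag removal, illustration only
    $self->{seen} .= $text;                # state survives between calls
    return $self->{seen};
}

package main;

my @bodies = ('<p>first</p>', '<p>second</p>');

# One object shared by all messages: each result also contains the earlier ones.
my $shared     = AccumulatingStripper->new;
my @shared_out = map { $shared->strip_html($_) } @bodies;

# A fresh object per message: each result contains only that message.
my @fresh_out = map { AccumulatingStripper->new->strip_html($_) } @bodies;

print "shared: $_\n" for @shared_out;
print "fresh:  $_\n" for @fresh_out;
```

    With the shared object the second result comes back as "firstsecond", which matches the symptom of each log entry including all the previous ones. With one object per message the results stay separate.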

      benizi was correct. I moved the stripper initialization inside the message loop, and it now works as I had initially expected.

      Many thanks!!

      -eLore