Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi all! I am really new to Perl. After a lot of searching, borrowing code and getting a qwerty tattoo on my forehead, I could use a hand. Essentially all I am trying to do is create an email filter that searches for certain terms. If a term appears in the body of an email, I want to save the to, from, date/time and body of that email to a text file. Right now I cannot seem to get the string comparison part correct. The issue is described in the comment at the top of the code.

# Status - Program flow seems to be correct. Program IO seems correct. Program connectivity seems correct.
# Issue with string evaluation. Program will not return a match if there is anything MORE than a correct match...
# i.e. - if $SearchTerms = "blue monkies" and the only thing in the body of the email is "blue monkies" it returns
# a match. But if the body of the email reads "maroon baboons are not blue monkies" no match is returned.
# 1/17/2012 - JLC
# MailCrawl uses 3 arguments: -h <host> -u <username> -p <password>
# There is a hard coded path to Search_Terms.txt assigned to $InFile. This is the file containing search terms.
# Needs to be a standard text file with one search term per line.
# Emails matching the search criteria are saved in a raw text file Email.<Date>.txt. Path is hard coded to $OutFile
#!/bin/perl

# import packages
use Net::POP3;
use Getopt::Long;
use strict;
use warnings;

# Declare variables
my $Sec;
my $Min;
my $Hour;
my $MDay;
my $Mon;
my $Year;
my $WDay;
my $YDay;
my $IsDst;
my $Today;
my $InFile;
my $OutFile;
my @SearchTerms;
my $Host;
my $User;
my $Pass;
my $NumMsg;
my $MsgCount;
my $SearchTerms;
my $Msg;
my $MsgList;
my $TermNum;
my $TermCount;
my $Ref;

# Get todays date to be used in output file name
($Sec,$Min,$Hour,$MDay,$Mon,$Year,$WDay,$YDay,$IsDst) = localtime(time);
$Year += 1900;
$Mon++;
$Today = "$Mon.$MDay.$Year";

# Set file paths
$InFile = 'H:\Tools\MailCrawl\Search_Terms.txt';
$OutFile = 'H:\Tools\MailCrawl\Email_Files\Email.'. $Today . '.txt';

# Load search terms from file into @SearchTerms array
open (SearchTerms, "<$InFile") or die "(ERROR: Unable to open Search_Terms).\n";
@SearchTerms = <SearchTerms>;
close (SearchTerms);

# read command line options
# display usage message in case of error
GetOptions ('h|host=s' => \$Host,
            'u|user=s' => \$User,
            'p|pass=s' => \$Pass)
    or die("Input error. Try calling me with: -h <host> -u <username> -p <password>");

# initiate connection
# default timeout = 120 sec
my $conn = Net::POP3->new($Host) or die("ERROR: Unable to connect.\n");

# login
$NumMsg = $conn->login($User, $Pass) or die("ERROR: Unable to login.\n");

# $MsgCount used to show current message being evaluated
$MsgCount=0;

# get message numbers
# iterate over list and print each
if ($NumMsg > 0) {
    $MsgList = $conn->list();

    # Loop for each email msg
    foreach $Msg (keys(%$MsgList)) {

        # Set variables to be displayed to show search status
        $TermNum = @SearchTerms;
        $TermCount = 0;
        $MsgCount++;
        $Ref = $conn->get($Msg);

        # Loop for each search term
        for $SearchTerms (@SearchTerms){
            $TermCount++;
            print "searching msg $MsgCount of $NumMsg for $SearchTerms...\n";

            # If match was found, write email to file
            if (@$Ref ~~ /$SearchTerms/i) {
                open (EmailFile, ">>$OutFile");
                print EmailFile "@$Ref \n";
                close (EmailFile);
            }
        }
    }
} else {
    print "Mailbox is empty.\n";
}

# close connection
$conn->quit();

Replies are listed 'Best First'.
Re: String Comparison Issue
by moritz (Cardinal) on Jan 17, 2012 at 06:43 UTC

    It seems that the strings in @SearchTerms still have trailing newline characters, so if you search for them, you'll only find them at the end of a line.

    chomp can help you here.
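
    A minimal sketch of the effect, assuming the terms were read one per line as in the original script. The helper name message_matches and the sample data are made up for illustration, and grep with \Q...\E stands in for the smartmatch used in the post (it also keeps regex metacharacters in a term from being interpreted):

```perl
use strict;
use warnings;

# Hypothetical helper: true if any line of the message contains the term.
# grep with a quoted regex stands in for the smartmatch in the original script.
sub message_matches {
    my ($lines, $term) = @_;
    return scalar grep { /\Q$term\E/i } @$lines;
}

# Terms as they come back from <SearchTerms>: each one still ends in "\n".
my @raw_terms = ("blue monkies\n", "red herring\n");

# chomp on an array strips the trailing newline from every element.
my @clean_terms = @raw_terms;
chomp @clean_terms;

# Sample message lines, standing in for the array ref returned by $conn->get($Msg).
my @body = ("maroon baboons are not blue monkies today\n");

# The raw term only matches where "blue monkies" is followed by a newline;
# the chomped term matches anywhere in the line.
print message_matches(\@body, $raw_terms[0])   ? "raw term: match\n"   : "raw term: no match\n";
print message_matches(\@body, $clean_terms[0]) ? "clean term: match\n" : "clean term: no match\n";
```

    In the posted script the equivalent change would be a single chomp(@SearchTerms); right after the file is read.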

Re: String Comparison Issue
by About_a_perl (Novice) on Jan 17, 2012 at 06:44 UTC

    This is actually my post. Didn't have an account at the time. Hopefully after commenting, I will receive email notifications if anyone else comments. Thanks for any light that can be shed!

      No e-mail notification is provided. You get a personal message here on PerlMonks, though.
        But only if someone directly responds to one of your posts. The OP will not get a message if someone replies to the thread starter, nor will he be notified about this post (it being a reply to a reply). This isn't Facebook or Google+, nor Usenet.
Re: String Comparison Issue
by About_a_perl (Novice) on Jan 17, 2012 at 07:01 UTC

    Thank you moritz! That was spot on and the script is running as expected. Another question I have... Is there a way to save only a few sections of an email to file, or is it all or nothing? I would like only the following fields: from, date/time, subject and the body.

      So what parts do you want stripped? That list amounts to nearly the full email, minus a number of rather irrelevant headers, and headers are relatively small anyway.

      Unless you want to strip out attachments. In that case, you'll most likely have to parse the mails as MIME mail.

      Then, instead of printing the whole email body to the result file, extract the wanted headers and print those to the output.
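
      A rough sketch of that idea in plain Perl, assuming the message arrives as an array ref of lines the way Net::POP3's get returns it. The helper name wanted_parts and the sample message are hypothetical, and this does not decode MIME parts or encoded headers; it only splits headers from body at the first blank line:

```perl
use strict;
use warnings;

# Hypothetical helper: given an array ref of message lines and a list of
# header names, return just those headers followed by the body.
sub wanted_parts {
    my ($lines, @wanted) = @_;
    my $raw = join '', @$lines;

    # Headers end at the first blank line; everything after it is the body.
    my ($head, $body) = split /\r?\n\r?\n/, $raw, 2;
    $head = '' unless defined $head;
    $body = '' unless defined $body;

    # Unfold continuation lines (a line starting with whitespace
    # continues the previous header).
    $head =~ s/\r?\n[ \t]+/ /g;

    # Collect headers by lowercased name, keeping the first occurrence.
    my %header;
    for my $line (split /\r?\n/, $head) {
        my ($name, $value) = $line =~ /^([^:\s]+):\s*(.*)$/ or next;
        $header{lc $name} = $value unless exists $header{lc $name};
    }

    # Emit only the requested headers, in the requested order.
    my $out = '';
    for my $name (@wanted) {
        my $value = $header{lc $name};
        $out .= "$name: $value\n" if defined $value;
    }
    return $out . "\n" . $body;
}

# Made-up sample message for illustration.
my @msg = (
    "Received: from mail.example.com\n",
    "From: alice\@example.com\n",
    "To: bob\@example.com\n",
    "Subject: blue monkies\n",
    "Date: Tue, 17 Jan 2012 07:01:00 +0000\n",
    "\n",
    "maroon baboons are not blue monkies\n",
);

print wanted_parts(\@msg, qw(From Date Subject));
```

      In the posted script, something like print EmailFile wanted_parts($Ref, qw(From Date Subject)); could replace the line that prints @$Ref. For multipart mail or attachments, a MIME-aware parser is still the safer route.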

Re: String Comparison Issue
by About_a_perl (Novice) on Jan 17, 2012 at 22:19 UTC
    Basically it's just the headers I do not want to be saved.