stillcool has asked for the wisdom of the Perl Monks concerning the following question:

Currently I am working on anti-spam filtering with Perl.

I want to remove the header of the email (title, From, Delivered-To, Received, etc.) to increase my accuracy.

How can I do it? Can anyone help me?

I want to get the content starting from "Martin A posted:" through the rest of the mail, so that the header words are not included in the total word count of the mail.

Below is some of my code, which gets the full word count of the mail (698 words total).

I copied and pasted the mail into a text file and use that text file for processing.

--------------------------------------------------------------------------------
#!/usr/local/bin/perl
use strict;
use warnings;

my $count = 0;
open(FILE, "C:/Perl/testfile.txt") or die "Cannot open C:/Perl/testfile.txt: $!";
while (<FILE>) {    # count the total words
    $count++ while m/[a-zA-Z]\w*/g;
}
close(FILE);
print "total word = $count \n";
--------------------------------------------------------------------------------

This is the sample mail:

From Steve_Burt@cursor-system.com Thu Aug 22 12:46:39 2002
Return-Path: <Steve_Burt@cursor-system.com>
Delivered-To: zzzz@localhost.netnoteinc.com
Received: from localhost (localhost [127.0.0.1])
    by phobos.labs.netnoteinc.com (Postfix) with ESMTP id BE12E43C34
    for <zzzz@localhost>; Thu, 22 Aug 2002 07:46:38 -0400 (EDT)
Received: from phobos [127.0.0.1]
    by localhost with IMAP (fetchmail-5.9.0)
    for zzzz@localhost (single-drop); Thu, 22 Aug 2002 12:46:38 +0100 (IST)
Received: from n20.grp.scd.yahoo.com (n20.grp.scd.yahoo.com [66.218.66.76])
    by dogma.slashnull.org (8.11.6/8.11.6) with SMTP id g7MBkTZ05087
    for <zzzz@example.com>; Thu, 22 Aug 2002 12:46:29 +0100
X-Egroups-Return: sentto-2242572-52726-1030016790-zzzz=example.com@returns.groups.yahoo.com
Received: from [66.218.67.196] by n20.grp.scd.yahoo.com with NNFMP;
    22 Aug 2002 11:46:30 -0000
X-Sender: steve.burt@cursor-system.com
X-Apparently-To: zzzzteana@yahoogroups.com
Received: (EGP: mail-8_1_0_1); 22 Aug 2002 11:46:29 -0000
Received: (qmail 11764 invoked from network); 22 Aug 2002 11:46:29 -0000
Received: from unknown (66.218.66.217) by m3.grp.scd.yahoo.com with QMQP;
    22 Aug 2002 11:46:29 -0000
Received: from unknown (HELO mailgateway.cursor-system.com) (62.189.7.27)
    by mta2.grp.scd.yahoo.com with SMTP; 22 Aug 2002 11:46:29 -0000
Received: from exchange1.cps.local (unverified)
    by mailgateway.cursor-system.com (Content Technologies SMTPRS 4.2.10)
    with ESMTP id <T5cde81f695ac1d100407d@mailgateway.cursor-system.com>
    for <forteana@yahoogroups.com>; Thu, 22 Aug 2002 13:14:10 +0100
Received: by exchange1.cps.local with Internet Mail Service (5.5.2653.19)
    id <PXX6AT23>; Thu, 22 Aug 2002 12:46:27 +0100
Message-Id: <5EC2AD6D2314D14FB64BDA287D25D9EF12B4F6@exchange1.cps.local>
To: "'zzzzteana@yahoogroups.com'" <zzzzteana@yahoogroups.com>
X-Mailer: Internet Mail Service (5.5.2653.19)
X-Egroups-From: Steve Burt <steve.burt@cursor-system.com>
From: Steve Burt <Steve_Burt@cursor-system.com>
X-Yahoo-Profile: pyruse
MIME-Version: 1.0
Mailing-List: list zzzzteana@yahoogroups.com; contact
    forteana-owner@yahoogroups.com
Delivered-To: mailing list zzzzteana@yahoogroups.com
Precedence: bulk
List-Unsubscribe: <mailto:zzzzteana-unsubscribe@yahoogroups.com>
Date: Thu, 22 Aug 2002 12:46:18 +0100
Subject: [zzzzteana] RE: Alexander
Reply-To: zzzzteana@yahoogroups.com
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

Martin A posted:
Tassos Papadopoulos, the Greek sculptor behind the plan, judged that the
limestone of Mount Kerdylio, 70 miles east of Salonika and not far from the
Mount Athos monastic community, was ideal for the patriotic sculpture.
As well as Alexander's granite features, 240 ft high and 170 ft wide, a
museum, a restored amphitheatre and car park for admiring crowds are
planned
---------------------
So is this mountain limestone or granite?
If it's limestone, it'll weather pretty fast.
------------------------ Yahoo! Groups Sponsor ---------------------~-->
4 DVDs Free +s&p Join Now
http://us.click.yahoo.com/pt6YBB/NXiEAA/mG3HAA/7gSolB/TM
---------------------------------------------------------------------~->
To unsubscribe from this group, send an email to:
forteana-unsubscribe@egroups.com
Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/

Replies are listed 'Best First'.
Re: perl remove mail(textfile) header(eg. title, from,Delivered-To,Received)
by talexb (Chancellor) on Apr 26, 2009 at 16:00 UTC

    I would suggest that Mail::Internet would probably be a good module to use. It looks like it can give you just the body of the message.

    And by the way, if you could wrap your example data (in this case, an E-Mail message, where line endings are relevant) inside code tags, it would be much appreciated. Thanks.

    Alex / talexb / Toronto

    "Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds

      Thanks, now my query looks better arranged.

      At first I didn't know how to post the mail example properly.

      I am a newbie on PerlMonks.

      I just want to get the content of the mail starting from

      Martin A posted:

      and calculate the total number of words that appear in the content.

        Having just now fiddled with Mail::Internet and failed to get it working, let me instead provide the following:

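        #!/usr/bin/perl -w
        #
        # Skip the header: read lines up to and including the first blank line.
        while(<>) { last if ( /^\s*$/ ); }
        # Print everything that remains, i.e. the message body.
        print <>;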

        Running this using the message as the input file will produce the output

        Martin A posted:
        Tassos Papadopoulos, the Greek sculptor behind the plan, judged that the
        limestone of Mount Kerdylio, 70 miles east of Salonika and not far from the
        Mount Athos monastic community, was ideal for the patriotic sculpture.
        As well as Alexander's granite features, 240 ft high and 170 ft wide, a
        museum, a restored amphitheatre and car park for admiring crowds are
        planned
        ---------------------
        So is this mountain limestone or granite?
        If it's limestone, it'll weather pretty fast.
        ------------------------ Yahoo! Groups Sponsor ---------------------~-->
        4 DVDs Free +s&p Join Now
        http://us.click.yahoo.com/pt6YBB/NXiEAA/mG3HAA/7gSolB/TM
        ---------------------------------------------------------------------~->
        To unsubscribe from this group, send an email to:
        forteana-unsubscribe@egroups.com
        Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/

        as requested.

        Since the divider between the mail header and the mail body is just a blank line, this script should be dead easy for you. I highly recommend you get a good book that explains E-Mail in some detail, perhaps something from O'Reilly.
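
        To tie the blank-line rule to the word count asked about in the original question, here is a minimal sketch rather than a definitive solution. It assumes the whole message is readable from one filehandle and reuses the word pattern from the original post. The sub name count_body_words and the short in-memory sample message are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count the words in the body of a mail message.
# Takes a filehandle positioned at the start of the message.
# (count_body_words is a hypothetical name, chosen for this sketch.)
sub count_body_words {
    my ($fh) = @_;

    # Skip the header: everything up to and including the first blank line.
    while ( my $line = <$fh> ) {
        last if $line =~ /^\s*$/;
    }

    # Count words in what remains, using the pattern from the original post.
    my $count = 0;
    while ( my $line = <$fh> ) {
        $count++ while $line =~ /[a-zA-Z]\w*/g;
    }
    return $count;
}

# Self-contained demo on an in-memory message (invented sample data).
my $sample = "From: someone\@example.com\n"
           . "Subject: test\n"
           . "\n"
           . "Martin A posted:\n"
           . "limestone or granite?\n";
open( my $fh, '<', \$sample ) or die "Cannot open in-memory file: $!";
print "total word = ", count_body_words($fh), "\n";
close($fh);
```

        To run it against a real message, open the text file instead of the in-memory string and pass that filehandle to the sub. If a message contains no blank line at all, the sketch treats everything as header and reports zero words.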

        A good book on Perl would also be an excellent investment.

        Alex / talexb / Toronto

Re: perl remove mail(textfile) header(eg. title, from,Delivered-To,Received)
by generator (Pilgrim) on Apr 26, 2009 at 19:35 UTC
    Welcome to PerlMonks! Consider using a module to process e-mail. If you are unfamiliar with the concept of modules, take a look at http://www.cpan.org/

    Also, take a look around PerlMonks. Upon seeing your question I browsed to Code Catacombs, where I noticed that one of the monks, neilwatson, had posted code which splits email messages into their component parts. His code uses several of the MIME modules.

    I'm pretty sure that your use of Super Search would turn up code to facilitate word counts.
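
    As a rough illustration of what such word-count code might look like (this is my own minimal sketch, not code found through Super Search), the following counts how often each word occurs in a piece of body text. The sub name word_frequencies is hypothetical, and the word pattern is the one from the original question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return a hash reference mapping each lower-cased word to its count.
# (word_frequencies is a hypothetical name, chosen for this sketch.)
sub word_frequencies {
    my ($text) = @_;
    my %freq;
    $freq{ lc $1 }++ while $text =~ /([a-zA-Z]\w*)/g;
    return \%freq;
}

# Demo on two lines taken from the sample mail body.
my $body = "So is this mountain limestone or granite? "
         . "If it's limestone, it'll weather pretty fast.";
my $freq = word_frequencies($body);

# Print words from most to least frequent, ties broken alphabetically.
for my $word ( sort { $freq->{$b} <=> $freq->{$a} or $a cmp $b } keys %$freq ) {
    printf "%-10s %d\n", $word, $freq->{$word};
}
```

    Note that this simple pattern splits contractions, so "it's" is counted as "it" and "s". Whether that matters depends on how the counts are used.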

    I'm far from an accomplished Perl coder, but in the few months I've been here, I've learned that being an expert at exploring PerlMonks can sometimes substitute for being a Perl expert!

    <><

    generator