Re: Extract IP from email dataset?

If you are using SpamAssassin, the examples below should help:

#
# MultiReceived
# Written By: Roy Elton Crowder, III (roy.crowder@gmail.com)
# Date: 12 March 2009
# Company: WorldSpice Technologies
#                   5050 Poplar Avenue, Suite 170
#                   Memphis, TN 38111
#                   Tollfree Number: (866) 466-7733
# Description:
#   This script was written to count the number of "Received: from" in a
#   header. We were getting emails that were bouncing from multiple email
#   servers before they came to us. Spammers found that they could get
#   through this way. If more than one "Received: from" are found then
#   the script returns 1 (true) which tells SpamAssassin to assign points
#   to the email in question.
#
# If you have any questions please feel free to email me at the email above.
#


package MultiReceived;
1;

use strict;

# Module imports
use Mail::SpamAssassin;
use Mail::SpamAssassin::Plugin;

# Inheritance
our @ISA = qw(Mail::SpamAssassin::Plugin);

# Subroutine new
sub new {
    my ($class, $mailsa) = @_;

    # Create the object
    $class = ref($class) || $class;
    my $self = $class->SUPER::new( $mailsa );
    bless ($self, $class);

    # Register the object's subroutine with SpamAssassin as a Plugin
    $self->register_eval_rule ( 'check_for_multiple_received' );

    return $self;
}


#
# check_for_multiple_received
# Parameters:
# $self
# $msg
#
sub check_for_multiple_received {
    # $msg is an object from Mail::SpamAssassin::PerMsgStatus
    my ($self, $msg) = @_;

    # Get the entire header.
    my $header = $msg->get( 'ALL' );

    # Split the header on new lines.
    my @h = split(/\n/, $header);

    # Counting Variable
    my $num_received = 0;

    # Count the number of "Received: from" there are.
    foreach (@h) {
        # Regex to match against each line of the
        # header. If a "Received: from" is found,
        # add 1 to the count.
        if ($_ =~ /\s*Received:\s* from/) {
            $num_received = $num_received + 1;
        }

        # If more than 1 "Received: from" is found,
        # do not continue, return 1 (true) to assign
        # points.
        if ($num_received > 1) {
            return 1;
        }
    }

    # If we made it this far, the email was good
    return 0;
}

#
# MultiNewLine
# Written By: Roy Elton Crowder, III (roy@worldspice.net)
# Date Written: 24 March 2009
# Company: WorldSpice Technologies
#                   5050 Poplar Avenue, Suite 170
#                   Memphis, TN 38111
#                   Tollfree Number: (866) 466-7733
#
# Description:
#   This SpamAssassin plugin is written to handle emails that have a continuous
#   set of \n (newline) characters. We have set this plugin to catch any email
#   that has 10 or more continuous \n characters. It is pretty common to see 2-3
#   \n characters towards the end of an email for signature purposes but 10 or
#   more is considered spam.
#

package MultiNewLine;
1;

use strict;

use Mail::SpamAssassin;
use Mail::SpamAssassin::Message;
use Mail::SpamAssassin::Plugin;
our @ISA = qw(Mail::SpamAssassin::Plugin);

# new is used to instantiate a new SpamAssassin plugin
sub new {
   my ($class, $mailsa) = @_;
   $class = ref($class) || $class;
   my $self = $class->SUPER::new( $mailsa );
   bless ($self, $class);
   $self->register_eval_rule ( 'check_for_multiple_newline' );

   return $self;
}

#
# check_for_multiple_newline
# Parameters:
# $self
# $msg
#
sub check_for_multiple_newline {
    my ($self, $msg) = @_;

    # The $msg variable is parameterized as a PerMsgStatus object.
    # To get the body of the email we must first get a
    # Mail::SpamAssassin::Message object. This is done by using the
    # get_message() subroutine defined under the PerMsgStatus object.
    my $message = $msg->get_message();

    # Now that we have an actual message object, we can get the body.
    my $body = $message->get_body();

    # Variables
    my $nl_count = 0;
    my $found = 0;

    # We now can parse through the body, line by line, and count the
    # number of \n characters. We want a continuous set of \n characters
    # thus you see some additional checks within the loop.
    foreach (@$body) {
        # We mark $found as true when a \n character is found on a line
        # by itself. If we have already found a \n on a line by itself then
        # each subsequent \n character we find on a line by itself will up
        # the count by one. If we come across a line that has something other
        # than a \n after a \n has been found on a line by itself then we set
        # the count back to zero and found is false.
        if ($found) {
            if ($_ =~ /^\n$/) {
                $nl_count = $nl_count + 1;
            } else {
                $nl_count = 0;
                $found = 0;
            }
        } else {
            if ($_ =~ /^\n$/) {
                $nl_count = 1;
                $found = 1;
            }
        }

        # If our count is greater than or equal to 10, add points. Otherwise,
        # continue parsing the message.
        if ($nl_count >= 10) {
            return 1;
        }
    }

    # If we have gotten this far, the message is legit as
    # far as this module is concerned, don't add points.
    return 0;
}

You can find complete explanations on these here.
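
For anyone trying these out: neither plugin does anything until it is loaded and tied to a rule in the SpamAssassin configuration. A minimal sketch of what that could look like follows. The file paths, the rule names (MULTI_RECEIVED, MULTI_NEWLINE), the descriptions and the scores are placeholders I made up for illustration, so adjust them to your own setup.

```
# local.cf (or any other .cf file SpamAssassin reads)
# Paths, rule names and scores below are placeholders, not tested values.
loadplugin MultiReceived /etc/mail/spamassassin/MultiReceived.pm
loadplugin MultiNewLine  /etc/mail/spamassassin/MultiNewLine.pm

header   MULTI_RECEIVED  eval:check_for_multiple_received()
describe MULTI_RECEIVED  More than one "Received: from" header
score    MULTI_RECEIVED  1.0

body     MULTI_NEWLINE   eval:check_for_multiple_newline()
describe MULTI_NEWLINE   Ten or more consecutive blank lines in body
score    MULTI_NEWLINE   1.0
```

The names inside `eval:` have to match the strings passed to `register_eval_rule` in each plugin's `new`.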

The Web is like a dominatrix. Everywhere I turn, I see little buttons ordering me to Submit. (Nytwind)


Replies are listed 'Best First'.
Re^2: Extract IP from email dataset? by moritz (Cardinal) on May 12, 2009 at 15:14 UTC
`package MultiNewLine; 1;`

Just a tiny piece of nit-picking: this is cargo cult programming; the `1;` has absolutely no effect. The return value of a module must be a true value, but the return value is that of the last statement which doesn't declare a sub, which in your case is `our @ISA = qw(Mail::SpamAssassin::Plugin);`
Re^3: Extract IP from email dataset? by RoyCrowder (Monk) on May 12, 2009 at 16:22 UTC
Thanks moritz. I'll make note of that and change them.
Re^3: Extract IP from email dataset? by John M. Dlugosz (Monsignor) on May 12, 2009 at 18:25 UTC
I use `return 1; # module loaded OK` to be clear. Oh, and I put that at the end of the .pm file.
Re^2: Extract IP from email dataset? by bharadwajh (Initiate) on May 12, 2009 at 15:30 UTC
The email format:

Message-Id: <200301112242.h0BMg4000951@ns.eisnoc1.net>
Date: Sat, 11 Jan 2003 15:40:09 -0500
From: "Ronald George" <ron@247customer.com>
To: "Omgkitty User" <nospam@omgkitty.com>
Subject: Whats up?
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Re^3: Extract IP from email dataset? by CountZero (Bishop) on May 12, 2009 at 18:36 UTC
use strict;
use Regexp::Common qw /net/;

my $email = q|Received: (qmail 23153 invoked from network); 14 Apr 2004 21:19:04 -0000
Message-Id: <200301112242.h0BMg4000951@ns.eisnoc1.net>
Date: Sat, 11 Jan 2003 15:40:09 -0500
From: "Ronald George" <ron@247customer.com>
To: "Omgkitty User" <nospam@omgkitty.com>
Subject: Whats up?
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Re^3: Extract IP from email dataset? by tweetiepooh (Hermit) on May 12, 2009 at 15:57 UTC
If you have this "limited" set then you can simplify your regex to extract something that looks like an IP address, though it won't validate it in any form. Just think about what you are looking for and work from there. How do you as a human pick out IP addresses? Now how can you define that in a regex? Don't worry if it's a valid IP, since you may be able to assume that the source has already done some of that for you. If you do want to make sure you only get valid IPs then, as already suggested, check CPAN. (Or work it out for yourself as a nice exercise.)
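
As a rough illustration of that approach, here is a minimal sketch that pulls out anything shaped like a dotted quad and, optionally, drops candidates with an octet above 255. The sample headers, the addresses (taken from the documentation ranges) and the helper names `extract_ip_like` and `looks_valid` are invented for this example; it only handles IPv4 and is not a substitute for a CPAN module.

```perl
use strict;
use warnings;

# Hypothetical sample headers; the addresses are documentation-range examples.
my $headers = <<'END';
Received: from mail.example.org (mail.example.org [192.0.2.15])
    by mx.example.net with ESMTP; Sat, 11 Jan 2003 15:40:12 -0500
Received: from unknown (HELO host) (198.51.100.7)
    by mail.example.org with SMTP; Sat, 11 Jan 2003 15:40:10 -0500
Message-Id: <200301112242.h0BMg4000951@ns.eisnoc1.net>
Subject: Whats up?
END

# Four groups of 1-3 digits separated by dots, not glued to other
# digits or dots on either side. This checks shape only, not validity.
my $ip_like = qr/(?<![\d.])(\d{1,3}(?:\.\d{1,3}){3})(?![\d.])/;

# Return every IP-shaped substring found in the given text.
sub extract_ip_like {
    my ($text) = @_;
    my @found = $text =~ /$ip_like/g;
    return @found;
}

# Optional stricter pass: true only if every octet is in 0-255.
sub looks_valid {
    my ($ip) = @_;
    return !grep { $_ > 255 } split /\./, $ip;
}

my @ips = extract_ip_like($headers);
print "$_\n" for grep { looks_valid($_) } @ips;
```

The lookbehind and lookahead keep the pattern from matching inside longer dotted numbers such as version strings, which is usually what you want when scanning raw headers.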
Re^3: Extract IP from email dataset? by John M. Dlugosz (Monsignor) on May 12, 2009 at 18:28 UTC
Please edit your post and put in <code> tags. You know the "preview" you get before committing your post? That's how it's going to look to everyone else!