in reply to Extract IP from email dataset?

If you are using SpamAssassin then the examples below should help you:
#
# MultiReceived
# Written By: Roy Elton Crowder, III (roy.crowder@gmail.com)
# Date: 12 March 2009
# Company: WorldSpice Technologies
#          5050 Poplar Avenue, Suite 170
#          Memphis, TN 38111
# Tollfree Number: (866) 466-7733
# Description:
#   This script was written to count the number of "Received: from" in a
#   header. We were getting emails that were bouncing from multiple email
#   servers before they came to us. Spammers found that they could get
#   through this way. If more than one "Received: from" are found then
#   the script returns 1 (true) which tells SpamAssassin to assign points
#   to the email in question.
#
# If you have any questions please feel free to email me at the email above.
#
package MultiReceived; 1;
use strict;

# Module imports
use Mail::SpamAssassin;
use Mail::SpamAssassin::Plugin;

# Inheritance
our @ISA = qw(Mail::SpamAssassin::Plugin);

# Subroutine new
sub new {
    my ($class, $mailsa) = @_;

    # Create the object
    $class = ref($class) || $class;
    my $self = $class->SUPER::new( $mailsa );
    bless ($self, $class);

    # Register the object's subroutine with SpamAssassin as a Plugin
    $self->register_eval_rule ( 'check_for_multiple_received' );
    return $self;
}

#
# check_for_multiple_received
# Parameters:
#   $self
#   $msg
#
sub check_for_multiple_received {
    # $msg is an object from Mail::SpamAssassin::PerMsgStatus
    my ($self, $msg) = @_;

    # Get the entire header.
    my $header = $msg->get( 'ALL' );

    # Split the header on new lines.
    my @h = split(/\n/, $header);

    # Counting Variable
    my $num_received = 0;

    # Count the number of "Received: from" there are.
    foreach (@h) {
        # Regex to match against each line of the
        # header. If a "Received: from" is found,
        # add 1 to the count.
        if ($_ =~ /\s*Received:\s* from/) {
            $num_received = $num_received + 1;
        }

        # If more than 1 "Received: from" is found,
        # do not continue, return 1 (true) to assign
        # points.
        if ($num_received > 1) {
            return 1;
        }
    }

    # If we made it this far, the email was good
    return 0;
}

#
# MultiNewLine
# Written By: Roy Elton Crowder, III (roy@worldspice.net)
# Date Written: 24 March 2009
# Company: WorldSpice Technologies
#          5050 Poplar Avenue, Suite 170
#          Memphis, TN 38111
# Tollfree Number: (866) 466-7733
#
# Description:
#   This SpamAssassin plugin is written to handle emails that have a continuous
#   set of \n (newline) characters. We have set this plugin to catch any email
#   that has 10 or more continuous \n characters. It is pretty common to see 2-3
#   \n characters towards the end of an email for signature purposes but anything
#   beyond 10 is considered spam.
#
package MultiNewLine; 1;
use strict;
use Mail::SpamAssassin;
use Mail::SpamAssassin::Message;
use Mail::SpamAssassin::Plugin;

our @ISA = qw(Mail::SpamAssassin::Plugin);

# new is used to instantiate a new SpamAssassin plugin
sub new {
    my ($class, $mailsa) = @_;
    $class = ref($class) || $class;
    my $self = $class->SUPER::new( $mailsa );
    bless ($self, $class);
    $self->register_eval_rule ( 'check_for_multiple_newline' );
    return $self;
}

#
# check_for_multiple_newline
# Parameters:
#   $self
#   $msg
#
sub check_for_multiple_newline {
    my ($self, $msg) = @_;

    # The $msg variable is parameterized as a PerMsgStatus object.
    # To get the body of the email we must first get a
    # Mail::SpamAssassin::Message object. This is done by using the
    # get_message() subroutine defined under the PerMsgStatus object.
    my $message = $msg->get_message();

    # Now that we have an actual message object, we can get the body.
    my $body = $message->get_body();

    # Variables
    my $nl_count = 0;
    my $found = 0;

    # We now can parse through the body, line by line, and count the
    # number of \n characters. We want a continuous set of \n characters
    # thus you see some additional checks within the loop.
    foreach (@$body) {
        # We mark $found as true when a \n character is found on a line
        # by itself. If we have already found a \n on a line by itself then
        # each subsequent \n character we find on a line by itself will up
        # the count by one. If we come across a line that has something other
        # than a \n after a \n has been found on a line by itself then we set
        # the count back to zero and found is false.
        if ($found) {
            if ($_ =~ /^\n$/) {
                $nl_count = $nl_count + 1;
            } else {
                $nl_count = 0;
                $found = 0;
            }
        } else {
            if ($_ =~ /^\n$/) {
                $nl_count = 1;
                $found = 1;
            }
        }

        # If our count is greater than or equal to 10, add points. Otherwise,
        # continue parsing the message.
        if ($nl_count >= 10) {
            return 1;
        }
    }

    # If we have gotten this far, the message is legit as
    # far as this module is concerned, don't add points.
    return 0;
}
You can find complete explanations on these here.
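
If you want to try the two checks without a SpamAssassin install, the core logic can be pulled out into plain subroutines. This is only a rough sketch, not the plugins themselves: the names has_multiple_received and has_blank_run are made up for illustration, the header pattern is anchored to the start of a line (the plugin's regex is not), and the blank-line check also accepts \r\n, which the plugin does not.

```perl
use strict;
use warnings;

# Hypothetical helper: true (1) if the header text contains more than
# one line starting with "Received: from".
sub has_multiple_received {
    my ($header) = @_;
    # The "= () =" idiom counts the matches of a /g regex.
    my $count = () = $header =~ /^Received:\s*from\b/mg;
    return $count > 1 ? 1 : 0;
}

# Hypothetical helper: true (1) if $lines (an array ref of body lines)
# contains at least $limit consecutive lines that hold only a newline.
sub has_blank_run {
    my ($lines, $limit) = @_;
    my $run = 0;
    for my $line (@$lines) {
        # Extend the run on a blank line, reset it on anything else.
        $run = ($line =~ /^\r?\n$/) ? $run + 1 : 0;
        return 1 if $run >= $limit;
    }
    return 0;
}

# Sample header lines borrowed from the email posted in this thread.
my $header = join "\n",
    'Received: (qmail 23153 invoked from network); 14 Apr 2004 21:19:04 -0000',
    'Received: from unknown (HELO ns.eisnoc1.net) (66.54.218.128) by www.omgkitty.com with SMTP; 14 Apr 2004 22:45:04 -0000',
    'Received: from listmail (216.157.143.2) by ns.eisnoc1.net (8.11.1/8.11.1) with ESMTP id h0BMg4000951',
    'Subject: Whats up?';

my @body = ("Hello\n", ("\n") x 10, "Bye\n");

print "multiple Received: from? ", has_multiple_received($header), "\n";
print "ten blank lines in a row? ", has_blank_run(\@body, 10), "\n";
```

Keeping the checks in plain subs like this makes them easy to test on saved messages before wiring them into register_eval_rule.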

The Web is like a dominatrix. Everywhere I turn, I see little buttons ordering me to Submit. (Nytwind)

Re^2: Extract IP from email dataset?
by moritz (Cardinal) on May 12, 2009 at 15:14 UTC
    package MultiNewLine; 1;

    Just a tiny piece of nit-picking: this is cargo cult programming, the 1; has absolutely no effect. The return value of a module must be a true value, but the return value is that of the last statement which doesn't declare a sub, which in your case is our @ISA = qw(Mail::SpamAssassin::Plugin);

      Thanks moritz. I'll make note of that and change them.

      The Web is like a dominatrix. Everywhere I turn, I see little buttons ordering me to Submit. (Nytwind)
      I use
      return 1; # module loaded OK
      to be clear. Oh, and I put that at the end of the .pm file.
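
To see the point about the module's return value in action, here is a small self-contained sketch. It writes two throwaway modules to a temporary directory and tries to require each one. The module names EarlyOne and LateOne, the $DEBUG variable and the helper subs are all invented for this demo; the only difference between the two files is where the 1; sits.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch directory for the two hypothetical demo modules.
my $dir = tempdir(CLEANUP => 1);

sub write_module {
    my ($name, $source) = @_;
    open my $fh, '>', "$dir/$name.pm" or die "Cannot write $name.pm: $!";
    print {$fh} $source;
    close $fh or die "Cannot close $name.pm: $!";
    return;
}

# "1;" right after the package line. The last statement that actually
# runs is the assignment, and its value (0) is false.
write_module('EarlyOne', <<'END');
package EarlyOne; 1;
our $DEBUG = 0;
sub answer { 42 }
END

# Same module, but with "1;" as the final statement of the file.
write_module('LateOne', <<'END');
package LateOne;
our $DEBUG = 0;
sub answer { 42 }
1;
END

# Returns 1 if the named module can be required, 0 otherwise.
sub loads_ok {
    my ($name) = @_;
    local @INC = ($dir, @INC);
    return eval { require "$name.pm"; 1 } ? 1 : 0;
}

for my $name (qw(EarlyOne LateOne)) {
    print "$name: ", (loads_ok($name) ? "loaded" : "did not return a true value"), "\n";
}
```

The sub declaration at the bottom of EarlyOne does not count as a statement at run time, so the false value of the assignment is what require sees.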

Re^2: Extract IP from email dataset?
by bharadwajh (Initiate) on May 12, 2009 at 15:30 UTC
    The email format

    Received: (qmail 23153 invoked from network); 14 Apr 2004 21:19:04 -0000
    Received: from dev213.omgkitty.com (HELO omgkitty.com) (@192.168.2.118) by dev50.omgkitty.com with SMTP; 11 Jan 2003 22:45:04 -0000
    Received: (qmail 11105 invoked by uid 99); 14 Apr 2004 22:19:04 -0000
    Received: (qmail 14227 invoked from network); 14 Apr 2004 22:19:04 -0000
    Received: from unknown (HELO ns.eisnoc1.net) (66.54.218.128) by www.omgkitty.com with SMTP; 14 Apr 2004 22:45:04 -0000
    Received: from listmail (216.157.143.2) by ns.eisnoc1.net (8.11.1/8.11.1) with ESMTP id h0BMg4000951 for <nospam@omgkitty.com>; Sat, 11 Jan 2003 17:42:04 -0500
    Message-Id: <200301112242.h0BMg4000951@ns.eisnoc1.net>
    Date: Sat, 11 Jan 2003 15:40:09 -0500
    From: "Ronald George" <ron@247customer.com>
    To: "Omgkitty User" <nospam@omgkitty.com>
    Subject: Whats up?
    MIME-Version: 1.0
    Content-Type: text/plain; charset="iso-8859-1"
    Content-Transfer-Encoding: 7bit
    X-Mailer: ListMail v1.65
    X-LM-Flags: 1.45.82

    What’s up?
      use strict;
      use Regexp::Common qw /net/;

      my $email = q|Received: (qmail 23153 invoked from network); 14 Apr 2004 21:19:04 -0000
      Received: from dev213.omgkitty.com (HELO omgkitty.com) (@192.168.2.118) by dev50.omgkitty.com with SMTP; 11 Jan 2003 22:45:04 -0000
      Received: (qmail 11105 invoked by uid 99); 14 Apr 2004 22:19:04 -0000
      Received: (qmail 14227 invoked from network); 14 Apr 2004 22:19:04 -0000
      Received: from unknown (HELO ns.eisnoc1.net) (66.54.218.128) by www.omgkitty.com with SMTP; 14 Apr 2004 22:45:04 -0000
      Received: from listmail (216.157.143.2) by ns.eisnoc1.net (8.11.1/8.11.1) with ESMTP id h0BMg4000951 for <nospam@omgkitty.com>; Sat, 11 Jan 2003 17:42:04 -0500
      Message-Id: <200301112242.h0BMg4000951@ns.eisnoc1.net>
      Date: Sat, 11 Jan 2003 15:40:09 -0500
      From: "Ronald George" <ron@247customer.com>
      To: "Omgkitty User" <nospam@omgkitty.com>
      Subject: Whats up?
      MIME-Version: 1.0
      Content-Type: text/plain; charset="iso-8859-1"
      Content-Transfer-Encoding: 7bit
      X-Mailer: ListMail v1.65
      X-LM-Flags: 1.45.82

      What’s up?
      |;

      while ($email =~ m/$RE{net}{IPv4}{dec}{-keep}/g) {
          print "$1\n";
      }

      Output:

      192.168.2.118
      66.54.218.128
      216.157.143.2

      CountZero

      "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

      If you have this "limited" set then you can simplify your regex to extract something that looks like an IP address though it won't validate it in any form.

      Just think about what you are looking for and work from there. How do you as a human pick out IP addresses? Now how can you define that in a regex?

      Don't worry about whether it's a valid IP, since you may be able to assume that the source has already done some of that checking for you. If you do want to make sure you only get valid IPs then, as already suggested, check CPAN. (Or work it out for yourself as a nice exercise.)
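
For what it's worth, a "looks like an IP" regex plus an optional validity check might go something like the sketch below. The sub names extract_ip_like and is_valid_ipv4 are made up, the X-Test line is an invented sample to show a rejected candidate, and the pattern is just one reasonable choice: four dot-separated groups of one to three digits that are not part of a longer dotted number.

```perl
use strict;
use warnings;

# Loose pattern: looks like a dotted quad, no range checking at all.
my $ip_like = qr/(?<![\d.])(\d{1,3}(?:\.\d{1,3}){3})(?!\.?\d)/;

# Hypothetical helper: return every IP-looking string found in the text.
sub extract_ip_like {
    my ($text) = @_;
    my @found;
    while ($text =~ /$ip_like/g) {
        push @found, $1;
    }
    return @found;
}

# Hypothetical helper: true (1) only if all four octets are 0-255.
sub is_valid_ipv4 {
    my ($candidate) = @_;
    my @octets = split /\./, $candidate;
    return 0 unless @octets == 4;
    for my $octet (@octets) {
        return 0 unless $octet =~ /^\d{1,3}$/ && $octet <= 255;
    }
    return 1;
}

# Two header lines from the email in this thread, plus one invented line.
my $sample = join "\n",
    'Received: from unknown (HELO ns.eisnoc1.net) (66.54.218.128) by www.omgkitty.com with SMTP; 14 Apr 2004 22:45:04 -0000',
    'Received: from listmail (216.157.143.2) by ns.eisnoc1.net (8.11.1/8.11.1) with ESMTP id h0BMg4000951',
    'X-Test: 999.1.1.1';    # invented line, not from the original email

my @candidates = extract_ip_like($sample);
my @valid      = grep { is_valid_ipv4($_) } @candidates;

print "candidates: @candidates\n";
print "valid:      @valid\n";
```

The loose pattern skips the sendmail version string 8.11.1/8.11.1 because it only has three groups, but it happily picks up 999.1.1.1, which is why the separate range check is there if you need it.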

      Please edit your post and put in <code> tags. You know the "preview" you get before committing your post? That's how it's going to look to everyone else!