stevbutt has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks

Please help with some wise and efficient string-matching advice.

Input :

May  2 04:06:15 lon.mail.net exim[17905]: 2012-07-03 07:06:15 1SPPtO-0004en-PS <= me@ours.co.uk H=smtpout.mail.com [22.5.10.4] I=[6.5.14.4]:25 P=esmtp S=13333 id=6aeca3b79b8892d6105dab131c76f066@localhost.localdomain T="Half price offer"

I want to grab the IP address ( 22.5.10.4 without the square brackets ) and the email address ( me@ours.co.uk which always follows <= )

So far I have the IP address, but with the square brackets, using:

my ($srvrip) = $remainder =~ m/H=.+?(\[.+?\])/;

How can I extract the email address ?

I have a lot of lines in the log files so need this to be as efficient as possible and am also restricted to perl 5.8.4

Hope you can help

Replies are listed 'Best First'.
Re: String Matching
by davido (Cardinal) on Aug 14, 2012 at 01:51 UTC

    m/<=\s*(\S+)[^[]+\[([^\]]+)/

    Here it is with nicer formatting and a basic explanation:

    m/
        <=\s*
        (\S+)        # Capture the email address following <=
        [^[]+\[      # Skip to the first subsequent square bracket.
        ([^\]]+)     # Capture until a closing bracket.
    /x

    You can tinker with it yourself here.

    The email address will be in $1 and the IP will be in $2, following a successful match.
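
    As a minimal sketch of how the pattern above might be used on the sample line from the question: the variable names ($line, $arrival_re) are made up for illustration, and compiling the pattern once with qr// is just one reasonable choice for a per-line loop, not something the thread prescribes.

```perl
use strict;
use warnings;

# Sample line from the original question (note the two spaces in "May  2").
my $line = 'May  2 04:06:15 lon.mail.net exim[17905]: 2012-07-03 07:06:15 '
         . '1SPPtO-0004en-PS <= me@ours.co.uk H=smtpout.mail.com [22.5.10.4] '
         . 'I=[6.5.14.4]:25 P=esmtp S=13333 '
         . 'id=6aeca3b79b8892d6105dab131c76f066@localhost.localdomain '
         . 'T="Half price offer"';

# Compile the pattern once, outside any per-line loop.
# $arrival_re is a hypothetical name, not taken from the thread.
my $arrival_re = qr/<=\s*(\S+)[^[]+\[([^\]]+)/;

# List assignment returns the number of captures, so it is false on no match.
my ($email, $ip);
if ( ($email, $ip) = $line =~ $arrival_re ) {
    print "Email: $email\nIP: $ip\n";
}
else {
    warn "No match: $line\n";
}
```

    Assigning the captures in list context avoids relying on $1 and $2, which keep their old values after a failed match.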

    Update: Silly me for trusting the OP's spec. Kenosis mentioned to me that the exim record could, in addition to <=, also contain any of ==, **, =>, *>, ->, and possibly some others. So the <= anchor is probably not ideal, but could be improved upon with (?:<=|==|\*\*|=>|\*>|->) (plus whatever others are legal).
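
    A rough sketch of the broader anchor, assuming only the flags named in the update are of interest. The second sample line is invented for illustration, and $flag_re and $addr_re are hypothetical names.

```perl
use strict;
use warnings;

# Flags named in the update; extend the list if others are legal.
my $flag_re = qr/(?:<=|==|\*\*|=>|\*>|->)/;

# Capture the flag itself, then the first whitespace-delimited token after it.
my $addr_re = qr/\s($flag_re)\s+(\S+)/;

my @samples = (
    # Shortened from the line in the question.
    '2012-07-03 07:06:15 1SPPtO-0004en-PS <= me@ours.co.uk H=smtpout.mail.com [22.5.10.4]',
    # Hypothetical delivery line, made up for this sketch.
    '2012-07-03 07:06:16 1SPPtO-0004en-PS => you@example.org',
);

my @found;
for my $line (@samples) {
    if ( my ($flag, $addr) = $line =~ $addr_re ) {
        push @found, "$flag $addr";
        print "$flag $addr\n";
    }
}
```

    Capturing the flag lets the caller decide afterwards which kind of record it has, rather than running one pattern per flag.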


    Dave

      Thanks Dave,

      The spec is correct. This is already in an if/elsif statement where we know whether we are dealing with ==, ** etc. So what you have shown me is just perfect,

      many thanks

      Steve

        Fantastic! My faith in humanity is restored. ;) ...and I'm glad it worked for you.


        Dave

Re: String Matching
by GrandFather (Saint) on Aug 14, 2012 at 01:12 UTC

    What have you tried?

    As an aside don't fall for the "efficient as possible" tripe. Getting wrong answers fast is not generally considered a good solution. Work on getting the correct answers first then (and only if the solution takes too long to run) consider how you can make it faster.
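
    One way to follow that order (correctness first, timing second) is sketched here with the core Benchmark module. The two subs, by_regex and by_split, are hypothetical names wrapping approaches suggested elsewhere in this thread. The timings depend entirely on the machine and perl build, so none are shown.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Sample line from the original question (two spaces in "May  2").
my $line = 'May  2 04:06:15 lon.mail.net exim[17905]: 2012-07-03 07:06:15 '
         . '1SPPtO-0004en-PS <= me@ours.co.uk H=smtpout.mail.com [22.5.10.4] '
         . 'I=[6.5.14.4]:25 P=esmtp S=13333';

# Approach 1: a single regex with two captures.
sub by_regex {
    my ($email, $ip) = $_[0] =~ /<=\s*(\S+)[^[]+\[([^\]]+)/;
    return ($email, $ip);
}

# Approach 2: split on single spaces and pick fields by position.
sub by_split {
    my @f = split / /, $_[0];
    my ($email, $ip) = @f[10, 12];
    $ip =~ tr/[]//d;    # tr has no character classes; this deletes [ and ]
    return ($email, $ip);
}

# Step 1: make sure both approaches agree before timing anything.
my @r = by_regex($line);
my @s = by_split($line);
die "approaches disagree\n" unless "@r" eq "@s";

# Step 2: only then compare speed (each variant runs for about 1 CPU second).
cmpthese( -1, {
    regex => sub { my @x = by_regex($line) },
    split => sub { my @x = by_split($line) },
} );
```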

    True laziness is hard work

      This is just so true.

Re: String Matching
by rpnoble419 (Pilgrim) on Aug 14, 2012 at 07:08 UTC
    If the layout is fixed (that is, if the data changes but the position of the data does not), then try this:
    $_ = 'May  2 04:06:15 lon.mail.net exim[17905]: 2012-07-03 07:06:15 1SPPtO-0004en-PS <= me@ours.co.uk H=smtpout.mail.com [22.5.10.4] I=[6.5.14.4]:25 P=esmtp S=13333 id=6aeca3b79b8892d6105dab131c76f066@localhost.localdomain T="Half price offer"';
    my @data  = split(/ /);    # the two spaces in "May  2" yield an empty field
    my $Email = $data[10];
    my $IP    = $data[12];
    $IP =~ s/\[//g;            # strip the opening bracket
    $IP =~ s/\]//g;            # strip the closing bracket
    print "Email: $Email\n";
    print "IP: $IP\n";
    As you are limited to Perl 5.8.4, regexes are not as fast as in 5.10 and up, so I would limit the data I run a regex on; you never know what will change and cause your program to bomb (usually at 3:00am on a Sunday morning). I would split your data into its parts and then run whatever regex you need on a smaller chunk. For the email you don't even need a regex. The square brackets can be removed in any number of ways; I chose the lazy way in my example.
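
    Since positional fields fail silently when the layout shifts, a small guard may help. This is only a sketch: the sub name email_and_ip is invented, and the checks (field 9 must be <=, the IP field must carry exactly two brackets) are assumptions about what counts as a well-formed line.

```perl
use strict;
use warnings;

# Returns (email, ip) for a well-formed arrival line, or an empty list.
sub email_and_ip {
    my ($line) = @_;

    # Limit the split: nothing beyond the 13th field is needed here.
    my @f = split / /, $line, 14;

    # Refuse lines that do not have the expected shape.
    return unless @f >= 13 && $f[9] eq '<=';

    my ($email, $ip) = @f[10, 12];

    # tr returns the number of characters deleted; expect one [ and one ].
    my $brackets = ( $ip =~ tr/[]//d );
    return unless $brackets == 2;

    return ($email, $ip);
}

# Sample line from the original question (two spaces in "May  2").
my $line = 'May  2 04:06:15 lon.mail.net exim[17905]: 2012-07-03 07:06:15 '
         . '1SPPtO-0004en-PS <= me@ours.co.uk H=smtpout.mail.com [22.5.10.4] '
         . 'I=[6.5.14.4]:25 P=esmtp S=13333';

my @ok  = email_and_ip($line);
my @bad = email_and_ip('not an exim line at all');

print @ok  ? "Email: $ok[0]\nIP: $ok[1]\n" : "rejected\n";
print @bad ? "unexpected match\n"          : "rejected\n";
```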
Re: String Matching
by 2teez (Vicar) on Aug 14, 2012 at 07:43 UTC
    Hi,

    If your logfile has its data in fixed-width fields, then the unpack function can really come in handy! And you really wouldn't need to bother about which perl version you are using. See this:

    use warnings;
    use strict;

    my $str = 'May  2 04:06:15 lon.mail.net exim[17905]: 2012-07-03 07:06:15 1SPPtO-0004en-PS <= me@ours.co.uk H=smtpout.mail.com [22.5.10.4] I=[6.5.14.4]:25 P=esmtp S=13333 id=6aeca3b79b8892d6105dab131c76f066@localhost.localdomain T="Half price offer"';

    # Skip 82 bytes, take 13 (email), skip 21, take 9 (IP).
    my ( $e_mail, $ip ) = unpack "x82A13x21A9", $str;
    print "EMAIL: ", $e_mail, "\nIP: ", $ip, $/;

    # OR
    while (<DATA>) {
        my ( $e_mail, $ip ) = unpack "x82A13x21A9", $_;
        print "EMAIL: ", $e_mail, "\nIP: ", $ip, $/;
    }

    __DATA__
    May  2 04:06:15 lon.mail.net exim[17905]: 2012-07-03 07:06:15 1SPPtO-0004en-PS <= me@ours.co.uk H=smtpout.mail.com [22.5.10.4] I=[6.5.14.4]:25 P=esmtp S=13333 id=6aeca3b79b8892d6105dab131c76f066@localhost.localdomain T="Half price offer"
    OUTPUT
    EMAIL: me@ours.co.uk
    IP: 22.5.10.4

    Check perldoc perlpacktut for more info.

    UPDATE: Oops! My bad, I missed that; Kenosis pointed it out. Please note that if the length of the field to be extracted varies, the unpack function will NOT work.
    However, I had mentioned previously that the logfile's data MUST have a FIXED WIDTH.
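
    To illustrate that caveat, here is a rough sketch that runs the same template over the original sender and over a longer, made-up one, then sanity-checks the result. The helper make_line, the address someone@ours.co.uk and the dotted-quad check are all assumptions added for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: rebuild the sample line around a given sender.
# The prefix is exactly 82 characters long, matching the x82 in the template.
sub make_line {
    my ($sender) = @_;
    return 'May  2 04:06:15 lon.mail.net exim[17905]: 2012-07-03 07:06:15 '
         . '1SPPtO-0004en-PS <= ' . $sender
         . ' H=smtpout.mail.com [22.5.10.4] I=[6.5.14.4]:25 P=esmtp S=13333';
}

my %looks_ok;
for my $sender ( 'me@ours.co.uk', 'someone@ours.co.uk' ) {
    my ( $e_mail, $ip ) = unpack "x82A13x21A9", make_line($sender);

    # Cheap sanity check: the second field should be a dotted quad.
    $looks_ok{$sender} = ( $ip =~ /^\d+(?:\.\d+){3}$/ ) ? 1 : 0;

    print "$sender: ", ( $looks_ok{$sender} ? "IP $ip" : "layout shifted" ), "\n";
}
```

    With the longer sender every later field moves right, so the fixed offsets land in the wrong place and the check rejects the result instead of passing bad data along.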
