maikelnight has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks, I have an array of strings, which looks like:

$VAR1 = 'Nov 19 06:31:17 proxy postgrey[2439]: action=pass, reason=triplet found, client_name=r41.newsletter.otto.de, client_address=185.15.51.41, sender=otto@newsletter.otto.de, recipient=some.one@some.domain';
$VAR2 = 'Nov 19 06:37:45 proxy postgrey[2439]: action=pass, reason=triplet found, client_name=uspmta194080.emarsys.net, client_address=217.175.194.80, sender=suite17@xpressus.emarsys.net, recipient=other.one@some.domain';

I need to push some substrings from each element into another array, for example the date, sender and recipient. I first tried with the date, like:

for my $x ( @results ) {
    chomp $x;
    my ($date) = split /'[A-z]*.[0-9]*.[0-9]*:[0-9]*:[0-9]*/, $x;
    push @out, $date;
}

This results in the whole line:

$VAR1 = 'Nov 19 06:31:17 proxy postgrey[2439]: action=pass, reason=triplet found, client_name=r41.newsletter.otto.de, client_address=185.15.51.41, sender=otto@newsletter.otto.de, recipient=some.one@some.domain';

May I ask someone to shed some light, please? Thanks!

Re: Array of strings search
by hippo (Archbishop) on Nov 23, 2017 at 15:58 UTC
    my ($date) = split /'[A-z]*.[0-9]*.[0-9]*:[0-9]*:[0-9]*/, $x;

    The leading single quote is not part of the data, so your regex will not match. Even if it did match, you are losing precisely the data you want to retain.
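Both points can be checked with a small script. This is only a sketch: the sample line is shortened from the original post and the variable names are made up for illustration.

```perl
use strict;
use warnings;

my $line = 'Nov 19 06:31:17 proxy postgrey[2439]: action=pass, sender=otto@newsletter.otto.de';

# The pattern starts with a literal quote that never occurs in $line,
# so split finds no separator and returns the whole string as one field.
my @no_match = split /'[A-z]*.[0-9]*.[0-9]*:[0-9]*:[0-9]*/, $line;
print scalar(@no_match), " field(s): $no_match[0]\n";

# Without the quote the pattern matches the timestamp, but split treats
# the match as a separator and discards it: the first field is empty.
my @fields = split /[A-z]*.[0-9]*.[0-9]*:[0-9]*:[0-9]*/, $line;
print "first field: '$fields[0]'\n";
print "second field: '$fields[1]'\n";
```

In the second case the timestamp is exactly the part that split throws away, which is why a capturing match is the more natural tool here.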

    Better to either split on something appropriate (like whitespace) and work on those fields, or else extract via a regex with m// without splitting.

    Addendum: Or use substr for fields of fixed position and width such as the timestamp.
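A minimal sketch of the three suggestions side by side, using a shortened copy of the first log line. The variable names are illustrative, and the substr variant assumes the timestamp always occupies the first 15 characters.

```perl
use strict;
use warnings;

my $line = 'Nov 19 06:31:17 proxy postgrey[2439]: action=pass, reason=triplet found';

# 1. Split on whitespace and reassemble the first three fields.
my @words = split ' ', $line;
my $date_split = join ' ', @words[0 .. 2];

# 2. Extract with a match in list context; no splitting involved.
my ($date_match) = $line =~ /^(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)/;

# 3. Take the first 15 characters, assuming a fixed-width timestamp.
my $date_substr = substr $line, 0, 15;

print "$_\n" for $date_split, $date_match, $date_substr;
```

On this sample all three give the same string. They can differ when the log pads a single-digit day with an extra space, because the split/join variant collapses that padding while the other two keep it.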

Re: Array of strings search
by Laurent_R (Canon) on Nov 23, 2017 at 17:21 UTC
    You're actually not too far away. This would work with your example:
    my ($date) = ($VAR1 =~ /([A-z]* [0-9]*.[0-9]*:[0-9]*:[0-9]*)/);
    But using [A-z] and the * quantifier is not a very good idea, because it makes a rather weak regex and you may end up capturing things that you don't want.

    This might already be significantly better:

    my ($date) = ($VAR1 =~ /(^\w+ \d+ \d+:\d+:\d+)/);
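To make the weakness concrete, here is a small sketch. The test string 'junk ?::' is invented for the demonstration and does not come from the log.

```perl
use strict;
use warnings;

# [A-z] is an ASCII range from 'A' (65) to 'z' (122), so it also covers
# the six punctuation characters that sit between 'Z' and 'a'.
my @extra = grep { /[A-z]/ && !/[A-Za-z]/ } map { chr $_ } 0 .. 127;
print "also matched by [A-z]: @extra\n";

# With * on every part, almost everything in the pattern is optional,
# so it can succeed on text that contains no date at all.
my $not_a_date = 'junk ?::';
my ($loose)  = $not_a_date =~ /([A-z]* [0-9]*.[0-9]*:[0-9]*:[0-9]*)/;
my ($strict) = $not_a_date =~ /(^\w+ \d+ \d+:\d+:\d+)/;

print "loose pattern captured: '$loose'\n";
print "strict pattern: ", (defined $strict ? "captured '$strict'" : "no match"), "\n";
```

The stricter pattern requires at least one digit in each numeric position, so it rejects the nonsense string that the looser one accepts.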
Re: Array of strings search
by AnomalousMonk (Archbishop) on Nov 23, 2017 at 18:04 UTC

    See also the discussion of this recent related question.


    Give a man a fish:  <%-{-{-{-<

      OK, got it so far, but I can't manage to split the sender part out of it. I tried:

      ( $sender ) = split (/\s(sender=(.*)@(.*),)/);

      But that just cuts away the part after the "," and the \s, like:

      client_address=185.15.51.41, EMPTYFROMHERE

      What am I missing? Online Regex tools tell me that all is great.

        You obviously don't really understand split. Follow the link and look it up.

        split is aimed at splitting a string into a list of parts, in order to later use these parts. Usually, the first parameter of split is the separator on which you want to break up the string (although the separator can be a bit more complicated than a simple character). You could certainly use split on your problem (as a part of the solution), but it is probably not the best tool for your task. A simple regex is probably better.
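A short sketch of the difference, using a shortened copy of the log line. The first half shows what a split with the attempted pattern returns, the second half uses split with a real separator. The names @parts and %field are illustrative.

```perl
use strict;
use warnings;

my $line = 'client_address=185.15.51.41, sender=otto@newsletter.otto.de, recipient=some.one@some.domain';

# The pattern is used as the separator: element 0 is the text before the
# match, followed by the three captured groups, then the text after it.
my @parts = split /\s(sender=(.*)@(.*),)/, $line;
print "[$_]\n" for @parts;

# Using split as intended: ", " separates the fields, and each field is
# split once more on its first "=" into a key and a value.
my %field = map { split /=/, $_, 2 } split /, /, $line;
print "sender:    $field{sender}\n";
print "recipient: $field{recipient}\n";
```

Taking only the first element of the first split therefore yields the text in front of "sender=", not the sender itself.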

        One possible way to get the sender:

        my $sender = $1 if /sender=(\S+)/;
        This captures a string of as many characters other than spaces as possible following the "sender=" string in the $_ variable.

        Update: I have never actually seen any problem with this syntax, but I have read several times and AnomalousMonk reminds me that conditionally defining lexicals can lead to unexpected results. Perhaps it is better to write:

        my $sender;
        $sender = $1 if /sender=(\S+)/;
        if (defined $sender) {
            # ...
        }
        or possibly:
        if (/sender=(\S+)/) {
            my $sender = $1;
            do_something_with_sender();
        }
        Thanks, AnomalousMonk.
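One detail worth checking against the sample data: the fields are separated by ", ", so \S+ also picks up the comma that follows the address. The sketch below is one possible variation, not the only way to write it. It excludes commas from the character class and uses a match in list context, which avoids the conditional declaration altogether.

```perl
use strict;
use warnings;

my $line = 'client_address=185.15.51.41, sender=otto@newsletter.otto.de, recipient=some.one@some.domain';

# \S+ runs up to the next whitespace, so it keeps the trailing comma.
my ($with_comma) = $line =~ /sender=(\S+)/;

# Excluding commas from the character class stops at the end of the value.
my ($sender)    = $line =~ /sender=([^,\s]+)/;
my ($recipient) = $line =~ /recipient=([^,\s]+)/;

# A match in list context returns the captures, or an empty list on failure,
# so the variable is always declared and is simply undef when nothing matched.
my ($missing) = $line =~ /no_such_key=([^,\s]+)/;

print "with comma: $with_comma\n";
print "sender:     $sender\n";
print "recipient:  $recipient\n";
print "missing is ", (defined $missing ? 'defined' : 'undef'), "\n";
```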
        I tried: ...

        Exactly what code did you try? On exactly what data? Please see Short, Self-Contained, Correct Example.

        ( $sender ) = split (/\s(sender=(.*)@(.*),)/);

        This doesn't look like a sensible split invocation. Have you read and understood the split documentation?

        client_address=185.15.51.41, EMPTYFROMHERE

        I don't understand where this extracted substring comes from; the substring  EMPTYFROMHERE appears nowhere in the OPed data. Again, Short, Self-Contained, Correct Example. Please help us to help you.

        Online Regex tools tell me that all is great.

        Which online regex tools? How does their advice apply to the operation of split?


        Give a man a fish:  <%-{-{-{-<

Re: Array of strings search
by karlgoethebier (Abbot) on Nov 24, 2017 at 11:53 UTC

    Cut it into pieces if you hate the regexes:

    #!/usr/bin/env perl

    use strict;
    use warnings;
    use Data::Dump;
    use feature qw(say);

    my $string = q(Nov 19 06:37:45 proxy postgrey[2439]: action=pass, reason=triplet found, client_name=uspmta194080.emarsys.net, client_address=217.175.194.80, sender=suite17@xpressus.emarsys.net, recipient=other.one@some.domain);

    my @fields = split /action=|reason=|client_name=|client_address=|sender=|recipient=/, $string;

    my $date = ( split / proxy.+/, ( shift @fields ) )[0];

    say $date;

    for ( 0 .. scalar @fields - 1 ) {
        $fields[$_] =~ s/, //;
        say $fields[$_];
    }

    __END__

    karls-mac-mini:monks karl$ ./split.pl
    Nov 19 06:37:45
    pass
    triplet found
    uspmta194080.emarsys.net
    217.175.194.80
    suite17@xpressus.emarsys.net
    other.one@some.domain

    I'll burn for this.

    Best regards, Karl

    «The Crux of the Biscuit is the Apostrophe»

    perl -MCrypt::CBC -E 'say Crypt::CBC->new(-key=>'kgb',-cipher=>"Blowfish")->decrypt_hex($ENV{KARL});'

Re: Array of strings search
by Marshall (Canon) on Nov 27, 2017 at 20:57 UTC
    When approaching a task like this, my first inclination would be to write a fairly generic parser for lines that use this format.

    The parseline() subroutine code below takes a text line as input and makes a hash table of key,value pairs and returns a reference of that hash to the "main program". I called the stuff at the beginning of the line just "tag". Of course "tag" could be further broken down into a key for "date", "time" and "whatever the stuff after the date/time means".

    The main "work horses" for parsing textual data are: split() and "match regex global". The code below uses both techniques.

    I tried to be straight-forward, but I am quite sure that understanding the code below will require study.

    I will note that it is unusual to parse an array of input lines. More normal would be to parse lines as they come in, save what is needed from those lines and move on. That approach is more efficient and scalable.
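A rough sketch of that line-at-a-time style, under some assumptions: the log text is held in a string and read through an in-memory filehandle so the example is self-contained, the sample lines are shortened, and the regexes and the @out layout are illustrative choices rather than anything from the original post.

```perl
use strict;
use warnings;

# Stand-in for a real log file. With a real file, open the path
# instead of the reference to $log.
my $log = <<'END';
Nov 19 06:31:17 proxy postgrey[2439]: action=pass, reason=triplet found, sender=otto@newsletter.otto.de, recipient=some.one@some.domain
Nov 19 06:37:45 proxy postgrey[2439]: action=pass, reason=triplet found, sender=suite17@xpressus.emarsys.net, recipient=other.one@some.domain
END

open my $fh, '<', \$log or die "Cannot open in-memory log: $!";

my @out;
while (my $line = <$fh>) {
    chomp $line;
    # Keep only the three values of interest; skip lines without a timestamp.
    my ($date)      = $line =~ /^(\w+ +\d+ \d+:\d+:\d+)/ or next;
    my ($sender)    = $line =~ /sender=([^,\s]+)/;
    my ($recipient) = $line =~ /recipient=([^,\s]+)/;
    push @out, [ $date, $sender, $recipient ];
}
close $fh;

print join(' | ', @$_), "\n" for @out;
```

Only one line is held in memory at a time, and @out keeps just the extracted values rather than the full log text.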

    #!/usr/bin/perl
    use warnings;
    use strict;
    use Data::Dumper;

    my @lines = (
        'Nov 19 06:31:17 proxy postgrey[2439]: action=pass, reason=triplet found, client_name=r41.newsletter.otto.de, client_address=185.15.51.41, sender=otto@newsletter.otto.de, recipient=some.one@some.domain',
        'Nov 19 06:37:45 proxy postgrey[2439]: action=pass, reason=triplet found, client_name=uspmta194080.emarsys.net, client_address=217.175.194.80, sender=suite17@xpressus.emarsys.net, recipient=other.one@some.domain',
    );

    foreach my $line (@lines)
    {
        my $hash_ref = parseline ($line);
        my %tokens = %$hash_ref;
        print "line=$line\n";
        foreach my $key (keys %tokens)
        {
            print "key=$key \t value=$tokens{$key}\n";
        }
        print "\n";   #blank line spacer
    }

    # parse line creates a hash of keys and values representing
    # the contents of the line
    sub parseline
    {
        my $line = shift;
        my %tokens;
        my ($beginning_tag, $rest) = split (': ', $line,2); #space after the : required
        %tokens = ($rest =~ /(\S+)=(.+?)(?:,|$)/g);
        $tokens{tag} = $beginning_tag;
        return \%tokens;
    }
    __END__
    Prints:
    line=Nov 19 06:31:17 proxy postgrey[2439]: action=pass, reason=triplet found, client_name=r41.newsletter.otto.de, client_address=185.15.51.41, sender=otto@newsletter.otto.de, recipient=some.one@some.domain
    key=client_address      value=185.15.51.41
    key=action      value=pass
    key=reason      value=triplet found
    key=recipient   value=some.one@some.domain
    key=tag         value=Nov 19 06:31:17 proxy postgrey[2439]
    key=client_name         value=r41.newsletter.otto.de
    key=sender      value=otto@newsletter.otto.de

    line=Nov 19 06:37:45 proxy postgrey[2439]: action=pass, reason=triplet found, client_name=uspmta194080.emarsys.net, client_address=217.175.194.80, sender=suite17@xpressus.emarsys.net, recipient=other.one@some.domain
    key=reason      value=triplet found
    key=recipient   value=other.one@some.domain
    key=action      value=pass
    key=client_address      value=217.175.194.80
    key=client_name         value=uspmta194080.emarsys.net
    key=tag         value=Nov 19 06:37:45 proxy postgrey[2439]
    key=sender      value=suite17@xpressus.emarsys.net