cyzza has asked for the wisdom of the Perl Monks concerning the following question:

Hey all, I am currently writing a Postfix log parser; a reasonable example is in $string.
It's a whole heap of key=value pairs, easy enough,
but I want to see if I can optimise what I have already done:
my $string = 'to=<adam.clark@ngv.vic.gov.au>, relay=monet1.ngv.vic.gov.au[10.10.10.20]:25, delay=0.54, delays=0.06/0.02/0/0.46, dsn=2.0.0, status=sent ';

# Captures: key, value, and whatever is left of the line.
$LogLineHash = qr {
    ^
    ([^=]*)=<?(.*?)>?,?\s+
    (.*?)
    $
}xi;

while ( $string ) {
    print "String: $string\n";
    ( $junk, $key, $value, $string ) = split( /$LogLineHash/, $string );
    print "Key: $key\nValue: $value\nLeft Over: $string\n\n";
    $Hash{$key} = $value;
}

while ( my ( $key, $value ) = each(%Hash) ) {
    print "$key => $value\n";
}
Which gets me:
String: to=<adam.clark@ngv.vic.gov.au>, relay=monet1.ngv.vic.gov.au[10.10.10.20]:25, delay=0.54, delays=0.06/0.02/0/0.46, dsn=2.0.0, status=sent
Key: to
Value: adam.clark@ngv.vic.gov.au
Left Over: relay=monet1.ngv.vic.gov.au[10.10.10.20]:25, delay=0.54, delays=0.06/0.02/0/0.46, dsn=2.0.0, status=sent

String: relay=monet1.ngv.vic.gov.au[10.10.10.20]:25, delay=0.54, delays=0.06/0.02/0/0.46, dsn=2.0.0, status=sent
Key: relay
Value: monet1.ngv.vic.gov.au[10.10.10.20]:25
Left Over: delay=0.54, delays=0.06/0.02/0/0.46, dsn=2.0.0, status=sent

String: delay=0.54, delays=0.06/0.02/0/0.46, dsn=2.0.0, status=sent
Key: delay
Value: 0.54
Left Over: delays=0.06/0.02/0/0.46, dsn=2.0.0, status=sent

String: delays=0.06/0.02/0/0.46, dsn=2.0.0, status=sent
Key: delays
Value: 0.06/0.02/0/0.46
Left Over: dsn=2.0.0, status=sent

String: dsn=2.0.0, status=sent
Key: dsn
Value: 2.0.0
Left Over: status=sent

String: status=sent
Key: status
Value: sent
Left Over:

relay => monet1.ngv.vic.gov.au[10.10.10.20]:25
to => adam.clark@ngv.vic.gov.au
dsn => 2.0.0
status => sent
delay => 0.54
delays => 0.06/0.02/0/0.46
I was hoping that I could get my regex to dynamically grab all the key value pairs at once with a (?: )+ style grouping such that my regex would be:
^(?:([^=]*)=<?(.*?)>?,?\s+)+$
essentially grabbing an arbitrary number of key/value pairs, but it's not to be:

Sample code:
my @bits = split( /^(?:([^=]*)=<?(.*?)>?,?\s+)+$/, $string );
foreach ( @bits ) {
    print "$_\n";
}
gives me:
status
sent
which is only the last key/value pair.
Is what I want to do possible?

Also, can someone fill me in on why, when using split(), the first array element is always empty? Hence my $junk variable.

Replies are listed 'Best First'.
Re: Adapting parenthesis in regexps
by ikegami (Patriarch) on Mar 29, 2007 at 12:59 UTC

    split works best when you have a list joined by a separator, and you split on the separator.

    sub unbracket {
        for (my $s = @_ ? $_[0] : $_) {
            s/^<//;
            s/>$//;
            return $_;
        }
    }

    my %hash = map unbracket, map { split /=/ } split /, /, $_;

    # Or just
    # my %hash = map unbracket, split /, |=/, $_;
Re: Adapting parenthesis in regexps
by Fletch (Bishop) on Mar 29, 2007 at 12:54 UTC

    Why not just split on /,\s*/ then parse out /([^=]+)=(.*)/?
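
    The split-then-parse suggestion above might look something like the following minimal sketch. It assumes that no value contains a comma and that the angle brackets around the address are unwanted; the %pairs name and the trimming steps are illustrative choices, not part of the original reply.

```perl
use strict;
use warnings;

my $string = 'to=<adam.clark@ngv.vic.gov.au>, relay=monet1.ngv.vic.gov.au[10.10.10.20]:25, delay=0.54, delays=0.06/0.02/0/0.46, dsn=2.0.0, status=sent ';

my %pairs;    # hypothetical name for the resulting hash
for my $field ( split /,\s*/, $string ) {
    # Skip anything that does not look like key=value.
    next unless $field =~ /([^=]+)=(.*)/;
    my ( $key, $value ) = ( $1, $2 );

    $value =~ s/\s+$//;          # the last field carries trailing whitespace
    $value =~ s/^<(.*)>$/$1/;    # assumption: drop <...> around addresses
    $pairs{$key} = $value;
}

print "$_ => $pairs{$_}\n" for sort keys %pairs;
```

    If real log lines can carry commas inside a value, splitting on the comma would cut that value short, so this approach would need adjusting for such input.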

    Update: And as to your question about the empty element, when your split regexp contains capturing parens the matched delimiters are returned as well as the delimited fields.
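
    To illustrate the point about capturing parens, here is a small sketch with made-up input strings (the variable names are hypothetical). It also shows where the leading empty element comes from: a positive-width match at the very start of the string leaves an empty field in front of it.

```perl
use strict;
use warnings;

# Without captures: only the fields between the delimiters come back.
my @plain = split /,/, 'a,b,c';            # ('a', 'b', 'c')

# With captures: the captured delimiter text is returned as well.
my @with_delims = split /(,)/, 'a,b,c';    # ('a', ',', 'b', ',', 'c')

# The pattern matches at position 0, so the "field" before the match is
# an empty string. That is the element the question stores in $junk.
my @leading = split /^(\w+)=(\w+)/, 'key=value rest';
# ('', 'key', 'value', ' rest')

print scalar(@plain), " ", scalar(@with_delims), " ", scalar(@leading), "\n";
```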

Re: Adapting parenthesis in regexps
by Ionitor (Scribe) on Mar 29, 2007 at 13:05 UTC
    Or, you can just skip the use of split altogether...
    my %bits = $string =~ /([^=]+)=<?(.+?)>?\s*(?:,\s*|$)/g;
    Edit: added clause to grab last pair
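
    As a rough illustration of why the /g match works where the (?: )+ grouping from the question did not, the sketch below uses a shortened, made-up sample string and simplified patterns. A capture group inside a repeated group keeps only its final repetition, whereas a /g match in list context returns the captures of every match.

```perl
use strict;
use warnings;

my $string = 'delay=0.54, dsn=2.0.0, status=sent';

# One match with a repeated group: the captures hold only the last repetition.
my @last_only = $string =~ /^(?:(\w+)=([^,]+),?\s*)+$/;    # ('status', 'sent')

# Many matches with /g in list context: every pair's captures are returned,
# which is exactly the flat key, value, key, value list a hash wants.
my %all = $string =~ /(\w+)=([^,]+)/g;

print "@last_only\n";
print join( ', ', map { "$_=$all{$_}" } sort keys %all ), "\n";
```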
      And, because I'm always looking to practice my regex-fu, here's a version with all greedy matches, which tend to be faster--I timed a roughly 40% speed increase.
      my %bits = $string =~ /
          ([^=]+)                   # Key
          =<?                       # Separator
          ([^,>]+ (?> >[^,>]+)*)    # Value
          >?(?:,\s*|\s*$)           # Termination
      /gx;
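
      One way to check a timing claim like this on your own machine is the core Benchmark module. The sketch below is only an assumption about how such a comparison could be set up, not the method used for the figure above. It drops the trailing space from the sample string, because with that space present the greedy pattern appears to keep it in the last value while the lazy one does not.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Sample line from the question, with the trailing space dropped (assumption).
my $string = 'to=<adam.clark@ngv.vic.gov.au>, relay=monet1.ngv.vic.gov.au[10.10.10.20]:25, '
           . 'delay=0.54, delays=0.06/0.02/0/0.46, dsn=2.0.0, status=sent';

# The two patterns from this reply, precompiled.
my $lazy   = qr/([^=]+)=<?(.+?)>?\s*(?:,\s*|$)/;
my $greedy = qr/([^=]+)=<?([^,>]+(?>>[^,>]+)*)>?(?:,\s*|\s*$)/;

# Sanity check: both patterns should build the same hash for this input.
my %from_lazy   = $string =~ /$lazy/g;
my %from_greedy = $string =~ /$greedy/g;
my $same = join( ';', map { "$_=$from_lazy{$_}" } sort keys %from_lazy )
        eq join( ';', map { "$_=$from_greedy{$_}" } sort keys %from_greedy );
print "same result: ", ( $same ? "yes" : "no" ), "\n";

# Run each variant for at least one CPU second and print a comparison table.
cmpthese( -1, {
    lazy   => sub { my %h = $string =~ /$lazy/g;   return },
    greedy => sub { my %h = $string =~ /$greedy/g; return },
} );
```

      The numbers will vary with the Perl version and the input, so the table is best read as a relative comparison only.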
Re: Adapting parenthesis in regexps
by bobf (Monsignor) on Mar 29, 2007 at 14:48 UTC

    a whole heap of key=value pairs, easy enough. but I want to see if I can optimise what I already have done

    I guess that depends on how you define "optimise". That could mean reusing existing code (thereby optimizing development time) unless there is a reason not to (such as computational speed, which is another thing that can be optimized).

    Since your input is a bunch of "key=value" pairs, you could use one of the many config modules that parse ini-like files.

    use warnings;
    use strict;
    use Config::General;

    my $string = 'to=<adam.clark@ngv.vic.gov.au>, relay=monet1.ngv.vic.gov.au[10.10.10.20]:25, delay=0.54, delays=0.06/0.02/0/0.46, dsn=2.0.0, status=sent ';

    # Put each pair on its own line so it looks like an ini-style file.
    $string =~ s/, /\n/g;

    my $conf = Config::General->new( -String => $string );
    my %hash = $conf->getall();

    while ( my ( $k, $v ) = each %hash ) {
        print "$k => $v\n";
    }
    Prints:
    to => <adam.clark@ngv.vic.gov.au>
    relay => monet1.ngv.vic.gov.au[10.10.10.20]:25
    dsn => 2.0.0
    status => sent
    delay => 0.54
    delays => 0.06/0.02/0/0.46