johnnywang has asked for the wisdom of the Perl Monks concerning the following question:

Hi, not sure this is the place to ask a very specific question. What's the easiest way to parse something like:

Jul 15 15:12:10 a=foo time="2004-07-14 01:20:25 UTC" b=abc@foo.com msg="^one can say anything here except quotes$"

Basically, I'd like to extract the key-value pairs, where a value is quoted if it contains spaces.

Thanks.

Replies are listed 'Best First'.
Re: a regex question
by kvale (Monsignor) on Jul 16, 2004 at 00:25 UTC
    Generally, the more specific and concrete the question, the better.

    Regarding your question, your description is not quite accurate. There seems to be a date-time stamp in addition to the key-value pairs.

    The possible quotes around the values make it a little tricky. The simplest way to deal with those is to use the Text::xSV module with the separator set to a single space (via set_sep). The first three elements will be your date components, and the rest will be the key-value pairs, which you can split on the '='.

    Another approach is to roll your own parser, with a grammar derived from the structure of your line and some choice regexps. But I'd try the module first.
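
    For illustration, here is a minimal sketch of what such a hand-rolled parser might look like. It assumes a grammar of "timestamp, then whitespace-separated key=value pairs", with a syslog-style timestamp (month name, day, HH:MM:SS) and bare values that contain neither spaces nor quotes. The sub name parse_line and both of those assumptions are mine, not part of the original question.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the timestamp and a hash ref of key/value pairs.
# Assumed grammar:  line := timestamp (ws pair)*
sub parse_line {
    my ($line) = @_;

    # timestamp := 3-letter month, day, HH:MM:SS (an assumption about the format)
    $line =~ /\G(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)/gc
        or die "no timestamp at start of line\n";
    my $timestamp = $1;

    my %pairs;
    # Keep consuming pairs until only trailing whitespace is left.
    until ($line =~ /\G\s*\z/gc) {
        # pair := key '=' ( '"' non-quote chars '"' | non-space, non-quote chars )
        $line =~ /\G\s+(\w+)=(?:"([^"]*)"|([^\s"]+))/gc
            or die "cannot parse at offset " . (pos($line) || 0) . "\n";
        $pairs{$1} = defined $2 ? $2 : $3;
    }
    return ($timestamp, \%pairs);
}

my ($ts, $pairs) = parse_line(
    'Jul 15 15:12:10 a=foo time="2004-07-14 01:20:25 UTC" '
  . 'b=abc@foo.com msg="^one can say anything here except quotes$"'
);
print "Timestamp => $ts\n";
print "$_ => $pairs->{$_}\n" for sort keys %$pairs;
```

    Anchoring every match with \G and using /gc means the scanner never skips over text it does not understand, so a malformed line dies with an offset instead of silently dropping a pair.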

    -Mark

Re: a regex question
by FoxtrotUniform (Prior) on Jul 16, 2004 at 00:16 UTC

    When you're building a regex, it pays to be very specific about the data you're expecting. So if your example says:

    • The interesting data are of the form key=value; everything else can be ignored
    • key is a string of \w characters
    • value is either a string of \w characters or a quoted string that can contain any character but a double-quote
    then you can build a regex to match a key:val pair pretty easily:
    # NOTE: untested
    /(\w+)=          # key
     (?:             # match either value:
       "([^"]+)" |   # quoted string or...
       (\w+))        # non-quoted value
    /xg
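
    As a sketch of how a regex along these lines could be applied to the sample line, the loop below collects the pairs into a hash. It departs from the spec above in one respect, which is my assumption rather than the poster's: the bare-value branch uses [^\s"]+ instead of \w+, because \w+ would stop at the "@" in b=abc@foo.com and yield only "abc". It also uses "([^"]*)" so that an empty quoted value is accepted.

```perl
use strict;
use warnings;

my $line = 'Jul 15 15:12:10 a=foo time="2004-07-14 01:20:25 UTC" '
         . 'b=abc@foo.com msg="^one can say anything here except quotes$"';

my %pairs;
while ($line =~ /(\w+)=            # key
                 (?:
                   "([^"]*)"       # quoted value, captured without the quotes
                   |
                   ([^\s"]+)       # bare value (assumed: no spaces, no quotes)
                 )
                /xg) {
    # Exactly one of $2 and $3 is defined, depending on which branch matched.
    $pairs{$1} = defined $2 ? $2 : $3;
}

print "$_ => $pairs{$_}\n" for sort keys %pairs;
```

    The timestamp is simply ignored here, since nothing in it has the form key=value.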

    --
    F o x t r o t U n i f o r m
    Found a typo in this node? /msg me
    % man 3 strfry

Re: a regex question
by graff (Chancellor) on Jul 16, 2004 at 01:39 UTC
    I totally agree with kvale (++) -- highest preference to using Text::xSV on this problem. And just in case you'd like to look at an alternative approach (just as an educational exercise), consider this:
    $_ = 'Jul 15 15:12:10 a=foo time="2004-07-14 01:20:25 UTC" b=abc@foo.com msg="^one can say anything here except quotes$"';
    my ($timestamp, %pairs) = split(/\s+(\w+)=/);
    print "Timestamp => $timestamp\n";
    print "$_ => $pairs{$_}\n" for (sort keys %pairs);
    __OUTPUT__
    Timestamp => Jul 15 15:12:10
    a => foo
    b => abc@foo.com
    msg => "^one can say anything here except quotes$"
    time => "2004-07-14 01:20:25 UTC"
    The magic is in using split with a capturing regex, which will divide up the string into its component values, while keeping (and passing along) the attribute titles as well.
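
    Note that with this approach the quoted values keep their surrounding double quotes. If those are unwanted, one possible follow-up step (a sketch, not part of the original reply) is to strip a single enclosing pair of quotes from each value after the split. One caveat to keep in mind: because the split pattern knows nothing about quoting, a quoted value that itself contains something like " word=" would be split at that point.

```perl
use strict;
use warnings;

my $line = 'Jul 15 15:12:10 a=foo time="2004-07-14 01:20:25 UTC" '
         . 'b=abc@foo.com msg="^one can say anything here except quotes$"';

# The capturing group makes split return the keys along with the values.
my ($timestamp, %pairs) = split /\s+(\w+)=/, $line;

# values() yields aliases to the hash values, so s/// edits them in place.
# Only a value that both starts and ends with a double quote is changed.
s/\A"(.*)"\z/$1/s for values %pairs;

print "Timestamp => $timestamp\n";
print "$_ => $pairs{$_}\n" for sort keys %pairs;
```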