Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I have a line constructed like this:
pro * con * date * at * pri * msg *
Where '*' can be any number of words or symbols. Case does not matter. Furthermore, the data pairs can be in any order, making
con * DAte * aT * mSg * pri * pro *
just as legal as the former example. I'd like to get this data into a hash such that pro, con, date, at, pri and msg are the keys, with the *'s as the values. For example:

    pro my project con my customer date 2009-10-5 at 17:00 pri 2 msg Rack new server

    pro = my project
    con = my customer
    date = 2009-10-5
    at = 17:00
    pri = 2
    msg = Rack new server

I've tried several different regexes. The closest is
    if ( m/\s?pro\s(.*?)\s(con|due|at|pri|msg)\s/i ) {
        print "pro=".$1."\n";
    }
However, if 'pro *' is at the end of the line it does not match. When I try to account for the end of the line, I end up capturing either the entire remainder of the line or just a single whitespace character.

Any ideas?

Replies are listed 'Best First'.
Re: parsing data pairs from single line
by kennethk (Abbot) on Oct 14, 2009 at 17:37 UTC
    A couple of techniques would likely be helpful for you. If you change your trailing whitespace \s to a word boundary \b, the pattern will also match at the end of a line, because \b does not need a character to consume. Also, by using lookahead assertions, you can stop capturing when you next hit one of your delimiters.

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $data = 'pro my project con my customer date 2009-10-5 at 17:00 pri 2 msg Rack new server';

    #pro = my project
    #con = my customer
    #date = 2009-10-5
    #at = 17:00
    #pri = 2
    #msg = Rack new server

    my @keywords = qw(pro con due date at pri msg);

    # \b(pro|con|due|date|at|pri|msg)\s((?:(?!(pro|con|due|date|at|pri|msg)\b).)*)
    my $regex = '\b(' . join('|', @keywords) . ')\s((?:(?!(' . join('|', @keywords) . ')\b).)*)';

    while ($data =~ /$regex/ig) {
        print "$1 = $2\n";
    }

    See perlretut for more info.
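Since the original question asked for the pairs in a hash, the lookahead idea above can be sketched roughly as follows. This is only an illustration under a few assumptions: the helper name parse_pairs is made up, the keyword list is the one from the question, keys are lower-cased, and the lookahead here has a word boundary on both sides of the keyword and also accepts the end of the string, which differs slightly from the regex in the reply.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns a hash ref of
# lower-cased keyword => value pairs found in $line.
sub parse_pairs {
    my ($line, @keywords) = @_;
    my $alt = join '|', map { quotemeta($_) } @keywords;
    my %pairs;

    # Match a keyword, skip whitespace, then lazily capture everything up to
    # (but not including) the next whole-word keyword or the end of the string.
    while ($line =~ /\b($alt)\b\s+(.*?)\s*(?=\b(?:$alt)\b|\z)/gi) {
        $pairs{ lc $1 } = $2;
    }
    return \%pairs;
}

my $pairs = parse_pairs(
    'pro my project con my customer date 2009-10-5 at 17:00 pri 2 msg Rack new server',
    qw(pro con date at pri msg),
);
print "$_ = $pairs->{$_}\n" for sort keys %$pairs;
```

Because the lookahead is zero-width, the next keyword is left in place for the following iteration of the loop, and the \s* in front of it keeps trailing whitespace out of the captured value.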

      my $regex = '\b(' . join('|', @keywords) . ')\s((?:(?!(' . join('|', @keywords) . ')\b).)*)';

      Rather than all the concatenation and the joins, you could take advantage of the double-quote-like behaviour of regexen by localising the list separator in a do block and interpolating @keywords.

      my $regex = do {
          local $" = q{|};
          qr{(?x) \b (@keywords) \s ( (?: (?! (@keywords) \b ) . )* ) }
      };

      It looks a little clearer to my eye.

      Cheers,

      JohnGG
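To see in isolation what localising the list separator does, here is a small sketch. The variable names $default and $alternation are only illustrative; the keyword list is the one from the question.

```perl
use strict;
use warnings;

my @keywords = qw(pro con date at pri msg);

# By default, an array interpolated into a double-quoted string is joined
# with a single space (the default value of $").
my $default = "@keywords";

# Inside the do block, local changes $" to '|' only until the block exits,
# so the interpolated array becomes a regex alternation.
my $alternation = do { local $" = '|'; "@keywords" };

print "$default\n";
print "$alternation\n";
```

The first line printed is the space-separated list and the second is pro|con|date|at|pri|msg. After the do block, $" is back to its previous value, so other interpolations in the program are not affected.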

Re: parsing data pairs from single line
by gmargo (Hermit) on Oct 14, 2009 at 17:55 UTC

    I inserted an unlikely string next to the keywords and then used split.

    my $input = "pro my project cOn my customer dAte 2009-10-5 at 17:00 pri 2 msg Rack new server";
    print "input=$input\n";

    my @keywords = qw(pro con date at pri msg);
    my $splitmarker = "___YABBA_DABBA_DOO___";
    my %results;

    $input =~ s/\b($_)\b/$splitmarker$1/i foreach @keywords;
    my @parts = split /$splitmarker/,$input;
    foreach (@parts)
    {
        $results{lc($1)} = $2 if /(\w+)\s+(.+?)\s*$/;
    }
    foreach (@keywords)
    {
        print "$_ = \"$results{$_}\"\n" if exists $results{$_};
    }

Re: parsing data pairs from single line
by mickep76 (Beadle) on Oct 15, 2009 at 07:13 UTC

    You can assign it to a hash using split.

    my $text = "pro my project con my customer date 2009-10-5 at 17:00 pri 2 msg Rack new server";
    my @list = split /(pro|con|date|at|pri|msg)\s/, $text;
    shift @list;
    my %hash = @list;
    foreach(sort keys %hash) { printf "$_, %s\n", $hash{$_} }
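The split above is case-sensitive, keeps the space in front of the next keyword in each value, and would also split on a keyword that ends an ordinary word (for example the "at" in "format "). A possible variation that addresses those points might look like the sketch below. The helper name split_pairs is invented for illustration, and lower-casing the keys is an assumption based on the question's "case does not matter".

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): split-based parsing that
# returns a hash of lower-cased keyword => value pairs.
sub split_pairs {
    my ($text, @keywords) = @_;
    my $alt = join '|', map { quotemeta($_) } @keywords;

    $text =~ s/\s+\z//;    # trim trailing whitespace on our private copy

    # The capture group makes split return the keywords as well as the text
    # between them; \b keeps "pro" from matching inside "project".
    my @list = split /\s*\b($alt)\b(?:\s+|\z)/i, $text;
    shift @list;           # drop whatever preceded the first keyword

    my %hash;
    while (my ($key, $value) = splice @list, 0, 2) {
        # A keyword at the very end of the line has no value field.
        $hash{ lc $key } = defined $value ? $value : '';
    }
    return %hash;
}

my %hash = split_pairs(
    'pro my project con my customer date 2009-10-5 at 17:00 pri 2 msg Rack new server',
    qw(pro con date at pri msg),
);
printf "%s, %s\n", $_, $hash{$_} for sort keys %hash;
```

The values are taken in pairs with splice rather than assigning the list straight to the hash, so that the keys can be lower-cased and a trailing keyword without a value does not produce an odd-length list.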