parsing data pairs from single line

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I have a line constructed like this:
pro * con * date * at * pri * msg *
Where '*' can be any number of words or symbols. Case does not matter. Further, the data pairs can be in any order, making
con * DAte * aT * mSg * pri * pro *
just as legal as the former example. I'd like to get this data into a hash such that pro, con, date, at, pri and msg are the keys, with the *'s as the data. For example:

pro my project con my customer date 2009-10-5 at 17:00 pri 2 msg Rack new server
pro = my project
con = my customer
date = 2009-10-5
at = 17:00
pri = 2
msg = Rack new server

I've tried several different regexes. The closest is

if ( m/\s?pro\s(.*?)\s(con|due|at|pri|msg)\s/i ){
   print "pro=".$1."\n";
}

However, if 'pro *' is at the end of the line it does not match. Upon trying to account for being at the end of the line I end up with capturing either the entire remainder of the line or just a single white space.

Any ideas?


Replies are listed 'Best First'.
Re: parsing data pairs from single line by kennethk (Abbot) on Oct 14, 2009 at 17:37 UTC
A couple of techniques would likely be helpful for you. If you change your whitespace `\s` to a word boundary `\b`, it will match at the end of a line. As well, using look-ahead assertions, you can stop when you next hit one of your delimiters.

#!/usr/bin/perl

use strict;
use warnings;

my $data = 'pro my project con my customer date 2009-10-5 at 17:00 pri 2 msg Rack new server';

#pro = my project
#con = my customer
#date = 2009-10-5
#at = 17:00
#pri = 2
#msg = Rack new server

my @keywords = qw(pro con due date at pri msg);

# \b(pro|con|due|date|at|pri|msg)\s((?:(?!(pro|con|due|date|at|pri|msg)\b).)*)
my $regex = '\b(' . join('|', @keywords) . ')\s((?:(?!(' . join('|', @keywords) . ')\b).)*)';

while ($data =~ /$regex/ig) {
    print "$1 = $2\n";
}

See perlretut for more info.
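
A minimal sketch of the same look-ahead idea that fills a hash, as the question asks. The helper name `parse_pairs` is hypothetical and not from the thread. It assumes a keyword never appears as a whole word inside a value, puts `\b` on both sides of each keyword so that "at" is not recognised inside a word such as "that", and trims the whitespace left before the next keyword.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns a hash ref of keyword => value.
sub parse_pairs {
    my ($line) = @_;
    my @keywords = qw(pro con date at pri msg);
    my $alt = join '|', @keywords;

    my %pairs;
    # Keyword, whitespace, then every character that does not start another keyword.
    while ($line =~ /\b($alt)\b\s+((?:(?!\b(?:$alt)\b).)*)/gi) {
        my ($key, $value) = (lc $1, $2);
        $value =~ s/\s+\z//;    # drop the space left before the next keyword
        $pairs{$key} = $value;
    }
    return \%pairs;
}

my $pairs = parse_pairs('con my customer DAte 2009-10-5 aT 17:00 mSg Rack new server pri 2 pro my project');
print "$_ = $pairs->{$_}\n" for sort keys %$pairs;
```

Lower-casing the key means the hash can be looked up with the plain keyword however it was typed in the input.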
Re^2: parsing data pairs from single line by johngg (Canon) on Oct 14, 2009 at 20:58 UTC
my $regex = '\b(' . join('|', @keywords) . ')\s((?:(?!(' . join('|', @keywords) . ')\b).)*)';

Rather than all the concatenation and the joins, you could take advantage of the double-quote-like behaviour of regexen by localising the list separator in a do block and interpolating `@keywords`.

my $regex = do {
    local $" = q{|};
    qr{(?x) \b (@keywords) \s ( (?: (?! (@keywords) \b ) . )* ) }
};

It looks a little clearer to my eye.

Cheers,
JohnGG
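
A short, self-contained sketch of the list-separator technique, under a few assumptions that are not in the thread: the `x` and `i` flags are placed on the `qr` itself so the compiled pattern carries them, a `\b` is added after each keyword, and the sample line and the `@found` array are made up for illustration.

```perl
use strict;
use warnings;

my @keywords = qw(pro con date at pri msg);

# $" is the separator Perl puts between array elements when an array is
# interpolated, so (@keywords) becomes (pro|con|date|at|pri|msg).
my $regex = do {
    local $" = '|';
    qr{ \b (@keywords) \b \s+ ( (?: (?! \b (?:@keywords) \b ) . )* ) }xi;
};

my $line = 'pro my project CON my customer';
my @found;
while ($line =~ /$regex/g) {
    (my $value = $2) =~ s/\s+\z//;    # trim the space before the next keyword
    push @found, lc($1) . ' = ' . $value;
}
print "$_\n" for @found;
```

Because `$"` is localised inside the `do` block, the default separator (a single space) is restored as soon as the pattern has been built.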
Re: parsing data pairs from single line by gmargo (Hermit) on Oct 14, 2009 at 17:55 UTC
I inserted an unlikely string next to the keywords and then used split.

my $input = "pro my project cOn my customer dAte 2009-10-5 at 17:00 pri 2 msg Rack new server";
print "input=$input\n";

my @keywords = qw(pro con date at pri msg);
my $splitmarker = "___YABBA_DABBA_DOO___";
my %results;

$input =~ s/\b($_)\b/$splitmarker$1/i foreach @keywords;
my @parts = split /$splitmarker/,$input;
foreach (@parts)
{
    $results{lc($1)} = $2 if /(\w+)\s+(.+?)\s*$/;
}
foreach (@keywords)
{
    print "$_ = \"$results{$_}\"\n" if exists $results{$_};
}
Re: parsing data pairs from single line by mickep76 (Beadle) on Oct 15, 2009 at 07:13 UTC
You can assign it to a hash using split.

my $text = "pro my project con my customer date 2009-10-5 at 17:00 pri 2 msg Rack new server";
my @list = split /(pro|con|date|at|pri|msg)\s/, $text;
shift @list;
my %hash = @list;

foreach(sort keys %hash) {
    printf "$_, %s\n", $hash{$_}
}
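
A hedged variation on the split approach that also copes with mixed-case keywords in any order, as the question requires. The `\b` anchors, the `i` flag, the trimming and the sample line are additions for illustration, not part of the reply above.

```perl
use strict;
use warnings;

my $text = 'con my customer DAte 2009-10-5 aT 17:00 pro my project';

# A capture group in the split pattern makes split return the separators too,
# so the list alternates keyword, value, keyword, value, ...
my @list = split /\b(pro|con|date|at|pri|msg)\b\s*/i, $text;
shift @list;    # whatever precedes the first keyword (empty here)

my %hash;
while (my ($key, $value) = splice @list, 0, 2) {
    $value = '' unless defined $value;    # keyword at end of line with no data
    $value =~ s/\s+\z//;                  # trim the space before the next keyword
    $hash{ lc $key } = $value;
}
print "$_ = $hash{$_}\n" for sort keys %hash;
```

Walking the list two items at a time, instead of assigning it straight to the hash, leaves room to normalise the key and tidy the value on the way in.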