Here's one possible regex approach using named captures (see perlre) and %+. It's getting a little complex, though, so the line-based state machine approach from Corion's post is starting to look better in this case (I gave some more examples of the state machine approach in this thread).
use warnings;
use strict;
use Data::Dump;

local $/ = "\n\n\n";
while (<DATA>) {
    # clobber the header
    s/ ^ \s* Dumpdata\s+example \s* \n \s* -+ \s* \n //msx or next;
    my %row;
    $row{$+{key}} = $+{val} while m{
        (?:
            # the very first key doesn't need colon
            \A \s* (?<key> \w+ )
            # but the other keys need colons
            | ^ \s* (?<key> \w+ : )
        )
        \s+
        # values should end at the next key
        (?<val> (?: (?!^\s*\w+:) . )+ )
    }xmsg;
    s/\s+/ /g for values %row;
    dd \%row;
}
__DATA__
Dumpdata example
-----------------
Warning bad news here
Detail: Some really nice infos these are
Info: This is a problem but there is a solution


Dumpdata example
-----------------
Warning test
Detail: foo bar
Info: quz baz
Spec: blah
Output:
{
  "Detail:" => "Some really nice infos these are ",
  "Info:"   => "This is a problem but there is a solution ",
  "Warning" => "bad news here ",
}
{
  "Detail:" => "foo bar ",
  "Info:"   => "quz baz ",
  "Spec:"   => "blah ",
  "Warning" => "test ",
}
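For comparison, here is a rough sketch of what the line-based state machine approach might look like for the same data. This is not Corion's code; the sub name parse_records and the sample input (including where the continuation lines break) are my own assumptions for illustration. The "state" is just the current record and the current key: a header line starts a new record, a key line starts a new value, and anything else gets appended to the current value.

```perl
use warnings;
use strict;

# Hypothetical helper (not from the thread): walk the input line by line,
# remembering the current record ($row) and the current key ($key).
sub parse_records {
    my @lines = @_;
    my ( @records, $row, $key );
    for my $line (@lines) {
        if ( $line =~ /^\s*Dumpdata\s+example\s*$/ ) {
            # header: start a new record and forget the previous key
            $row = {};
            push @records, $row;
            undef $key;
            next;
        }
        next unless $row;                 # ignore anything before the first header
        next unless $line =~ /\S/;        # skip blank lines
        # the dashed underline directly after the header
        next if !defined $key && $line =~ /^\s*-+\s*$/;
        if ( $line =~ /^\s*(\w+:)\s+(.*?)\s*$/ ) {
            # "Key:" at the start of a line begins a new value
            $key = $1;
            $row->{$key} = $2;
        }
        elsif ( !defined $key && $line =~ /^\s*(\w+)\s+(.*?)\s*$/ ) {
            # the very first key of a record doesn't need a colon
            $key = $1;
            $row->{$key} = $2;
        }
        elsif ( defined $key && $line =~ /^\s*(.*?)\s*$/ ) {
            # anything else continues the current value
            $row->{$key} .= " $1";
        }
    }
    return @records;
}

# Assumed sample input; the line breaks inside the values are made up.
my @input = split /\n/, <<'END';
Dumpdata example
-----------------
Warning bad news here
Detail: Some really nice infos
   these are
Info: This is a problem
   but there is a solution
END

for my $rec ( parse_records(@input) ) {
    print "$_ => '$rec->{$_}'\n" for sort keys %$rec;
}
```

It's longer than the regex, but each rule sits on its own branch, so adding a special case later (say, a second kind of header) should be a local change rather than another alternation inside one big pattern. Note that this version also trims the trailing whitespace that the regex version leaves on the values.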
Update: Using the "branch reset" pattern (?|...) allows for a little bit of simplification:
my %row = m{
    (?|
        \A \s* ( \w+ )
        | ^ \s* ( \w+ : )
    )
    \s+
    ( (?: (?!^\s*\w+:) . )+ )
}xmsg;
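In case it isn't obvious why that assignment works: in list context with /g, the match returns the captures of every match as one flat list, and because the branch reset makes both alternatives capture into $1, that list comes out as key, value, key, value, which is exactly what a hash assignment wants. Here's a small self-contained sketch; the $record string is an assumed sample with the header already removed.

```perl
use warnings;
use strict;

# Assumed sample record, header already stripped
my $record = "Warning test\nDetail: foo\n  bar\nInfo: quz baz\n";

# List-context //g returns ($1, $2) for each match, i.e. key/value pairs
my %row = $record =~ m{
    (?|
        \A \s* ( \w+ )
        | ^ \s* ( \w+ : )
    )
    \s+
    ( (?: (?!^\s*\w+:) . )+ )
}xmsg;

# collapse runs of whitespace (including newlines) in each value
s/\s+/ /g for values %row;

print "$_ => '$row{$_}'\n" for sort keys %row;
```

One caveat with this shortcut: if the same key shows up twice in a record, the later value silently overwrites the earlier one, same as in the while-loop version.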
In reply to Re^5: Joining multiple lines together while parsing
by haukex
in thread Joining multiple lines together while parsing
by Arengin