It's much easier to parse a structure if you mark the end types as well as the start. You could do something like the following:
use strict;
use warnings;
use Data::Dumper;
my $structure = main(join '',<DATA>);
print Dumper($structure);
sub main {
my ($s, $c, %hash) = $_[0];
while ($s =~ /START PAGE(?: (\w+))?\s+(.*?)\s+END PAGE/gs) {
$hash{$1 ? $1 : ++$c} = page($2);
}
return \%hash;
}
sub page {
my ($s, $c, %hash) = $_[0];
while ($s =~ /START QUESTION(?: (\w+))?\s+(.*?)\s+END QUESTION/gs)
+ {
$hash{$1 ? $1 : ++$c} = question($2);
}
return \%hash;
}
sub question {
my ($s, %hash) = $_[0];
($hash{'label'}) = $s =~ /LABEL (.*)/;
$s =~ /START CHOICES\s+(.*?)\s+END CHOICES/s;
for (split / *\n */, $1) {
push @{$hash{'choices'}}, [split / /, $_, 2];
}
return \%hash;
}
__DATA__
START PAGE p1
START QUESTION 4B
LABEL Do you like your pie with ice cream?
START CHOICES
1 Yes
2 No
END CHOICES
END QUESTION
START QUESTION 4C
LABEL Do you like your pie with whipped cream?
START CHOICES
1 Yes
2 No
END CHOICES
END QUESTION
END PAGE
I've made the choices into arrays instead of hashes, to preserve order.
-
Are you posting in the right place? Check out Where do I post X? to know for sure.
-
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big>
<blockquote> <br /> <dd>
<dl> <dt> <em> <font>
<h1> <h2> <h3> <h4>
<h5> <h6> <hr /> <i>
<li> <nbsp> <ol> <p>
<small> <strike> <strong>
<sub> <sup> <table>
<td> <th> <tr> <tt>
<u> <ul>
-
Snippets of code should be wrapped in
<code> tags not
<pre> tags. In fact, <pre>
tags should generally be avoided. If they must
be used, extreme care should be
taken to ensure that their contents do not
have long lines (<70 chars), in order to prevent
horizontal scrolling (and possible janitor
intervention).
-
Want more info? How to link
or How to display code and escape characters
are good places to start.
|