Your basic problem is that you need to recognize "object" and then extract the nested braces that follow it. Unfortunately, you cannot handle arbitrary nesting with an RE. However, the following code works for me with the example data you gave:
    use strict;
    use Carp;

    my @objects = get_objects("data");
    print join "\n\n", @objects;

    # Takes a reference to an array of lines, removes one balanced
    # {...} block from the front of it and returns the block as a string.
    sub get_block {
        my $a_ref = shift;
        my $open = 0;
        my @read;
        while (defined(my $line = shift @$a_ref)) {
            push @read, $line;
            # "() =" forces list context, so we get the number of matches.
            $open += () = ($line =~ /\{/g);
            $open -= () = ($line =~ /\}/g);
            if (0 == $open) {
                return join '', @read;
            }
            elsif (0 > $open) {
                my $not_block = join '', @read;
                warn("Too many closing braces in:\n$not_block\n\n");
                # Put back everything except the first line, so that the
                # caller does not find the same "object {" line forever.
                shift @read;
                unshift @$a_ref, @read;
                return;
            }
        }
        confess "Unclosed brace at end of file\n";
    }

    # Takes a filename, returns the contents.
    sub get_file {
        my $file = shift;
        open(my $in, '<', $file) or confess "Cannot read '$file': $!";
        if (wantarray) {
            return <$in>;
        }
        else {
            return join '', <$in>;
        }
    }

    # Takes the name of a pil-file and returns an array of object blocks.
    # Not particularly robust.
    sub get_objects {
        my @objs;
        my @lines = get_file(shift);
        while (defined(my $line = shift @lines)) {
            if ($line =~ /object\s*\{/) {
                # Found a block?
                unshift @lines, $line;
                push @objs, get_block(\@lines);
            }
        }
        return @objs;
    }

Note that this is rather fragile though. For instance, I assume that braces don't appear in quoted fields, that blocks start and end on lines without other things in them, and that nothing elsewhere in the file looks like the start of a block. Also note that once you have your array of objects, you will want to match specific patterns against each one to extract the data you want (the id, origin, etc.).
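As a rough illustration of that last step, here is a minimal sketch of pulling single fields out of one block. The layout of the sample block (one "name value" pair per line, with optional double quotes) is purely an assumption on my part, and get_field is a hypothetical helper, so adjust the pattern to whatever your real data looks like.

```perl
use strict;
use warnings;

# Hypothetical block text; the real pil layout may well differ.
my $block = <<'END';
object {
    id 42
    origin "scanner"
    box { 0 0 10 10 }
}
END

# Pull one "name value" field out of a block.
# Returns undef if the field is missing or is not a simple value.
sub get_field {
    my ($text, $name) = @_;
    # The value is either a double-quoted string or a single bare word.
    if ($text =~ /^\s*\Q$name\E\s+(?:"([^"]*)"|(\S+))\s*$/m) {
        return defined $1 ? $1 : $2;
    }
    return undef;
}

my %data = map { $_ => get_field($block, $_) } qw(id origin);
print "$_ = $data{$_}\n" for sort keys %data;
```

Nested fields such as the "box" line are deliberately not matched here; they would need another pass with something like get_block.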
If you have the energy to learn Parse::RecDescent and work out a real grammar, the result will be much, much better than a hack like the above.
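To give a feel for what a grammar buys you, here is a small hand-written recursive-descent sketch. It is not Parse::RecDescent and does not use its API. The grammar in the comments, the sample input and the names tokenize and parse_block are all my own assumptions, not anything taken from your data. Because it tokenizes first, a brace inside a quoted string is no longer mistaken for the end of a block.

```perl
use strict;
use warnings;

# Assumed grammar (hypothetical, for illustration only):
#   block := NAME '{' item* '}'
#   item  := block | STRING | WORD

# Split text into tokens: quoted strings (quotes kept), braces, bare words.
sub tokenize {
    my ($text) = @_;
    my @tokens;
    while ($text =~ /\G\s*(?:("(?:[^"\\]|\\.)*")|([{}])|([^\s{}"]+))/gc) {
        push @tokens, defined $1 ? $1 : defined $2 ? $2 : $3;
    }
    # Anything left over other than whitespace is an error,
    # for example an unterminated quoted string.
    $text =~ /\G\s*\z/gc or die "Unexpected text in input\n";
    return @tokens;
}

# Parse one block from the front of the token list (an array reference).
# Returns a hash reference with the block name and its items.
sub parse_block {
    my ($tokens) = @_;
    my $name = shift @$tokens;
    die "Expected block name\n"
        unless defined $name && $name =~ /^\w+$/;
    my $brace = shift @$tokens;
    die "Expected '{' after '$name'\n"
        unless defined $brace && $brace eq '{';
    my @items;
    while (@$tokens && $tokens->[0] ne '}') {
        if (@$tokens > 1 && $tokens->[1] eq '{') {
            push @items, parse_block($tokens);    # nested block, recurse
        }
        else {
            push @items, shift @$tokens;          # word or quoted string
        }
    }
    die "Unclosed block '$name'\n" unless @$tokens;
    shift @$tokens;                               # consume the '}'
    return { name => $name, items => \@items };
}

my @tokens = tokenize('object { id 42 label "a } in quotes" box { 0 0 10 10 } }');
my $tree   = parse_block(\@tokens);
print "$tree->{name} has ", scalar @{ $tree->{items} }, " items\n";
```

A real Parse::RecDescent grammar would replace both subroutines with a declarative rule set, which is easier to extend once the format grows.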
In reply to Re (tilly) 1: Parsing a multiline data structure by tilly, in thread Parsing a multiline data structure by HamNRye