
I will have to second Fastolfe's recommendation. That really is the best long-term solution. But short-term you can roll something quick and dirty.

Now your basic problem is that you need to recognize "object" and then extract everything up to the matching close brace, nested braces included. Unfortunately for you, a single classic RE cannot handle arbitrary nesting. However, the following code, which counts braces line by line, works for me with the example data you gave:

use strict;
use Carp;

my @objects = get_objects("data");
print join "\n\n", @objects;


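# Takes a reference to an array of lines whose first line opens a block.
# Removes and returns that block as one string, or returns nothing if the
# braces close more often than they open.
sub get_block {
  my $a_ref = shift;
  my $open = 0;
  my @read;
  while (defined(my $line = shift @$a_ref)) {
    push @read, $line;
    # Count the braces on this line ("= () =" forces list context).
    $open += () = ($line =~ /\{/g);
    $open -= () = ($line =~ /\}/g);
    if (0 == $open) {
      return join '', @read;
    }
    elsif (0 > $open) {
      my $not_block = join '', @read;
      warn("Too many closing braces in:\n$not_block\n\n");
      # Put back everything except the first line, so that the caller
      # does not find the same bad block start again and loop forever.
      shift @read;
      unshift @$a_ref, @read;
      return;
    }
  }
  confess "Unclosed brace at end of file\n";
}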

# Takes a filename, returns the contents.
sub get_file {
  my $file = shift;
  # Three-argument open, so odd characters in the filename are not
  # taken as part of the open mode.
  open(my $in, '<', $file) or confess "Cannot read '$file': $!";
  if (wantarray) {
    return <$in>;
  }
  else {
    return join '', <$in>;
  }
}

# Takes the name of a pil-file and returns an array of object blocks.
# Not particularly robust.
sub get_objects {
  my @objs;

  my @lines = get_file(shift);
  while (defined(my $line = shift @lines)) {
    if ($line =~ /object\s*\{/) {
      # Found a block?
      unshift @lines, $line;
      push @objs, get_block(\@lines);
    }
  }

  return @objs;
}

Note that this is rather fragile, though. (For instance, I assume that braces don't appear in quoted fields, that blocks start and end on lines with nothing else on them, and that nothing elsewhere in the file looks like the start of a block...) And note that once you have your array of objects, you will still want to match specific patterns against each one to extract the data you want. (Data like the id, origin, etc.)
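
The first of those assumptions can be relaxed a little. Here is a minimal sketch of a brace counter that skips quoted text. The name brace_delta is made up for illustration, and it assumes that quoted fields use double quotes with backslash escapes and never span lines, which I have not checked against the real data:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the code above): net brace count for
# one line, skipping anything inside double-quoted strings.
# Assumes strings use "..." with backslash escapes and never span lines.
sub brace_delta {
  my $line = shift;
  # Work on a copy with the quoted strings removed.
  (my $bare = $line) =~ s/"(?:[^"\\]|\\.)*"//g;
  my $opened = () = $bare =~ /\{/g;
  my $closed = () = $bare =~ /\}/g;
  return $opened - $closed;
}

print brace_delta('object {'), "\n";      # prints 1
print brace_delta('name "a } b"'), "\n";  # prints 0
```

If that assumption holds, the two counting lines in get_block could become a single `$open += brace_delta($line);` and the rest of the loop stays as it is.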

If you have the energy to learn Parse::RecDescent and figure out a real grammar, it will be much much better than a hack like the above...
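
There is also a middle ground if you are on Perl 5.10 or later, where a pattern can recurse into one of its own groups. This is only a sketch of that swapped-in technique, not a replacement for a real grammar. The name get_objects_re is made up, and it still assumes that braces never appear inside quoted fields:

```perl
use strict;
use warnings;

# Hypothetical alternative to get_objects: takes the whole file as one
# string and returns every "object { ... }" block, however deeply nested.
# Needs Perl 5.10+; still assumes no braces inside quoted fields.
sub get_objects_re {
  my $text = shift;
  my @objs;
  # (?&braces) recurses into the named group, so inner {...} pairs are
  # matched as complete units before the outer closing brace.
  while ($text =~ /(object\s*(?<braces>\{(?:[^{}]++|(?&braces))*\}))/g) {
    push @objs, $1;
  }
  return @objs;
}
```

Calling get_file in scalar context gives the string it wants, so something like `my @objects = get_objects_re(scalar get_file("data"));` should do. Unlike the line-based version, this does not care whether a block starts or ends on a line of its own, but a block with unbalanced braces is silently skipped instead of being reported.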

In reply to Re (tilly) 1: Parsing a multiline data structure by tilly
in thread Parsing a multiline data structure by HamNRye
