pwhysall has asked for the wisdom of the Perl Monks concerning the following question:

I've been handed a file that is laid out thusly:

    begin_section
    name=section_name
    type=section_type
    begin_item
    name=item_name
    desc=item_description
    link=item_link
    end_item
    begin_item
    .
    .
    .
    end_item
    end_section
    begin_section
    .
    .
    .
    end_section

and so on and so on. What would be a good way of reading this file into a data structure that I can use nicely? I'd thought of setting up a couple of objects; a Section object, which would then contain a number of Item objects. And maybe an overall object to hold all the Section objects. But I'm stalling at the file parsing stage. I don't seem to be able to do this kind of thing:

    if (/begin_item/) {
        while (!/end_item/) {
            ...deal with data members here...
        }
    }

Am I approaching this problem the wrong way? Many thanks in advance for any advice you may have.

Replies are listed 'Best First'.
Re: Turning a datafile into a data structure
by davorg (Chancellor) on Jun 30, 2000 at 12:50 UTC
Re: Turning a datafile into a data structure
by ase (Monk) on Jun 30, 2000 at 15:13 UTC
    I agree with Dave that Parse::RecDescent could be a good way to do it. If you want to "roll your own" though, you could do something like:
    #!/usr/bin/perl -w
    use strict;

    package Item;

    sub new {
        my $proto = shift;
        my $class = ref($proto) || $proto;
        my $self  = {};
        bless ($self, $class);
        return $self;
    }

    sub add_attribute {
        my $self = shift;
        my ($key,$value) = @_;
        $self->{$key} = $value;
    }

    1; #included in case Item becomes a real Module someday.

    package Section;

    sub new {
        my $proto = shift;
        my $class = ref($proto) || $proto;
        my $self  = {};
        $self->{Items} = [];
        bless ($self, $class);
        return $self;
    }

    sub add_attribute {
        my $self = shift;
        my ($key,$value) = @_;
        $self->{$key} = $value;
    }

    sub add_item {
        my $self = shift;
        my $item = Item->new();
        push @{$self->{Items}},$item;
        return $item;
    }

    1; #Ditto

    package UberObject;

    sub new {
        my $proto = shift;
        my $class = ref($proto) || $proto;
        my $self  = [];
        bless ($self, $class);
        return $self;
    }

    sub add_section {
        my $self = shift;
        my $section = Section->new();
        push @$self,$section;
        return $section;
    }

    1; #again

    package main;

    use Data::Dumper;

    my $uo = UberObject->new();
    my ($section,$item) = (undef,undef);

    while (<>) { #dammit Jim, parse the data already!
        chomp; # drop the newline so it does not end up in the value
        if (/begin_section/) {
            $section = $uo->add_section();
        } elsif (/end_section/) {
            undef $section;
        } elsif (/begin_item/) {
            $item = $section->add_item();
        } elsif (/end_item/) {
            undef $item;
        } elsif (/=/) {
            # split on the first "=" only, so values may contain "="
            if (defined($item)) {
                $item->add_attribute(split(/=/, $_, 2));
            } elsif (defined($section)) {
                $section->add_attribute(split(/=/, $_, 2));
            } else {
                die "Bad attribute at line $.";
            }
        }
    }

    #probably should do something interesting .....
    # .... nah Just Dump it all!
    my $d = Data::Dumper->new([$uo],['UberObject']);
    $d->Indent(1);
    print $d->Dump();

    When tested with a file that contains:
    begin_section
    name=section1
    type=section_type1
    begin_item
    name=item1
    desc=item_description1
    link=item_link1
    end_item
    begin_item
    name=item2
    desc=item_description2
    link=item_link2
    end_item
    end_section
    begin_section
    name=section2
    type=section_type2
    begin_item
    name=item21
    desc=item_description21
    link=item_link21
    end_item
    begin_item
    name=item22
    desc=item_description22
    link=item_link22
    end_item
    end_section

    yields:
    "The correct thing" (Exercise left to reader)
    You may also consider massaging the data (or writing a converter) to XML and using XML::Parser, which can also be used in an OOish way.
    Note: As the last poster mentioned while I was busy typing, Error checking = Good, and I don't do enough of it in my example... another exercise left to reader.
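
To give an idea of what the missing error checking might look like, here is a minimal sketch that uses plain hashes instead of objects. The helper name `parse_sections`, the `items` key, and the error messages are all made up for illustration. It assumes one token per line, as in the test file above, and that no section attribute is itself called `items`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read the begin_/end_ format from a filehandle into
# an array of section hashes, dying on any structural error.
sub parse_sections {
    my ($fh) = @_;
    my (@sections, $section, $item);

    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/^\s+|\s+$//g;    # tolerate stray whitespace
        next if $line eq '';

        if ($line eq 'begin_section') {
            die "Nested begin_section at line $.\n" if $section;
            $section = { items => [] };
            push @sections, $section;
        }
        elsif ($line eq 'end_section') {
            die "end_section without begin_section at line $.\n" unless $section;
            die "end_section inside an item at line $.\n" if $item;
            undef $section;
        }
        elsif ($line eq 'begin_item') {
            die "begin_item outside a section at line $.\n" unless $section;
            die "Nested begin_item at line $.\n" if $item;
            $item = {};
            push @{ $section->{items} }, $item;
        }
        elsif ($line eq 'end_item') {
            die "end_item without begin_item at line $.\n" unless $item;
            undef $item;
        }
        elsif ($line =~ /^(\w+)=(.*)$/) {
            # Attributes belong to the open item if there is one,
            # otherwise to the open section.
            my $target = $item || $section
                or die "Attribute outside any section at line $.\n";
            $target->{$1} = $2;
        }
        else {
            die "Unrecognised line $.: $line\n";
        }
    }
    die "Unterminated section at end of input\n" if $section;
    return \@sections;
}

# Usage: parse a small in-memory sample and walk the result.
my $data = <<'END';
begin_section
name=section1
type=section_type1
begin_item
name=item1
desc=item_description1
link=item_link1
end_item
end_section
END

open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $sections = parse_sections($fh);
close $fh;

for my $s (@$sections) {
    print "$s->{name} ($s->{type})\n";
    print "  $_->{name}: $_->{link}\n" for @{ $s->{items} };
}
```

Comparing whole lines with `eq` rather than an unanchored regex means a value such as `desc=see end_item` is not mistaken for a marker, at the cost of being stricter about the input.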
    Hope that helps,
    -ase
RE: Turning a datafile into a data structure
by JanneVee (Friar) on Jun 30, 2000 at 15:11 UTC
    Basically the advice about RecDescent module is the best. You aren't exactly approaching it in the wrong way.

    Important things to remember are things like error handling, e.g. in your example:

    if (!/end_item/) {
        if (/begin_item/) {
            while (!/end_item/) {
                ...deal with data members here...
            }
        }
    } else {
        ... print or die an error message here about an end_item without begin ...
    }

    Or, if you have items within items, then you must keep track of which end belongs to which begin!
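
One common way to match each end to its begin is to keep a stack of open blocks. The following is only a sketch under assumptions: the helper name `parse_nested` and the `kind`/`attr`/`children` keys are invented here, and it assumes every marker has the form `begin_X` or `end_X` on a line of its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build a tree from begin_X / end_X lines. The stack
# holds the currently open blocks, innermost last, so each end_X can be
# checked against the most recent begin_X.
sub parse_nested {
    my ($fh) = @_;
    my $root  = { kind => 'root', attr => {}, children => [] };
    my @stack = ($root);

    while (my $line = <$fh>) {
        chomp $line;
        next unless $line =~ /\S/;    # skip blank lines

        if ($line =~ /^begin_(\w+)$/) {
            my $node = { kind => $1, attr => {}, children => [] };
            push @{ $stack[-1]{children} }, $node;
            push @stack, $node;
        }
        elsif ($line =~ /^end_(\w+)$/) {
            my $kind = $1;
            die "end_$kind at line $. has no matching begin_$kind\n"
                if @stack == 1;
            my $open = $stack[-1]{kind};
            die "end_$kind at line $. does not match open begin_$open\n"
                unless $open eq $kind;
            pop @stack;
        }
        elsif ($line =~ /^(\w+)=(.*)$/) {
            die "Attribute outside any block at line $.\n" if @stack == 1;
            $stack[-1]{attr}{$1} = $2;
        }
        else {
            die "Unrecognised line $.: $line\n";
        }
    }
    die "Missing end_$stack[-1]{kind} at end of input\n" if @stack > 1;
    return $root;
}

# Usage: an item nested inside another item.
my $data = <<'END';
begin_item
name=outer
begin_item
name=inner
end_item
end_item
END

open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $tree = parse_nested($fh);
close $fh;

my $outer = $tree->{children}[0];
print "$outer->{attr}{name} contains $outer->{children}[0]{attr}{name}\n";
```

Because the stack does not care what `X` is, the same loop handles sections, items, and items within items without special cases.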

    Some of these things that one must think of you get for "free" using the module that davorg suggested...

Re: Turning a datafile into a data structure
by takshaka (Friar) on Jun 30, 2000 at 21:59 UTC
    if (/begin_item/) {
        while (!/end_item/) {
            ...deal with data members here...
        }
    }

    If you do decide to roll your own rather than use Parse::RecDescent, this is the sort of thing for which the range operator is useful.

    while (<>) {
        if (my $range = /begin_item/i ... /end_item/i) {
            # skip the first and last lines if you want...
            next unless $range > 1 && substr($range, -2) ne 'E0';
            # deal with data member
        }
    }
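
In scalar context the range operator returns a sequence number that starts at 1 on the line where the range opens, and has the string "E0" appended on the line where it closes, which is what the `substr` test above relies on. Here is a small self-contained sketch of collecting items that way. The sample lines and the `@items` structure are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample input, one token per element, standing in for lines of the file.
my @lines = qw(
    begin_item name=item1 link=item_link1 end_item
    begin_item name=item2 link=item_link2 end_item
);

my (@items, $current);
for (@lines) {
    # Empty string outside a range; 1, 2, 3, ... inside it; the last
    # value of each range carries an "E0" suffix.
    my $range = /^begin_item$/ ... /^end_item$/;
    next unless $range;

    if ($range == 1) {               # the begin_item line
        $current = {};
    }
    elsif ($range =~ /E0$/) {        # the end_item line
        push @items, $current;
    }
    else {                           # an attribute line in between
        my ($key, $value) = split /=/, $_, 2;
        $current->{$key} = $value;
    }
}

print "$_->{name} -> $_->{link}\n" for @items;
```

The three-dot form is used so that the closing pattern is not tested on the same line that opened the range.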
Re: Turning a datafile into a data structure
by btrott (Parson) on Jun 30, 2000 at 20:29 UTC
    If you happen to change your data format into XML of some sort, you could use XML::Simple to load the data into a hash reference. It's very nice and easy and does a good job.

    Another option would be to get an in-memory representation of the object (which I realize is part of the problem :), then use Data::Dumper to serialize it to disk.

    I realize that these don't help you with the data file format that you have, but in case you're investigating other options, these are two good ones.
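
As a rough sketch of the Data::Dumper option mentioned above: once the data is in memory, it can be written out as Perl source and read back with `do`. The `$sections` structure and the use of a temporary file are assumptions for the sake of a runnable example. Note that `do` executes the file as Perl code, so this only suits files you trust.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use File::Temp qw(tempfile);

# Hypothetical in-memory representation of one section with one item.
my $sections = [
    {
        name  => 'section1',
        type  => 'section_type1',
        items => [
            { name => 'item1', desc => 'item_description1', link => 'item_link1' },
        ],
    },
];

# Terse(1) makes the dump a bare expression rather than "$VAR1 = ...;",
# so the file evaluates to the structure itself.
my $dumper = Data::Dumper->new([$sections]);
$dumper->Terse(1)->Indent(1);

my ($fh, $filename) = tempfile();
print {$fh} $dumper->Dump();
close $fh or die "Cannot close $filename: $!";

# Read the structure back by evaluating the file.
my $loaded = do $filename;
die "Could not reload $filename: ", ($@ || $!), "\n" unless defined $loaded;
unlink $filename;

print "Reloaded item: $loaded->[0]{items}[0]{name}\n";
```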

Many, many thanks
by pwhysall (Acolyte) on Jun 30, 2000 at 18:21 UTC
    Thank you all for your time and brainpower.