I'd combine some light weight OO, recursive descent parsing and method dispatch. Consider (most of OP's data represented by ellipsis):

use strict; use warnings; my $data = <<DATA; BEGIN DSRECORD Identifier "ROOT" ... CenturyBreakYear "30" END DSRECORD DATA open my $fh, '<', \$data or die "Unable to open data file: $!"; my $obj = main->new (undef, fh => $fh); 1 while defined $obj->nextLine (); close $fh; sub new { my ($class, $context, %params) = @_; $class = ref $class || $class; my $self = bless {%params}, $class; if (defined $context && ref $context) { $self->{$_} = $context->{$_} for keys %$context; } $self->{context} = $context; return $self; } sub nextLine { my $self = shift; my $fh = $self->{fh}; while (defined (my $line = <$fh>)) { chomp $line; return $line unless $line =~ /^\s*begin\s+(\w+)/i; if ($self->{skipping}) { $self->skip (); next; } my $recType = lc $1; my $handler = $self->can ("rec_$recType"); my $nested = $self->new ($self); $handler ? $nested->$handler () : $self->skip ($recType); next; } return undef; } sub skip { my ($self, $recType) = @_; warn ">>> Can't handle $recType records (line $.)\n" if defined $recType; ++$self->{skipping}; 1 while defined $self->nextLine (); --$self->{skipping}; } sub rec_dsrecord { my ($self) = @_; my $fh = $self->{fh}; my @wantedFields = qw/identifier name description/; my $matchStr = join '|', @wantedFields; my $fieldsMatch = qr/$matchStr/i; while (defined (my $line = $self->nextLine ())) { next unless $line =~ /($fieldsMatch)\s+(.*)/; $self->{lc $1} = $2; } my @missingFields = grep {! exists $self->{$_}} @wantedFields; my @gotFields = grep {exists $self->{$_}} @wantedFields; warn "Missing @missingFields fields in DSRECORD ending line $." if @missingFields; print join (', ', map {"$_: $self->{$_}"} @gotFields), "\n"; }

Prints:

>>> Can't handle dssubrecord records (line 15) identifier: "ROOT", name: "AP_CDBS_Vendor_Summary", description: "The +first part of the routine gathers data from the ABAP which extracts t +he necessary data from the SAP tables KNA1 and KNB1 (NB the keys of t +he link between KNA1 and KNB1 will form the basis of all the ABAP que +ries)."

Handlers for new record types are easily added as a sub rec_newrectype sub.

A context dumper can be easily implemented using the "stack" formed by the context links in the objects (see sub new implementation).

The code can be refactored from a light weight main based object into a proper base class with an object factory to create derived classes for each interesting record type.

Common output code could be added to the base class.


Perl is environmentally friendly - it saves trees

In reply to Re: Design hints for a file processor by GrandFather
in thread Design hints for a file processor by PhilHibbs

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.