in reply to Design hints for a file processor

I'd combine some lightweight OO, recursive descent parsing and method dispatch. Consider the following (most of the OP's data is represented by an ellipsis):

use strict;
use warnings;

my $data = <<DATA;
BEGIN DSRECORD
Identifier "ROOT"
...
CenturyBreakYear "30"
END DSRECORD
DATA

open my $fh, '<', \$data or die "Unable to open data file: $!";
my $obj = main->new (undef, fh => $fh);

1 while defined $obj->nextLine ();
close $fh;

sub new {
    my ($class, $context, %params) = @_;

    $class = ref $class || $class;
    my $self = bless {%params}, $class;

    if (defined $context && ref $context) {
        $self->{$_} = $context->{$_} for keys %$context;
    }

    $self->{context} = $context;
    return $self;
}

sub nextLine {
    my $self = shift;
    my $fh = $self->{fh};

    while (defined (my $line = <$fh>)) {
        chomp $line;
        return $line unless $line =~ /^\s*begin\s+(\w+)/i;

        if ($self->{skipping}) {
            $self->skip ();
            next;
        }

        my $recType = lc $1;
        my $handler = $self->can ("rec_$recType");
        my $nested = $self->new ($self);

        $handler ? $nested->$handler () : $self->skip ($recType);
        next;
    }

    return undef;
}

sub skip {
    my ($self, $recType) = @_;

    warn ">>> Can't handle $recType records (line $.)\n" if defined $recType;
    ++$self->{skipping};
    1 while defined $self->nextLine ();
    --$self->{skipping};
}

sub rec_dsrecord {
    my ($self) = @_;
    my $fh = $self->{fh};
    my @wantedFields = qw/identifier name description/;
    my $matchStr = join '|', @wantedFields;
    my $fieldsMatch = qr/$matchStr/i;

    while (defined (my $line = $self->nextLine ())) {
        next unless $line =~ /($fieldsMatch)\s+(.*)/;
        $self->{lc $1} = $2;
    }

    my @missingFields = grep {! exists $self->{$_}} @wantedFields;
    my @gotFields = grep {exists $self->{$_}} @wantedFields;

    warn "Missing @missingFields fields in DSRECORD ending line $."
        if @missingFields;
    print join (', ', map {"$_: $self->{$_}"} @gotFields), "\n";
}

Prints:

>>> Can't handle dssubrecord records (line 15)
identifier: "ROOT", name: "AP_CDBS_Vendor_Summary", description: "The first part of the routine gathers data from the ABAP which extracts the necessary data from the SAP tables KNA1 and KNB1 (NB the keys of the link between KNA1 and KNB1 will form the basis of all the ABAP queries)."

Handlers for new record types are easily added: just write a sub named rec_newrectype, where newrectype is the lower-cased record type.
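To show the dispatch in isolation, here is a minimal sketch. The Demo package, its dispatch method and the rec_dssubrecord handler are made up for illustration; only the can ("rec_$recType") lookup mirrors what nextLine does above.

```perl
use strict;
use warnings;

package Demo;

sub new {
    my ($class) = @_;
    return bless {}, $class;
}

# Look up a handler named after the lower-cased record type,
# the same way nextLine does with can ("rec_$recType").
sub dispatch {
    my ($self, $recType) = @_;
    my $handler = $self->can ('rec_' . lc $recType);

    return $handler ? $self->$handler () : "no handler for $recType";
}

sub rec_dsrecord {return 'handled DSRECORD'}

# Hypothetical new handler: defining the sub is all that is needed,
# no dispatch table has to be updated.
sub rec_dssubrecord {return 'handled DSSUBRECORD'}

package main;

my $demo = Demo->new ();
my @results = map {$demo->dispatch ($_)} qw/DSRECORD DSSUBRECORD DSJOB/;

print "$_\n" for @results;
```

DSJOB is just a placeholder for a record type with no handler, so it falls through to the "no handler" branch.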

A context dumper can be easily implemented using the "stack" formed by the context links in the objects (see sub new implementation).
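Something along these lines might do as a context dumper. It is a sketch only: the Node package and the contextStack method are hypothetical, and it assumes each object stores its record type in a recType field, which the code above does not do.

```perl
use strict;
use warnings;

package Node;

# Same shape as the constructor above: copy the parent's fields,
# then keep a link to the parent in {context}.
sub new {
    my ($class, $context, %params) = @_;

    $class = ref $class || $class;
    my $self = bless {%params}, $class;

    if (defined $context && ref $context) {
        $self->{$_} = $context->{$_} for keys %$context;
    }

    $self->{context} = $context;
    return $self;
}

# Hypothetical helper: walk the context links from the innermost
# object out to the top-level object.
sub contextStack {
    my ($self) = @_;
    my @stack;

    for (my $node = $self; defined $node; $node = $node->{context}) {
        push @stack, $node->{recType} // 'top level';
    }

    return @stack;
}

package main;

my $root = Node->new (undef);
my $record = $root->new ($root);
$record->{recType} = 'dsrecord';    # assumed field, set by hand here
my $sub = $record->new ($record);
$sub->{recType} = 'dssubrecord';

my @stack = $sub->contextStack ();
print join (' <- ', @stack), "\n";
```

Called from a warn or die, a list like this tells you which nested records the parser was inside when something went wrong.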

The code can be refactored from a lightweight main-based object into a proper base class with an object factory to create derived classes for each interesting record type.

Common output code could be added to the base class.
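A rough sketch of how that refactoring could look. All names here (RecordBase, Record::Dsrecord, create, describe) are invented for the example, and the factory simply falls back to the base class when no derived class exists for a record type.

```perl
use strict;
use warnings;

package RecordBase;

sub new {
    my ($class, %params) = @_;
    return bless {%params}, $class;
}

# Hypothetical factory: use a derived class named after the record
# type if one is defined, otherwise fall back to the base class.
sub create {
    my ($class, $recType, %params) = @_;
    my $derived = 'Record::' . ucfirst lc $recType;
    my $target = $derived->can ('new') ? $derived : $class;

    return $target->new (recType => lc $recType, %params);
}

# Common output code lives in the base class and is inherited.
sub describe {
    my ($self) = @_;
    return ref ($self) . " handling $self->{recType}";
}

package Record::Dsrecord;
our @ISA = ('RecordBase');

package main;

my $known = RecordBase->create ('DSRECORD');
my $unknown = RecordBase->create ('DSJOB');

print $known->describe (), "\n";
print $unknown->describe (), "\n";
```

Record-specific parsing would then go into methods of the derived classes, while anything shared stays in RecordBase.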

