This is long, so I will use a readmore, but to start off let me 'splain what I am doing. NOTE: I am looking for suggestions and constructive criticism, as both help me learn to code properly (I am a self-taught coder).

Ok, so, moving on. I have a configuration file that I am parsing. I had an old parser, but decided that the old config format was neither human-readable nor friendly. So I moved to a new "look", which is posted below. What I would like suggestions on is my technique for parsing the config, and whether I am missing a module that already does this kind of parsing (why re-invent the wheel?).

I have the following configuration file (nowhere near the complete production config file, but it serves my purposes for this writeup):

define {
    destination = "/u90/gvc_archive/new";
    runonce    = "port100";
}

#####*******#####
##  default macros
#####*******#####

macro arbor_ama {
    regex  = "/F.*?-P.*?\.(\d+)\.ama/";
    dfield = "$1";
}

macro dex {
    regex  = "/^P.*?_DSC_.*?\.(\d+)\.ama$/";
    dfield = "$1";
}

macro rpt {
    regex  = "";
    dfield = "";
}

macro rptnull {
    regex  = "";
    dfield = "";
}

macro rtcd_everything {
    regex  = "/.*?/";
    dfield = "2__";
}

macro arbor1_1 {
    regex  = "/^F.*?_D(\d+)_.*?PRI_1_1\.ama$/";
    dfield = "$1";
}

macro arbor {
    regex  = "/^F.*?\.(\d+)\.ama$/";
    dfield = "$1";
}

macro usl1 {
    regex  = "/^USL_.*?_(\d{6})_.*?$/";
    dfield = "$1";
}

macro usl2 {
    regex  = "/^USL_.*?_(\d{6})$/";
    dfield = "$1";
}

macro uslnull {
    regex  = "/^USL.*?$/";
    dfield = "NULL";
}

#####*******#####
##  Individual Rulesets here
#####*******#####

## port 11
# P11_02-04-02_01:02:00_020001.030001.41062.01.2
#10!11,41,61,77,85!rtcd_everything  <-- old way

rule 10 {
    port   = "port11,port41, port61,port77,port85";  # space in here on purpose
    regex  = ;                                       # left blank on purpose
    dfield = "NULL";
    macro  = "rtcd_everything";
}

rule 60 {
    port   = "87";                                    # didn't use "port##" on purpose
    regex  = "/F.*?\-P.*?_FCC_(\d+)_.*?\.cdr/"
    dfield = "$1";
    macro  = "usl1";
    macro  = "usl2";
}

## port 100 stuff
# P040_PRI_487460_487559.0204.ama
# 100!100!/P\d+_(PRI|SEC|TPP)_.*?\.(\d{4})\.ama/!$2 <-- old

rule 100 {
    port   = "100";
    regex  = /P\d+_(PRI|SEC|TPP)_.*?\.(\d{4})\.ama/;
    dfield = "$2";
}

Now the following is the code I am using so far to parse this file. It is nowhere *near* complete, but I would like comments on the direction I am heading. Basically, I don't want to get too far into it unless I am going in the right direction.

#!/usr/local/bin/perl -w
# test parser prior to plugging into larger script
# to replace the old function.

use strict;
use Env;

my $file = "$HOME/archive_bin/configs/dtfr_archiver.conf";
my ($class,$type,$var,$rval,%config);

open(F,$file) or die("Can't open it: $!\n");

while ( <F> ) {
    chomp;
    my $current_line = $_;
    my $ok           = "^(macro|define|rule)";

    next if ( /^#/ );       # skip comments
    next if ( /^\s*$/ );    # skip blank and whitespace-only lines

    my $check = (/$ok\s+\w+\s+\{/ .. /^\}/);
    
    do {
    if ( $check == 1 ) {

        $current_line =~ /^([a-z]+)/;
        $class        = $1;
        print "class: $class\n";            # do more checking here

        $current_line =~ /^.*? (\w+)\s+\{/; # do more checking here
        $type         = $1;                 # do more checking here
        print "type: $type\n\n";            # do more checking here

    }

    if ( $check !~ /E0/ and $check > 1 ) {  # "E0" (digit zero) marks the closing line
        if ( $current_line =~ /=/ ) {
        ($var,$rval)   = split(/=/,$_,2);   # limit 2: keep any "=" inside the value
        $var           =~ tr/ //d;
        $rval          =~ tr/" ;//d;

        print "var: $var and rval: $rval\n";
        #$config{$type}{$var} = $rval;
        }
    }
    } if $check;
}

output from the above code:

class: macro
type: rtcd_everything

var: regex and rval: /.*?/
var: dfield and rval: 2__

class: macro
type: arbor1_1

var: regex and rval: /^F.*?_D(\d+)_.*?PRI_1_1\.ama$/
var: dfield and rval: $1

class: macro
type: arbor

var: regex and rval: /^F.*?\.(\d+)\.ama$/
var: dfield and rval: $1

class: macro
type: usl1

var: regex and rval: /^USL_.*?_(\d{6})_.*?$/
var: dfield and rval: $1

class: macro
type: usl2

var: regex and rval: /^USL_.*?_(\d{6})$/
var: dfield and rval: $1

class: macro
type: uslnull

var: regex and rval: /^USL.*?$/
var: dfield and rval: NULL
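
For anyone following the `$check` logic in the parser: it leans on the range operator in scalar context (the "flip-flop"). Here is a minimal sketch of what that operator returns line by line, using a hypothetical four-line sample in the same block format. The sample data and the `@seen` array are assumptions for illustration only, not part of the real script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample input: one block in the config format from this post.
my @lines = (
    'macro arbor {',
    '    regex  = "/^F.*?\.(\d+)\.ama$/";',
    '    dfield = "$1";',
    '}',
);

my @seen;    # illustration only: collects the value returned for each line
for (@lines) {
    # In scalar context, .. is the flip-flop operator. It returns a
    # sequence number while inside the range and appends "E0" (digit
    # zero) to the number returned on the line that ends the range.
    my $check = /^(?:macro|define|rule)\s+\w+\s+\{/ .. /^\}/;
    push @seen, $check;
    print "$check\t$_\n";
}
```

The header line gets 1, the two assignments get 2 and 3, and the closing brace gets 4E0. That is why the header can be picked out with `$check == 1` and the footer with a match against `E0`.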

There we go. There is still a lot left to do, since I need to add a lot of sanity checking. The end result will assign the values to a multidimensional hash or a hash of lists (or something along those lines) and pass that on to other functions to do the work. As you can tell, some of this is still conceptual; I usually work through my concepts by coding them.
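
To make the "multidim hash or hash of lists" idea concrete, here is a rough sketch of one possible shape: `$config{class}{name}{key}` holding a list of values, so that a key that repeats (such as the two `macro` lines in rule 60) keeps every value in order. The layout, the `%config` nesting, and the trimmed `$sample` text are all assumptions for illustration. This is one option under those assumptions, not a finished design, and it does none of the sanity checking mentioned above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample, trimmed from the config format shown earlier.
my $sample = <<'END_CONF';
macro usl1 {
    regex  = "/^USL_.*?_(\d{6})_.*?$/";
    dfield = "$1";
}

rule 60 {
    port   = "87";
    macro  = "usl1";
    macro  = "usl2";
}
END_CONF

my (%config, $class, $name);

for my $line (split /\n/, $sample) {
    next if $line =~ /^\s*(?:#.*)?$/;          # skip comments and blank lines

    if ($line =~ /^(macro|define|rule)\s+(\w*)\s*\{/) {
        ($class, $name) = ($1, $2);            # block header
    }
    elsif ($line =~ /^\}/) {
        ($class, $name) = (undef, undef);      # block footer
    }
    elsif (defined $class and $line =~ /^\s*(\w+)\s*=\s*"([^"]*)"\s*;/) {
        # Every value goes into a list, so repeated keys are kept in order.
        push @{ $config{$class}{$name}{$1} }, $2;
    }
}

print join(',', @{ $config{rule}{60}{macro} }), "\n";
print $config{macro}{usl1}{regex}[0], "\n";
```

Run as is, this prints `usl1,usl2` and then the `usl1` regex string. Unquoted or empty values (like the `regex = ;` line in rule 10) are silently skipped by this sketch, which is exactly the kind of case the real parser would need to report.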

Any thoughts are greatly appreciated.

TIA guys

_ _ _ _ _ _ _ _ _ _
- Jim
Insert clever comment here...

In reply to Advice sought for parsing config file by snafu
