This is long, so I will use a readmore, but to start off let me 'splain what I am doing. NOTE: I am looking for suggestions and constructive criticism, as this helps me learn to code more properly (I am a self-taught coder).

OK, so, moving on. I have a configuration file that I am parsing. I had an old parser, but decided that the config file was just not human readable, or friendly for that matter. So I decided to go toward a new "look", which is posted below. What I would like suggestions on is my technique for parsing the config, and whether I am missing a module that already does this kind of parsing (why re-invent the wheel?).

I have the following configuration file (nowhere near the complete production config file, but it serves my purposes for this writeup):

define {
    destination = "/u90/gvc_archive/new";
    runonce = "port100";
}

#####*******#####
## default macros
#####*******#####
macro arbor_ama {
    regex = "/F.*?-P.*?\.(\d+)\.ama/";
    dfield = "$1";
}
macro dex {
    regex = "/^P.*?_DSC_.*?\.(\d+)\.ama$/";
    dfield = "$1";
}
macro rpt {
    regex = "";
    dfield = "";
}
macro rptnull {
    regex = "";
    dfield = "";
}
macro rtcd_everything {
    regex = "/.*?/";
    dfield = "2__";
}
macro arbor1_1 {
    regex = "/^F.*?_D(\d+)_.*?PRI_1_1\.ama$/";
    dfield = "$1";
}
macro arbor {
    regex = "/^F.*?\.(\d+)\.ama$/";
    dfield = "$1";
}
macro usl1 {
    regex = "/^USL_.*?_(\d{6})_.*?$/";
    dfield = "$1";
}
macro usl2 {
    regex = "/^USL_.*?_(\d{6})$/";
    dfield = "$1";
}
macro uslnull {
    regex = "/^USL.*?$/";
    dfield = "NULL";
}

#####*******#####
## Individual Rulesets here
#####*******#####

## port 11
# P11_02-04-02_01:02:00_020001.030001.41062.01.2
#10!11,41,61,77,85!rtcd_everything <-- old way
rule 10 {
    port = "port11,port41, port61,port77,port85";  # space in here on purpose
    regex = ;                                      # left blank on purpose
    dfield = "NULL";
    macro = "rtcd_everything";
}

rule 60 {
    port = "87";                                   # didn't use "port##" on purpose
    regex = "/F.*?\-P.*?_FCC_(\d+)_.*?\.cdr/"
    dfield = "$1";
    macro = "usl1";
    macro = "usl2";
}

## port 100 stuff
# P040_PRI_487460_487559.0204.ama
# 100!100!/P\d+_(PRI|SEC|TPP)_.*?\.(\d{4})\.ama/!$2 <-- old
rule 100 {
    port = "100";
    regex = /P\d+_(PRI|SEC|TPP)_.*?\.(\d{4})\.ama/;
    dfield = "$2";
}

The following is the code I am using so far to parse this file. It is nowhere *near* complete, but I would like comments on the direction I am heading. Basically, I don't want to get too far into it unless I am going in the right direction.

#!/usr/local/bin/perl -w
# test parser prior to plugging into larger script
# to replace the old function.
use strict;
use Env qw(HOME);

my $file = "$HOME/archive_bin/configs/dtfr_archiver.conf";
my ($class, $type, $var, $rval, %config);

open(F, $file) or die("Can't open $file: $!\n");
while ( <F> ) {
    chomp;
    my $current_line = $_;
    my $ok = "^(macro|define|rule)";
    next if ( /^#/ );       # skip comments
    next if ( /^\s*$/ );    # skip blank and whitespace-only lines
    # In scalar context the range operator returns a sequence number;
    # the line that closes a block gets "E0" appended to it.
    my $check = (/$ok\s+\w+\s+\{/ .. /^\}/);
    do {
        if ( $check == 1 ) {
            # first line of a block: "class name {"
            $current_line =~ /^([a-z]+)/;
            $class = $1;
            print "class: $class\n";              # do more checking here
            $current_line =~ /^.*? (\w+)\s+\{/;   # do more checking here
            $type = $1;                           # do more checking here
            print "type: $type\n\n";              # do more checking here
        }
        if ( $check !~ /E0/ and $check > 1 ) {
            # body line: split only on the first "="
            if ( $current_line =~ /=/ ) {
                ($var, $rval) = split(/=/, $_, 2);
                $var  =~ tr/ //d;
                $rval =~ tr/" ;//d;
                print "var: $var and rval: $rval\n";
                #$config{$type}{$var} = $rval;
            }
        }
    } if $check;
}
close(F);
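The parser leans on Perl's range (flip-flop) operator in scalar context, so it may help to see the values that operator returns. This is only a minimal sketch: the `@lines` array is made-up sample data standing in for the config file, and the opening pattern is simplified to any "word word {" line.

```perl
use strict;
use warnings;

# Hypothetical sample lines standing in for the real config file.
my @lines = ('macro dex {', 'regex = "x";', 'dfield = "$1";', '}', 'junk');

my @seen;
for (@lines) {
    # False (empty string) outside a block. Inside a block, a sequence
    # number starting at 1, with "E0" appended on the closing line.
    my $check = ( /^\w+\s+\w+\s+\{/ .. /^\}/ );
    push @seen, $check;
    printf "%-5s %s\n", ($check || '-'), $_;
}
```

The closing line still compares numerically equal to its sequence number, so a string test for "E0" is what distinguishes the end of a block from an ordinary body line.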

Output from the above code:

class: macro
type: rtcd_everything

var: regex and rval: /.*?/
var: dfield and rval: 2__
class: macro
type: arbor1_1

var: regex and rval: /^F.*?_D(\d+)_.*?PRI_1_1\.ama$/
var: dfield and rval: $1
class: macro
type: arbor

var: regex and rval: /^F.*?\.(\d+)\.ama$/
var: dfield and rval: $1
class: macro
type: usl1

var: regex and rval: /^USL_.*?_(\d{6})_.*?$/
var: dfield and rval: $1
class: macro
type: usl2

var: regex and rval: /^USL_.*?_(\d{6})$/
var: dfield and rval: $1
class: macro
type: uslnull

var: regex and rval: /^USL.*?$/
var: dfield and rval: NULL

There we go. There is still a lot left to do, since I need to add a lot of sanity checking. The end result will be assigning the values to a multidim hash or a hash of lists (or something along those lines) and passing that on to some other functions to do the work. As you can tell, some of this is still conceptual; I usually work through my concepts by coding them.
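One possible shape for that structure, sketched under assumptions: a hash keyed by class, then block name, then variable, with each value held in a list so that a repeated key (such as the two `macro` lines in rule 60) is not overwritten. The helper `store_setting` and the layout of `%config` are illustrative only, not part of the parser above; the sample values are copied from the config.

```perl
use strict;
use warnings;

my %config;

# Hypothetical helper: record one parsed setting.
# Values are pushed onto a list so repeated variables accumulate.
sub store_setting {
    my ($class, $name, $var, $value) = @_;
    push @{ $config{$class}{$name}{$var} }, $value;
}

store_setting('macro', 'arbor', 'regex',  '/^F.*?\.(\d+)\.ama$/');
store_setting('macro', 'arbor', 'dfield', '$1');
store_setting('rule',  '60',    'port',   '87');
store_setting('rule',  '60',    'macro',  'usl1');
store_setting('rule',  '60',    'macro',  'usl2');

# Walk the structure: class -> block name -> variable -> list of values.
for my $class (sort keys %config) {
    for my $name (sort keys %{ $config{$class} }) {
        for my $var (sort keys %{ $config{$class}{$name} }) {
            my @values = @{ $config{$class}{$name}{$var} };
            print "$class $name: $var = ", join(', ', @values), "\n";
        }
    }
}
```

Keeping every value in a list costs a dereference on lookup, but it means the consuming functions never have to guess whether a variable was given once or several times.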

Any thoughts are greatly appreciated.

TIA guys

_ _ _ _ _ _ _ _ _ _
- Jim
Insert clever comment here...


In reply to Advice sought for parsing config file by snafu
