PerlMonks  

Parsing a file with parentheses to build a hash

by xcellsior (Novice)
on Nov 13, 2014 at 15:45 UTC ( #1107117=perlquestion )

xcellsior has asked for the wisdom of the Perl Monks concerning the following question:

Hello all,

I wrote a parser for the old KiCad .brd file format a while back that reads all the file contents into a hash so I can make whatever mods to the file my mind sees fit. Recently, the KiCad maintainers moved to a new format that is compartmentalized with parentheses. The content is basically the same, but the format is a total change. I've just started to think about the best way to parse the file into the same hash structures so the data manipulation on the back end doesn't need to change too much. In an ideal world this is what I would like to happen:

File foo.kicad_pcb:

( (SECTION (Section_KEY Value1) (Another_KEY1 Value2) (KEY2 value3) (KEY3 Value4)) (NEW_SECTION (SUB_SECTION (KEY4 Value5)) (NEW_SUB_SECTION (KEY5 Value6) ) ) )

Perl Hash would look like this:

open ( BRD, "<foo.kicad_pcb") or die("ha ha");
while (<BRD>) {
    my $new_line = $_;
    chomp;
    # magic happens here where a hash gets filled in like this
    $DATA_HASH{SECTION}{Section_KEY}                 = 'Value1';
    $DATA_HASH{SECTION}{Another_KEY1}                = 'Value2';
    $DATA_HASH{SECTION}{KEY2}                        = 'value3';
    $DATA_HASH{SECTION}{KEY3}                        = 'Value4';
    $DATA_HASH{NEW_SECTION}{SUB_SECTION}{KEY4}       = 'Value5';
    $DATA_HASH{NEW_SECTION}{NEW_SUB_SECTION}{KEY5}   = 'Value6';
}

I've looked at a Perl module called Text::Balanced but can't seem to make it do what I need (I think my brain stopped working about an hour ago). Anyway, I'm reaching out to see if someone might be able to point me in a better direction. It just seems like there should be an elegant way to do this without counting parentheses and tracking everything... aka not the way I would do it in C...

Replies are listed 'Best First'.
Re: Parsing a file with parentheses to build a hash
by RichardK (Parson) on Nov 13, 2014 at 16:29 UTC

    It looks like a simple grammar so I'd use a recursive descent parser, Parse::RecDescent for one.

    This is just an example off the top of my head and completely untested.

    start   : '(' section(s) ')'
    section : '(' id entry(s) ')'
    entry   : '(' section | keypair ')'
    keypair : '(' key value ')'

    Update

    Looking at this again, entry shouldn't have those terminal brackets -- it would only parse sections like '(id ((key value))', so just

    entry : section | keypair

    (I did say this was just off the top of my head !)

      Oh, quick question. Will I need to barf the entire file into a single var or is there another way to do this? Ex:
      #!/path_to_perl
      use Parse::RecDescent;

      $::RD_ERRORS = 1;    # Parser dies when it encounters an error
      $::RD_WARN   = 1;    # Enable warnings - warn on unused rules &c.
      $::RD_HINT   = 1;    # Give out hints to help fix problems.

      my $grammar = <<'END_OF_GRAMMAR';    # What you said...
      start   : '(' section(s) ')'
      section : '(' id entry(s) ')'
      entry   : '(' section | keypair ')'
      keypair : '(' key value ')'
      END_OF_GRAMMAR

      my $text;
      while (<BRD>) {
          my $new_line = $_;
          chomp;
          $text = "$text $new_line";
      }

      my $parser = Parse::RecDescent->new($grammar)
          or die "Ha Ha, something is wrong with the syntax, good luck finding the issue!\n";
      defined $parser->section($text)
          or die "It helps if there is a section to find...";

      I'm just writing, I haven't tested anything yet, so I'm sure there are a few issues still in my understanding:) Starting to read "Why won't this basic Parse::RecDescent example work?"

        I'd use File::Slurp read_file, but there are lots of ways to do it.

        use File::Slurp;
        my $text = read_file( $filename );
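If installing File::Slurp is not an option, the same whole-file read can be sketched in core Perl by localizing `$/`. This is only an illustration: the `slurp` helper and the temporary file standing in for foo.kicad_pcb are hypothetical names, not something from the thread.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read a whole file into one scalar using core Perl only.
sub slurp {
    my ($filename) = @_;
    open my $fh, '<', $filename or die "Cannot open $filename: $!";
    local $/;              # no input record separator: <$fh> returns everything
    my $text = <$fh>;
    close $fh;
    return $text;
}

# Throwaway file standing in for the real board file (an assumption for the demo).
my ( $tmp_fh, $tmp_name ) = tempfile( UNLINK => 1 );
print {$tmp_fh} "( (SECTION\n  (KEY Value) ) )\n";
close $tmp_fh;

my $text = slurp($tmp_name);
print length($text), " characters read\n";
```

Either way the parser receives one string, so newlines in the board file need no special handling in the read loop.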

      Very interesting approach. Seems that I just need to set up the grammar according to the specification and all is done. I so like making things more difficult than this:) I will have to play with this just to see where it, or I, falls apart. Great advice/pointer!

      So, I'm playing with this and it looks like I'm going to use regular expressions to break out some of the unique sections so I can write a grammar for each one, since they don't all follow the same rules... I'm not sure how to put the info into a hash and I'm getting an error that I don't know how to fix:
      #!/usr/bin/perl
      use Parse::RecDescent;
      use Data::Dumper;

      $::RD_ERRORS = 1; #Parser dies when it encounters an error
      $::RD_WARN = 1; #Enable warnings - warn on unused rules &c.
      $::RD_HINT = 1; # Give out hints to help fix problems.

      our %DATA;

      #my $module_grammar = <<'END_OF_MODULE_GRAMMAR';
      #start : start_module
      #start_module : '(module ' name module_values
      #module_values : '(' section | keypair ')'
      #section : '(' fp_text
      #keypair : '(' key value ')'
      #END_OF_MODULE_GRAMMAR

      my $net_grammer = <<'END_OF_NET_GRAMMER';
      start : nets(s)
      nets : '(net ' node name ')'
          { $main::DATA{"NET"}{$item{'node'}} = $item{'name'};
            print "$_\n" for @item{'node'};
          }
      node : m/\d+/
      name : m/\S+/
      END_OF_NET_GRAMMER

      my $text;
      open ( BRD, "<attiny84ap.kicad_pcb") or die("ha ha");
      while (<BRD>) {
          my $new_line = $_;
          chomp;
          $text = "$text $new_line";
      }

      $text =~ s/\s+\(/\(/g; # remove white space in front of '('
      $text =~ s/\(\s+/\(/g; # remove white space after '('
      $text =~ s/\s+\)/\)/g; # remove white space in front of ')'
      $text =~ s/\)s+/\)/g; # remove white space after ')'

      # This takes tooooo long silly
      #$text =~ m/\(kicad_pcb\(version (\d+)\)\(host pcbnew \"(.*)\"\)\(general(.*)\)\(page (\S+)\)\(layers(.*)\)\(setup(.*)\)(\(net .*\))\(net_class/;
      #my $version = $1;
      #my $pcbnew_revision = $2;
      #my $section_general = $3;
      #my $page = $4;
      #my $section_layers = $5;
      #my $section_setup = $6;
      #my $netlist_map = $7;

      $text =~ m/^\(kicad_pcb\(version (\d+)\)(\(.*\))\)$/;
      my $version = $1;
      $text = $2;
      #print "D - $version\n";

      $text =~ m/^\(host pcbnew \"(.*)\"\)(\(general.*\))$/;
      my $pcbnew_revision = $1;
      $text = $2;
      #print "D - $pcbnew_revision\n";

      $text =~ m/^\(general(.*)\)(\(page .*\))$/;
      my $section_general = $1;
      $text = $2;
      $section_general =~ s/\)\(/\)\n\(/g; # put newline between ')('
      #print "D - $section_general\n";

      $text =~ m/^\(page (\S+)\)(\(layers.*\))$/;
      my $page = $1;
      $text = $2;
      #print "D - $page\n";

      $text =~ m/^\(layers(\(.*\))\)(\(setup.*\))$/;
      my $section_layers = $1;
      $text = $2;
      $section_layers =~ s/\)\(/\)\n\(/g; # put newline between ')('
      #print "D - $section_layers\n";

      $text =~ m/^\(setup(.*\)\))\)(\(net .*\))$/;
      my $section_setup = $1;
      $text = $2;
      $section_setup =~ s/\)\(/\)\n\(/g; # put newline between ')('
      #print "D - $section_setup\n";

      $text =~ m/^(\(net .*\))(\(net_class.*)$/;
      my $netlist_map = $1;
      $text = $2;
      $netlist_map =~ s/\)\(/\)\n\(/g; # put newline between ')('
      print "D - $netlist_map\n";

      my $parser = Parse::RecDescent->new($net_grammar) or die "Bad grammar!\n";
      defined $parser->start($netlist_map) or die "Text doesn't match";

      foreach my $KEY (keys %($DATA{"NET"})) {
          print "$KEY\n";
      }
      exit;
      The file that it's calling is in another thread; once I figure out how to link it here I will update this thread. Here is the output I'm getting:
      >./parse_kicad_pcb.pl
      D - (net 0 "")
      (net 1 +9V)
      (net 2 /CLK)
      (net 3 /DO)
      (net 4 /Data_In)
      (net 5 /SCL)
      (net 6 /SDA)
      (net 7 /SET_Horz)
      (net 8 /SET_Vert)
      (net 9 /~Horz_ON)
      (net 10 /~RESET)
      (net 11 /~Vert_ON)
      (net 12 5V_ATTINY84P)
      (net 13 GND)
      (net 14 N-000001)
      (net 15 N-0000018)
      (net 16 N-0000019)
      (net 17 N-000002)
      (net 18 N-0000021)
      (net 19 N-0000024)
      (net 20 N-0000026)
      (net 21 N-0000027)
      (net 22 N-0000028)
      (net 23 N-0000029)
      (net 24 N-000003)
      (net 25 N-0000030)
      (net 26 N-0000031)
      (net 27 N-0000032)
      (net 28 N-0000034)
      (net 29 N-0000036)
      (net 30 N-0000037)
      (net 31 N-0000038)
      (net 32 N-0000039)
      (net 33 N-0000040)
      (net 34 N-0000041)
      (net 35 N-0000042)
      (net 36 N-0000043)
      (net 37 N-0000044)
      (net 38 N-0000045)
      (net 39 N-0000046)
      (net 40 N-0000047)
      (net 41 N-0000048)
      (net 42 N-0000049)
      (net 43 N-0000050)
      (net 44 N-000009)
      Unknown starting rule (Parse::RecDescent::namespace000001::start) called at ./parse_kicad_pcb.pl line 93.
      Any clues that might help? Working on this in 30min chunks isn't helping either, silly other work that needs attention....

        Well, that fragment doesn't compile: $net_grammar doesn't exist, so you didn't get that error message from there ;)

        Using strict and warnings will help spot lots of these problems.
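As a small illustration of that advice, the sketch below compiles a two-line snippet in which a variable is declared under one spelling and used under another, the same kind of one-letter mismatch discussed above. The snippet text and the `$snippet`, `$ok` and `$error` names are made up for the demonstration.

```perl
use strict;
use warnings;

# Declared as $net_grammer, used as $net_grammar: one letter apart.
my $snippet = <<'END_SNIPPET';
use strict;
my $net_grammer = 'start : nets(s)';
my $copy = $net_grammar;
END_SNIPPET

# String eval compiles the snippet; under strict the undeclared
# variable is a compile-time error, which lands in $@.
my $ok    = eval "$snippet; 1";
my $error = $ok ? '' : $@;
print "strict reported: $error";
```

Without `use strict`, the misspelled variable would silently be treated as an undefined package variable, and the problem would only surface later, far from the typo.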

        But, in general you're trying too hard; you've got to let the parser do what it's best at. It will handle all of that whitespace for you. Just (!) extend your grammar to handle all of the different sections and let the parser do its thing.

        Maybe you need to review your parser theory and try out some simple examples first to get the hang of how RecDescent works. It is big and complex, and I don't use it often enough to keep all of the details in my head. I always have to experiment around a bit to get it to do what I want.

        Have Kicad published a formal grammar for their file format? It's worth a look anyway.

Re: Parsing a file with parentheses to build a hash
by toolic (Bishop) on Nov 13, 2014 at 16:19 UTC
Re: Parsing a file with parentheses to build a hash
by choroba (Archbishop) on Nov 14, 2014 at 09:57 UTC
    You can use the Marpa parser:
    #!/usr/bin/perl
    use strict;
    use warnings;

    use Data::Dumper;
    use Marpa::R2;

    my $g = << '__G__';

    lexeme default = latm => 1

    :start   ::= Hash
    :default ::= action => itself

    Hash  ::= '(' Pairs ')'      action => hash
    Pairs ::= Pair+              action => pairs
    Pair  ::= '(' Key Value ')'  action => pair
    Key   ::= String
    Value ::= String | Pairs

    String     ~ [^\s()]+
    whitespace ~ [\s]+
    :discard   ~ whitespace

    __G__

    my $grammar = 'Marpa::R2::Scanless::G'->new({ source => \$g });

    my $input = '( (SECTION (Section_KEY Value1) (Another_KEY1 Value2) (KEY2 value3) (KEY3 Value4)) (NEW_SECTION (SUB_SECTION (KEY4 Value5)) (NEW_SUB_SECTION (KEY5 Value6) ) ) )';

    my $recce = 'Marpa::R2::Scanless::R'->new({ grammar           => $grammar,
                                                semantics_package => 'main',
                                              });

    $recce->read(\$input);
    print Dumper $recce->value;

    sub hash   { $_[2] }
    sub pairs  { shift; +{ map @$_, @_ } }
    sub pair   { [ @_[2, 3] ] }
    sub itself { $_[1] }

    Output:

    $VAR1 = \{
        'SECTION' => {
            'Section_KEY' => 'Value1',
            'KEY2' => 'value3',
            'Another_KEY1' => 'Value2',
            'KEY3' => 'Value4'
        },
        'NEW_SECTION' => {
            'NEW_SUB_SECTION' => {
                'KEY5' => 'Value6'
            },
            'SUB_SECTION' => {
                'KEY4' => 'Value5'
            }
        }
    };
    لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
      I can see you have used this module more than once. It looks like it uses the Parse::RecDescent module. I'm going to read up on this one now. I've started to poke into the file structure now and see that it's not as cut and dried as I expected... I started to remember this as I reviewed my old code... ugggly. Anyway, I realized that I need to set up some matches for different keys because they actually have implied vars based on the key name. I'm guessing the module can handle this. I really want to learn this methodology since it seems much quicker to set up once I get my head wrapped around it. I'm going to drop a bit more of a real example.
        It looks like it uses the Parse::RecDescent module.
        No, it doesn't. It's an alternative to it.

        Also, you should consider using the <readmore> tag for the large data sample.

        لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re: Parsing a file with parentheses to build a hash
by Anonymous Monk on Nov 13, 2014 at 19:29 UTC

      This is more like the way I do things, for better or worse. I'm going to give the Parse::RecDescent module a go, and likely start a parallel approach like the one in the String Search link. The KiCad guys are Python'ers (I still like Perl more) and roll their eyes when I mention Perl:) But I like the software, so either I start to actively contribute via C++ (only 24h in a day, 30 lines of code for every line of Perl, and 4 kids = no go) or continue to hack down a different path...
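For comparison with the module-based answers, a hand-rolled alternative can be sketched in core Perl: split the text into tokens, then let a recursive sub build the nested hash, so the call stack does the parenthesis tracking. The sub names (`tokenize`, `parse_pairs`, `parse_sexp`) are hypothetical, and the sketch assumes only the simplified `(key value)` / `(key (...) (...))` shape from the original question: no quoted strings containing spaces or parentheses, and no entries with several values, both of which real board files may contain.

```perl
use strict;
use warnings;
use Data::Dumper;

# Split the text into '(' , ')' and bare-word tokens; whitespace is dropped.
sub tokenize {
    my ($text) = @_;
    return [ $text =~ /([()]|[^\s()]+)/g ];
}

# Parse the entries of one list whose opening '(' is already consumed.
# Consumes the matching ')' and returns a hash ref for that list.
sub parse_pairs {
    my ($tokens) = @_;
    my %hash;
    while (@$tokens) {
        my $tok = shift @$tokens;
        return \%hash if $tok eq ')';
        die "Expected '(' but found '$tok'\n" unless $tok eq '(';

        my $key = shift @$tokens;
        die "Missing key\n"
            if !defined($key) || $key eq '(' || $key eq ')';
        die "Unexpected end of input after '$key'\n" unless @$tokens;

        if ( $tokens->[0] eq '(' ) {
            # Nested section: recurse; the callee eats this entry's ')'.
            $hash{$key} = parse_pairs($tokens);
        }
        else {
            my $value = shift @$tokens;
            die "Missing value for '$key'\n" if $value eq ')';
            my $close = shift @$tokens;
            die "Expected ')' after '$key $value'\n"
                unless defined($close) && $close eq ')';
            $hash{$key} = $value;
        }
    }
    die "Unbalanced parentheses\n";
}

# Entry point: the whole input must be a single parenthesized list.
sub parse_sexp {
    my ($text) = @_;
    my $tokens = tokenize($text);
    my $first  = shift @$tokens;
    die "Input must start with '('\n"
        unless defined($first) && $first eq '(';
    my $data = parse_pairs($tokens);
    die "Trailing tokens after final ')'\n" if @$tokens;
    return $data;
}

my $input = '( (SECTION (Section_KEY Value1) (Another_KEY1 Value2) (KEY2 value3) (KEY3 Value4)) (NEW_SECTION (SUB_SECTION (KEY4 Value5)) (NEW_SUB_SECTION (KEY5 Value6) ) ) )';

my $data = parse_sexp($input);
print Dumper($data);
```

The result is addressed the same way as in the question, for example `$data->{NEW_SECTION}{SUB_SECTION}{KEY4}`. Repeated keys at the same level would overwrite each other in this sketch, so sections that can occur many times would need arrays instead of plain hash slots.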

Node Type: perlquestion [id://1107117]
Approved by Corion
Front-paged by toolic