Re: Parsing balanced parentheses

There seem to be a couple of ways of going about this, depending on how much of the whole parsing problem this represents. One way is to hand roll a recursive descent parser. Another way is to use Parse::RecDescent to generate the parser for you. If the data you have shown is representative of the worst case, then hand rolling the parser is probably a reasonable approach. If it gets much more complex than that, then using the module to generate a parser is probably a good idea.

You might like to tell us something of the larger picture in case that hints at better ways to solve the larger problem rather than the fine focus on the parsing problem. The best approach to solving the parsing issue may well depend on what you want to do with the parsed result.

Update: as an exercise I wrote the following code, which may be a starting point for rolling your own parser:

use strict;
use warnings;
use Data::Dump::Streamer;

my $contents = <<'BLOCK';
create item "xxx" {
  remove entry "xxx"
  add entry "yy" "xxx" {
     item1=xxxx,
     item2=xxxx,
  }
  add diffenent entry "xxx" {
     item1=xxx,
     item2=xxxx,
     item3=xxx
     add subitem {
         item1=xxx,
         item2=xxx,
         item3=xxx
      }
      add subitem {
         item1=xxx,
         item2=xxx,
      }
   }
   another type {
        item1=xxx
   } with "xxx"
 }
BLOCK

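# Naive tokenizer: split on whitespace. Quoted strings containing spaces
# would be broken apart, and any token containing a brace is treated as
# that brace.
my @tokens = split /\s+/, $contents;
my @chunks;

push @chunks, ExtractChunks (\@tokens);
Dump (\@chunks);


# Consume tokens from the front of the list and return a reference to an
# array of plain tokens and nested array references (one per { ... } block).
sub ExtractChunks {
    my $tokens = shift;
    my @chunks;

    while (@$tokens) {
        my $token = shift @$tokens;

        if ($token =~ /\{/) {
            # Opening brace: recurse to collect the nested block
            push @chunks, ExtractChunks ($tokens);
        } elsif ($token =~ /\}/) {
            # Closing brace: this block is complete
            last;
        } else {
            push @chunks, $token;
        }
    }

    return \@chunks;
}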

Prints:

$ARRAY1 = [ [
            'create',
            'item',
            '"xxx"',
            [
              'remove',
              'entry',
              '"xxx"',
              'add',
              'entry',
              '"yy"',
              '"xxx"',
              [
                'item1=xxxx,',
                'item2=xxxx,'
              ],
              'add',
              'diffenent',
              'entry',
              '"xxx"',
              [
                'item1=xxx,',
                'item2=xxxx,',
                'item3=xxx',
                'add',
                'subitem',
                [
                  'item1=xxx,',
                  'item2=xxx,',
                  'item3=xxx'
                ],
                'add',
                'subitem',
                [
                  'item1=xxx,',
                  'item2=xxx,'
                ]
              ],
              'another',
              'type',
              [ 'item1=xxx' ],
              'with',
              '"xxx"'
            ]
          ] ];
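One limitation of the split /\s+/ tokenizer above is that a quoted string containing spaces gets broken into several tokens, and a brace glued to a word is not seen as a separate token. The following is only a minimal sketch of one way around that, assuming quoted strings never contain embedded double quotes. Tokenize is a hypothetical helper name, and ExtractChunks is repeated here (using exact brace comparison) just so the example runs on its own.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the code above): return a list of
# tokens, where a token is a double-quoted string, a single brace, or a
# run of other non-space characters. An unterminated quote is silently
# skipped, which a real parser would want to report instead.
sub Tokenize {
    my ($text) = @_;
    return $text =~ /"[^"]*"|[{}]|[^\s{}"]+/g;
}

# Same recursive idea as above, but braces are now always tokens of their
# own, so they can be compared exactly.
sub ExtractChunks {
    my ($tokens) = @_;
    my @chunks;

    while (@$tokens) {
        my $token = shift @$tokens;

        if ($token eq '{') {
            push @chunks, ExtractChunks ($tokens);
        } elsif ($token eq '}') {
            last;
        } else {
            push @chunks, $token;
        }
    }

    return \@chunks;
}

my @tokens = Tokenize ('add entry "two words"{ item1=a, } with "x"');
print join ('|', @tokens), "\n";
# prints: add|entry|"two words"|{|item1=a,|}|with|"x"

my $tree = ExtractChunks (\@tokens);
```

Here $tree ends up with the same shape as the dump above: plain tokens, with a nested array reference wherever a { ... } block appeared.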

DWIM is Perl's answer to Gödel


Re^2: Parsing balanced parentheses
by budman (Sexton) on Aug 03, 2006 at 07:28 UTC

Thanks for your helpful suggestions. I'll look over all the points of interest.

Sorry I haven't replied sooner; I've been busy with a new release and other issues. Unfortunately, the above is an active script file used by a few heavy duty C++ apps. I was asked to help improve the auditing system that generates reports from these script files. (It enables users to provide adjustments to the current data run.)

To get the script working, it currently uses the old match-and-set-a-flag approach :) to update a report. I implemented the updated code, which works decently and generates a more detailed spreadsheet capturing changes made in the last 5 years or so. :) It was one of those "when we get a chance..." jobs; unfortunately, it took some auditors to get the ball rolling. All is well.

The reason I asked is that I know there has to be a better approach to the whole process. I would like to load it into a hash and eliminate the nested ifs. At least it doesn't look like a rat's nest now.

I will look over the FAQs and other suggested items.

Thanks Again, Regards, budman
