gullcatcher has asked for the wisdom of the Perl Monks concerning the following question:

Hullo!

I have been trying to avoid re-inventing the wheel on this one for a while.
I am trying to balance braces or brackets in a random sequence of words or statements,
so that the resulting sentence or code makes sense as far as the punctuation
is concerned. For example, a code sequence of:

$x++; if ($x) {; $y-=$x; foreach (@x) {;};

would not make sense to the compiler, since there is a missing `}'.
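
One way to confirm that diagnosis without running the snippet is to let perl compile it and stop before the first statement. This is only a sketch: `compiles_ok` is a made-up name, and strict and warnings are switched off inside it on the assumption that the generated code uses undeclared globals such as `$x`.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $code compiles, 0 otherwise,
# without executing any of the generated statements.
sub compiles_ok {
    my ($code) = @_;
    no strict;      # the generated snippets use undeclared globals like $x
    no warnings;    # only hard syntax errors matter here
    local $@;
    # The whole string is compiled first; at run time the leading
    # "return 1" leaves the eval before any generated statement executes.
    my $ok = eval "return 1;\n$code";
    return $ok ? 1 : 0;
}
```

Note that anything which runs at compile time (a BEGIN block or a `use` line in the generated code) would still execute, so this is a syntax check and not a sandbox.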

Here is my effort at getting Perl to correct this error.
This is somewhat slower than I need, but it is the most successful
algorithm I have thought of. I kneel before the Uncarved
Block of Wisdom to beg of the collected fu and chi to
look and find a better way.

Many thanks,
gullcatcher

#!/usr/bin/perl
# Quick hack to match braces on a random Part.
my ( $size, $seed ) = ( @ARGV );
srand $seed;
$size ||= 30;
my @tokens = ( ' foreach (@a) {',
               ' $x = $x{ $_ }',
               ' }',
               ' }',
               ' if ($x>0) {',
               ' $x-- ',
               ' $_ = $x',
               'unless ($x>0) {',
               '}');
my $gener = 0;
my $str;
print "Code will be generated, corrected and eval'd. Errors will cause that sequence to be dumped to STDOUT\n";
while ( $gener++ <= 1000 ) {
    my @genes = map { $tokens[ rand @tokens ] } ( 1 ... $size );
    # This produces a random part.
    # Now for the clever bit.
    $_ = [ @genes ];
    @genes = paired( $_, '{}' );
    $str = join ";", @genes;
    eval ( $str ) || print $str if $@;
};
1;
#########
sub paired {
    my ($p, $pair) = @_;
    my @genes = @{ $p };
    my ( $o, $c ) = split //, $pair;
    # Define a sub to count the number of open braces - closed braces
    # in a string.
    $_ = 'sub { (m/'.$o.'/og) - (m/'.$c.'/og) }';
    my $sub = eval $_;
    my $k;
    my @ind;
    for my $i ( 0 ... $#genes ) {
        $_ = $genes[ $i ];
        my $j = &$sub;
        next unless $j;
        $k+=$j;
        if ( ($k < 0) and ($j < 0) ) {
            splice @genes, $i, 1, "";
            $k-=$j;
            next;
        };
        if ( $k == 0 and $j < 0 ) {
            @ind=();
            next;
        };
        push @ind, [ $genes[ $i ], $i, $j, $k ];
    };
    # The basic process is:
    # first loop eliminates any closed bracket if there
    # aren't enough preceding open brackets.
    # first loop produces a list for each gene,
    # containing:
    # @ind = [ actual_text, position, -1 | 1, total ]
    # -1 | +1 determined by matching { or }.
    # so the string '$x->{NULL}' would come out as 0
    # total is the sum (up to that point) of the +1 or -1.
    # second part:
    # sorts the @ind lol by:
    my @i = sort { ($b->[ 3 ] - $k) <=> ( $a->[ 3 ] - $k )
                   || $b->[ 3 ] <=> $a->[ 3 ]
                   || $b->[ 1 ] <=> $a->[ 1 ] } @ind;
    # list is sorted by:
    # total for whole list minus total at that gene, then
    # total up to that gene, then
    # position from end.
    my ( $p )=( -1 );
    my ( @list );
    # Next loop checks where insertion is due to take place.
    # Can't actually do the insertion yet, since it will
    # balls up numbering.
    while ( $k ) {
        $p++;
        $j = $i[ $p ];
        if ( $j->[ 2 ] > 0 ) {
            $list[ $j->[ 1 ] ] = +1;
            $k--;
        };
    };
    # This loop does the insertion. Goes from the end, to avoid changing
    # numbers.
    my $j = $#list;
    for $i ( reverse @list ) {
        $j--;
        next unless $i;
        if ( $j > 0 ) {
            splice @genes, $j+2, 0, $c;
        } else {
            # Does this _ever_ get called?
            print "Deletion Activated\n";
            splice @genes, $j, 1, "";
        };
    };
    return @genes;
};
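
The [ actual_text, position, -1 | 1, total ] records described in the comments can be built on their own, which may make the bookkeeping easier to follow. The sketch below is an illustration and not the code above: `brace_delta` and `brace_records` are invented names, and it counts every brace in a statement with tr///, so a statement holding two opening braces would score 2.

```perl
use strict;
use warnings;

# Net brace change for one statement: number of '{' minus number of '}'.
# tr/// with an empty replacement list only counts; it does not modify.
sub brace_delta {
    my ($text) = @_;
    return ( $text =~ tr/{// ) - ( $text =~ tr/}// );
}

# Build [ text, position, delta, running_total ] for every statement
# whose delta is non-zero, in the order the statements were given.
sub brace_records {
    my @statements = @_;
    my $total = 0;
    my @records;
    for my $pos ( 0 .. $#statements ) {
        my $delta = brace_delta( $statements[$pos] );
        next unless $delta;    # e.g. '$x->{NULL}' nets out to 0
        $total += $delta;
        push @records, [ $statements[$pos], $pos, $delta, $total ];
    }
    return @records;
}
```

A running total that drops below zero marks a closing brace with nothing to close, and a final total above zero is the number of closing braces still owed.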

Re: Matching paired characters.
by chromatic (Archbishop) on Jul 19, 2001 at 23:16 UTC
    Regexp::Common has a regex to match balanced parentheses and brackets. You could use the module itself, or just grab the regex out of there. (It's $RE{balanced}{-parens}, if you're curious.)
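
For readers without the module installed, here is a rough idea of what a balanced-delimiter match looks like. This is not the pattern Regexp::Common ships; it is a hand-written recursive regex that assumes Perl 5.10 or later, and `is_balanced` is a name made up for the example. It only reports whether the curly braces balance; it does not repair anything.

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if every '{' in $text has a matching '}' in the
# right order, 0 otherwise. Braces inside strings or comments of the
# examined code are counted like any other brace.
sub is_balanced {
    my ($text) = @_;
    return $text =~ m/
        \A
        (?:
            [^{}]++                     # a run of non-brace characters
          |
            (                           # group 1: one balanced block
                \{
                (?: [^{}]++ | (?1) )*   # plain text or nested blocks
                \}
            )
        )*
        \z
    /x ? 1 : 0;
}
```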
      Many thanks for responding.

      Perhaps I didn't set up the problem in a coherent manner.
      I am working on a program which involves compiling and executing auto-generated code.
      At times, this auto-generated code is random code. This is all okay, but when the code is likely to
      branch, many instances of random code raise exceptions simply by virtue of a missing or extra curly bracket.

      Now here's the problem: before I eval the code, how do I check it, and if necessary, correct it?

      The eval code is constructed from a list of Perl statements, which may or may not contain a { or a }.
      Before the eval, the code needs to be checked and corrected.

      I've had a look at Regexp::Common, though I am fairly sure that it can't correct the unbalanced parentheses in
      a list of statements. Am I a spanner - am I making a square wheel - or do I need to keep with this one?

      Many thanks,
      gullcatcher
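
The check-and-correct step asked about above can also be done in a single pass over the statement list. The following is a simplified sketch and not the algorithm from the original post: `repair_braces` is an invented name, it blanks any statement that would close a block that was never opened, and it appends the missing closers at the very end instead of choosing insertion points. It assumes the net brace count of each statement is enough to judge it (so a statement like '} else {' is not handled) and that braces never appear inside strings or comments.

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of the statement list in which
# '{' and '}' balance.
sub repair_braces {
    my @statements = @_;
    my $depth = 0;    # number of blocks currently open
    my @fixed;
    for my $stmt (@statements) {
        # tr/// with an empty replacement list counts without modifying.
        my $delta = ( $stmt =~ tr/{// ) - ( $stmt =~ tr/}// );
        if ( $depth + $delta < 0 ) {
            # Nothing is open for this statement to close: blank it,
            # keeping its slot so positions stay stable.
            push @fixed, '';
            next;
        }
        $depth += $delta;
        push @fixed, $stmt;
    }
    # Close whatever is still open.
    push @fixed, ('}') x $depth;
    return @fixed;
}
```

The result can be joined with ";" and handed to eval in the same way as the output of paired() in the original script.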