Matching paired characters.

gullcatcher has asked for the wisdom of the Perl Monks concerning the following question:

Hullo!

I have been trying to avoid re-inventing the wheel on this one for a while.
I am trying to balance the braces or brackets in a random sequence of words (or whatever)
so that the resulting sentence or code makes sense as far as the punctuation
is concerned. For example, a code sequence of:

$x++; if ($x) {; $y-=$x; foreach (@x) {;};

would not make sense to the compiler, since there is a missing `}`.
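A quick way to confirm what is wrong with a sequence like that is to count the openers and closers. This is a minimal sketch; the sub name `brace_surplus` is hypothetical. It counts each character with the `() = m//g` idiom and reports the difference. It only counts and does not check order, so `}{` comes out as balanced.

```perl
use strict;
use warnings;

# Net brace count: number of '{' minus number of '}'.
# Positive means closers are missing; negative means there are extra closers.
# This ignores order, so '}{' also comes out as 0.
sub brace_surplus {
    my ($code) = @_;
    my $opens  = () = $code =~ /\{/g;
    my $closes = () = $code =~ /\}/g;
    return $opens - $closes;
}

my $code = '$x++; if ($x) {; $y-=$x; foreach (@x) {;};';
printf "%d closing brace(s) missing\n", brace_surplus($code);
```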

Here is my effort at getting Perl to correct this error.
It is somewhat slower than I need, but it is the most successful
algorithm I have thought of. I kneel before the Uncarved
Block of Wisdom to beg the collected fu and chi to
find a better way.

Many thanks,
gullcatcher

#!/usr/bin/perl
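# Quick hack to balance the braces in a randomly generated part.

my ( $size, $seed ) = @ARGV;
srand $seed if defined $seed;    # repeatable runs when a seed is given
$size ||= 30;                    # default: 30 tokens per part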

my @tokens = ( ' foreach (@a) {', ' $x = $x{ $_ }', ' }', ' }',
 ' if ($x>0) {', ' $x--   ', ' $_ = $x',
 'unless ($x>0) {', '}');

my $gener = 0;
my $str;
print "Code will be generated, corrected and eval'd. Errors will cause that sequence to be dumped to STDOUT\n";
while ( $gener++ <= 1000 ) {
 my @genes = map { $tokens[ rand @tokens ] } ( 1 ... $size );

 # This produces a random part.

 # Now for the clever bit: balance the braces, then run the result.
 @genes = paired( \@genes, '{}' );
 $str = join ";", @genes;
 eval $str;
 print "$str\n" if $@;    # dump any sequence that still fails
};

1;
#########

sub paired {
 my ($p, $pair) = @_;
 my @genes = @{ $p };

 my ( $o, $c ) = split //, $pair;

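 # Define a sub to count the number of open braces minus closed braces
 # in $_. Each count uses the "() = m//g" idiom so every occurrence is
 # counted; \Q...\E quotes the characters so any pair can be used.
 my $sub = sub {
  my $opens  = () = /\Q$o\E/g;
  my $closes = () = /\Q$c\E/g;
  return $opens - $closes;
 };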

 my $k;
 my @ind;
 for my $i ( 0 ... $#genes ) {
  $_ = $genes[ $i ];

  my $j = &$sub;

  next unless $j;
  $k+=$j;
  if ( ($k < 0) and ($j < 0) ) {
   splice @genes, $i, 1, ""; $k-=$j;next;};
  if ( $k == 0 and $j < 0 ) { @ind=(); next; };
  push @ind, [ $genes[ $i ], $i, $j, $k ];

 };

 # The basic process is:
 # the first loop blanks out any gene with a closing bracket that
 # doesn't have enough preceding open brackets.
 # The first loop also builds a list with an entry for each gene
 # that changes the count, containing:
 # @ind = [ actual_text, position, net count, total ]
 # net count is (number of { minus number of }) in that gene,
 # so the string '$x->{NULL}' comes out as 0 and gets no entry.
 # total is the running sum (up to that point) of the net counts.
 # The list is cleared whenever the total drops back to zero.

 # second part:
 # sorts the @ind LoL so the deepest, latest genes come first.
 my @i = sort {
    $b->[ 3 ] <=> $a->[ 3 ]
     ||
    $b->[ 1 ] <=> $a->[ 1 ]
     }

   @ind;
 # list is sorted by:
 # total up to that gene (highest first), then
 # position (from the end).

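 my $n = -1;      # index into the sorted list @i
 my @list;        # $list[ position ] is true where a closer is due

 # Next loop checks where insertion is due to take place.
 # Can't actually do the insertion yet, since it would
 # mess up the numbering. Stop if we run out of candidates
 # rather than looping forever.

 while ( $k > 0 and $n < $#i ) {
  my $j = $i[ ++$n ];
  if ( $j->[ 2 ] > 0 ) {
   $list[ $j->[ 1 ] ] = 1;
   $k--;
  }
 }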


 # This loop does the insertion. It goes from the end, so inserting
 # a closer doesn't change the positions of genes still to be handled.
 # Each closer goes straight after the chosen opening gene.

 for my $pos ( reverse 0 .. $#list ) {
  next unless $list[ $pos ];
  splice @genes, $pos + 1, 0, $c;
 }
 return @genes;
 };
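For comparison, here is a minimal single-pass sketch of a different, simpler technique (not the algorithm above): keep a running depth, blank out any gene containing a closer that has no earlier opener, and append the missing closers at the end. It assumes, as `paired` does, that every brace character counts (including ones inside strings) and that a gene with an unmatched closer can simply be dropped. The name `balance_genes` is hypothetical.

```perl
use strict;
use warnings;

# Balance one pair of delimiters in a list of code fragments ("genes").
# A gene containing a closer with no earlier opener is replaced by "",
# and enough closers to finish every still-open block are appended.
sub balance_genes {
    my ( $genes, $pair ) = @_;
    my ( $o, $c ) = split //, $pair;
    my @out;
    my $depth = 0;

    for my $gene (@$genes) {
        my $d  = $depth;
        my $ok = 1;
        for my $ch ( split //, $gene ) {
            if    ( $ch eq $o ) { $d++ }
            elsif ( $ch eq $c ) { $d--; if ( $d < 0 ) { $ok = 0; last } }
        }
        if ($ok) { push @out, $gene; $depth = $d }
        else     { push @out, "" }    # keep positions stable
    }
    push @out, ($c) x $depth;         # close whatever is still open
    return @out;
}

my @genes = ( ' foreach (@a) {', ' $x = $x{ $_ }', ' }', ' }', ' if ($x>0) {' );
print join( ";", balance_genes( \@genes, '{}' ) ), "\n";
```

Appending the closers at the end puts every later statement inside the still-open blocks, whereas `paired` closes each chosen block immediately; which is better depends on how much branching you want in the generated code.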


Re: Matching paired characters. by chromatic (Archbishop) on Jul 19, 2001 at 23:16 UTC
Regexp::Common has a regex to match balanced parentheses and brackets. You could use the module itself, or just grab the regex out of there. (It's `$RE{balanced}{-parens}`, if you're curious.)
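If you only need to check (not fix) the braces, Perl 5.10 and later can also do it without the module by using a recursive pattern. This is a sketch of that swapped-in core-Perl technique, not the `$RE{balanced}` pattern from Regexp::Common itself; like a plain count, it treats braces inside strings or comments as real braces. The name `is_balanced` is hypothetical.

```perl
use strict;
use warnings;

# Whole-string check: the text must be a sequence of non-brace runs and
# brace groups, where each group's body (?1) is itself such a sequence.
my $balanced = qr/\A((?:[^{}]++|\{(?1)\})*)\z/;

sub is_balanced { return $_[0] =~ $balanced ? 1 : 0 }

for my $code ( '$x++; if ($x) {; $y-=$x; foreach (@x) {;};',
               '$x++; if ($x) {; $y-=$x; foreach (@x) {;};};' ) {
    printf "%-50s %s\n", $code, is_balanced($code) ? 'balanced' : 'unbalanced';
}
```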
Re: Re: Matching paired characters. by gullcatcher (Initiate) on Jul 21, 2001 at 02:27 UTC
Many thanks for responding. Perhaps I didn't set up the problem clearly.

I am working on a program that compiles and executes auto-generated code. At times, this auto-generated code is random code. That is all okay, but when the code is likely to branch, many instances of the random code raise exceptions simply because of a missing or extra curly bracket.

Now here's the problem: before I `eval` the code, how do I check it and, if necessary, correct it? The eval'd code is constructed from a list of Perl statements, each of which may or may not contain a { or a }. Before the eval, the code needs to be checked and corrected.

I've had a look at Regexp::Common, though I am fairly sure that it can't correct unbalanced brackets in a list of statements. Am I a spanner (making a square wheel), or should I keep going with this one?

Many thanks,
gullcatcher
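One way to check the generated code before running it is to let Perl compile it without executing it: string-`eval`ing the code wrapped in `sub { ... }` compiles the body into an anonymous sub but doesn't call it, so a brace mismatch shows up in `$@`. This is a sketch; `compiles` and `run_if_ok` are hypothetical names. Note that `BEGIN` blocks and `use` lines in the generated code would still run at compile time.

```perl
use strict;
use warnings;

# Compile $code without running it. Returns the code ref on success,
# or undef with the compiler's message left in $@.
# 'no strict' lets the generated code use package globals like $x and @a.
sub compiles {
    my ($code) = @_;
    return eval "no strict; sub { $code\n }";
}

# Run the code only if it compiled; report any compile or run-time error.
sub run_if_ok {
    my ($code) = @_;
    my $sub = compiles($code) or return "compile error: $@";
    eval { $sub->(); 1 } or return "run-time error: $@";
    return 'ok';
}

print run_if_ok('$x++; if ($x) {; $y-=$x; foreach (@x) {;};'), "\n";
print run_if_ok('$x++; if ($x) {; $y-=$x; foreach (@x) {;};};'), "\n";
```

Combined with a repair pass such as `paired`, this lets the generator throw away (or log) any sequence that still fails to compile, instead of discovering the problem only when it runs.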