dudi has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to write a script that automatically goes through c code and checks for unclosed brackets and other such stuff. I would like to be able to count the brackets in the programs. I tried writing something like:
#!/usr/bin/perl -w
open (LOG, ">>threadcheck.log") || &Error($!);
$now = localtime; # "Thu Oct 13 04:54:34 1994"
print LOG "Log Created on $now \r\n";
my $lines = 0;
foreach $FILE (<*.c *.C *.cpp *.CPP *.h *.H>) {
    open (FIL, ">>$FILE") || die $!;
    print LOG "File $FILE was opened on $now \r\n";
    while (<FIL>) {
        $lines++;
        my $brackets;
        foreach $word ( split ) {
            if ($word =~ '{' ) {
                print LOG "New open bracket \r\n";
                $brackets++;
            }
            if ($word =~ '}' ) {
                print LOG "New closed bracket \r\n";
                $brackets--;
            }
            print LOG "Number of unclosed brackets in file $FILE: $brackets\r\n";
        }
        print LOG "Number of lines in file $FILE: $lines\r\n";
    }
    close FIL
}
close LOG
The problem is, the LOG gets written with "File $FILE was opened on $now \r\n", but nothing else! I've tried writing things to each file on the order of:
while (<FIL>) { print FIL "/* I opened you on $now for bracket check*/ ... }
and it worked fine. Do you have any clue what I am doing wrong?

Replies are listed 'Best First'.
Re: finding { and }
by higle (Chaplain) on Oct 17, 2001 at 00:51 UTC
    There are two problems with your script:

    One (the main problem): when you open the file that you wish to read from, i.e. open (FIL, ">>$FILE") || die $!;, you are using the append mode (>>) on $FILE instead of the read mode (<). A handle opened for append is write-only, so you cannot read the file line by line; in fact, you cannot read from it at all.
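
    To see the difference between the two modes in isolation, here is a minimal sketch. The file name example_modes.c is made up for the illustration, and the script creates and deletes that file itself; it uses lexical filehandles and three-argument open rather than the bareword handles of the original script.

```perl
use strict;
use warnings;

# Hypothetical file name, used only for this illustration.
my $file = "example_modes.c";

# Create a small file so there is something to read back.
open(my $out, '>', $file) or die "Cannot write $file: $!";
print $out "int main() {\n    return 0;\n}\n";
close $out;

# Opened for append: the handle is write-only, so reading yields nothing.
my @from_append;
{
    no warnings 'io';    # silence the "opened only for output" warning
    open(my $app, '>>', $file) or die "Cannot append to $file: $!";
    @from_append = <$app>;
    close $app;
}

# Opened for reading: every line comes back.
open(my $in, '<', $file) or die "Cannot read $file: $!";
my @from_read = <$in>;
close $in;

printf "append mode read %d lines, read mode read %d lines\n",
    scalar @from_append, scalar @from_read;

unlink $file;
```

    This also explains why the log only ever showed the "File ... was opened" line: the while (<FIL>) loop never ran.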

    Two, as already pointed out, the regexes are problematic. You're using single quotes as delimiters, instead of the standard forward slash (which is allowable), but to do so you have to put m in front of the first single quote, i.e. $word =~ m'}' to let Perl know that you're going to use something other than the forward slash. Or you could just use a forward slash, $word =~ /}/. And you have to put a backslash in front of the "{", because it's a quantifying metacharacter.

    Here is a rewrite of your script that should work:
    #!/usr/bin/perl -w
    open (LOG, ">>threadcheck.log") || &Error($!);
    $now = localtime;
    print LOG "Log Created on $now \r\n";
    my $lines = 0;
    foreach $FILE (<*.c *.C *.cpp *.CPP *.h *.H>) {
        open (FIL, "<$FILE") || die $!; #changed ">>" to "<"
        print LOG "File $FILE was opened on $now \r\n";
        while (<FIL>) {
            $lines++;
            my $brackets;
            foreach $word ( split ) {
                if ($word =~ /\{/ ) #or if ($word =~ m'\{' )
                {
                    print LOG "New open bracket \r\n";
                    $brackets++;
                }
                if ($word =~ /}/ ) #or if ($word =~ m'}' )
                {
                    print LOG "New closed bracket \r\n";
                    $brackets--;
                }
                print LOG "Number of unclosed brackets in file $FILE: $brackets\r\n";
            }
            print LOG "Number of lines in file $FILE: $lines\r\n";
        }
        close FIL
    }
    close LOG

    higle
      wow, thanks a lot for your help guys!

      Some small remarks to your point 2:

      • Using the syntax $text =~ "abc" works perfectly fine, you don't need the m// operator (unless you want to give modifiers like /i). (see perlop) print "match" if "match" =~ 'a.c';
      • You don't have to escape the { in this regex, only if it might be mistaken for the meta character then you have to escape {. Even this is OK: print "match" if 'a{2,-1}b' =~ 'a{2,-1}';

      In general - as tommyw mentioned together with other important aspects - I'd use the index function for this kind of searching for a fixed string.

      -- Hofmator

Re: finding { and }
by drinkd (Pilgrim) on Oct 16, 2001 at 23:54 UTC
    The left curly brace is a metacharacter and needs to be backslashed in regexes, I believe. drinkd
Re: finding { and }
by tommyw (Hermit) on Oct 17, 2001 at 01:42 UTC

    Your problem's been addressed. But... rather than split-ing the string, and examining individual characters, why not use index? That removes the problem with regexps and brackets, as well as being faster.
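
    A minimal sketch of the index approach, assuming all you want is the number of occurrences of a fixed string in a line. The helper name count_char is made up for this example; index itself is a Perl builtin that returns -1 when nothing more is found.

```perl
use strict;
use warnings;

# Count how often a fixed string occurs in a line using index(), no regex.
# count_char is a hypothetical helper name chosen for this sketch.
sub count_char {
    my ($line, $char) = @_;
    my ($count, $pos) = (0, 0);
    while (($pos = index($line, $char, $pos)) != -1) {
        $count++;
        $pos++;    # continue searching just past this hit
    }
    return $count;
}

my $line = 'if (x) { y(); } else { z(); }';
printf "open: %d, close: %d\n",
    count_char($line, '{'), count_char($line, '}');
```

    Because no pattern is involved, the question of whether { needs a backslash never comes up.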

    If $brackets ever goes below 0, you've got a problem, even if the final total is ok: consider )(. And you're not considering brackets inside strings: printf("(");
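
    Both points can be sketched in a few lines. This is only an illustration for curly braces, under loud assumptions: the function name brace_depth is made up, only double-quoted strings are skipped, and comments and character literals are not handled at all.

```perl
use strict;
use warnings;

# Returns the final nesting depth, or -1 as soon as a closing brace shows up
# before its opener. Assumption: only double-quoted strings are skipped;
# comments and character literals are NOT handled in this sketch.
sub brace_depth {
    my ($code) = @_;
    my $depth     = 0;
    my $in_string = 0;
    my @chars     = split //, $code;
    for (my $i = 0; $i < @chars; $i++) {
        my $c = $chars[$i];
        if ($in_string) {
            if ($c eq '\\') { $i++; next; }    # skip the escaped character
            $in_string = 0 if $c eq '"';
            next;
        }
        if    ($c eq '"') { $in_string = 1 }
        elsif ($c eq '{') { $depth++ }
        elsif ($c eq '}') { return -1 if --$depth < 0 }
    }
    return $depth;
}

for my $sample ('}{', '{ printf("{"); }', '{ {') {
    printf "%-20s => %d\n", $sample, brace_depth($sample);
}
```

    A result of 0 means balanced, a positive number means that many braces were left open, and -1 flags the "closed before opened" case that a plain running total would miss.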

    Of course, the logical extension is to consider the number of round brackets inside each pair of curly brackets. At this rate you're going to end up with a full blown syntax checker. Why not hand the whole thing over to the compiler?

      Or if you still insist on doing it yourself, you might want to use Text::Balanced which does exactly what you need in ways you probably haven't even thought of yet. Or even Parse::RecDescent if you can handle it, but in that case you're probably getting so close to doing your compiler's work that you really should hand the files over to it instead.
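
      For a first impression of Text::Balanced, here is a small sketch using its extract_bracketed function with '{}' as the only delimiters. The sample text is invented for the example, and the sketch does not try to deal with strings or comments.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

# Invented sample input: one balanced block followed by other text.
my $text = '{ a(); { b(); } } trailing';

# In list context the first two return values are the balanced chunk
# and whatever text follows it.
my ($block, $rest) = extract_bracketed($text, '{}');

if (defined $block && length $block) {
    print "balanced block: $block\n";
    print "remainder:     $rest\n";
}
else {
    print "no balanced block found at the start of the text\n";
}
```

      The nested inner pair is matched along with the outer one, which is the part that is tedious to get right by hand.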
Re: finding { and }
by Fletch (Bishop) on Oct 17, 2001 at 06:28 UTC

    Possibly not directly applicable in this case, but check out Text::Balanced for matching balanced delimiters and Regexp::Common which will create a regexp for simpler cases.

    Oh yeah, Damian++.

Re: finding { and }
by hopes (Friar) on Oct 17, 2001 at 00:09 UTC
    UPDATE! Oops!! I was wrong...

    I didn't realize that
    $a=~'a' is correct.
    I thought that it would have to be
    $a=~m'a' instead.
    But
    perl -e "$a='b}b';print 'Mached!' if $a=~'}'"
    runs as expected.

    The backslash in '{' is not necessary.
    Update: In response to drinkd's post, I have to say that both
    perl -e "$a='b}b';print 'Mached!' if $a=~'}'"
    perl -e "$a='b}b';print 'Mached!' if $a=~'}'"
    are correct. I don't have to write backslashes.
    Hopes
      Only the left curly brace is a metacharacter. drinkd
Re: finding { and }
by stefan k (Curate) on Oct 17, 2001 at 15:14 UTC
    Hi,
    are you sure that this will really solve your problem? What about curly braces in comments?
    I don't think I'd reinvent the wheel this way. I'd rather get some existing, more sophisticated tool to fulfill my wishes. Maybe one could use the programming mode of his/her favorite editor, which probably already does some checking like this. I could also imagine that indent and gcc or some TAGS-creating programs might be helpful. Or in XEmacs you might want to call M-x occur RET [{}] (I had rather used a tt-font here, but I needed to use code-Tags to get the brackets) or maybe M-x count-matches RET { and then compare to the count of }.

    I know that this doesn't quite answer your question, which has been addressed above. It is just meant to save you from further problems...

    Regards... Stefan
    you begin bashing the string with a +42 regexp of confusion

      (I had rather used a tt-font here, but I needed to use code-Tags to get the brackets)

      What about [this] :-) ??

      Produced with &#091; and &#093;

      -- Hofmator

        Thanks!
        I even searched HTMLlat1.ent from the HTML 4.0 DTD distribution, but to no avail. (Even now, knowing the solution would have been 091, I can't find'em there)
        I pinned them to the monitor as yet another post-it note (YAPN) ;-)

        Regards... Stefan
        you begin bashing the string with a +42 regexp of confusion