The SPARC assembler that I need to use for some of my classes, called isem, does an absolutely terrible job of reporting the use of undeclared symbols in code. As you might expect, this is very annoying: a buggy program will assemble just fine, and wacky, hard-to-trace errors then show up at run time. For instance, there might be a string-printing routine named "pr_str", but you accidentally type call p_str. Instead of getting a core dump, the call actually jumps to some seemingly random location and keeps running. Hurrah.
As a solution to this problem, I wrote up a small tool that I've named (for lack of anything better) isem_checker. I find it pretty handy, at any rate.
#!/usr/local/bin/perl
use strict;
use warnings 'all';
use Getopt::Long;
use Pod::Usage;
Getopt::Long::Configure ("bundling");
my %OPT;
our $VERSION = '2.00';
# setup stuff
GetOptions ('unused|u'     => \$OPT{unused},
            'no_nop|n'     => \$OPT{nonop},
            'usage|help|h' => \$OPT{usage},
            'man|m'        => \$OPT{man},
            'version|v'    => \$OPT{version}
           ) or pod2usage(2);
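if ($OPT{version}) {
    print STDERR <<"__VERSION__";
This is isem_checker, a syntax-checking extension to isem.
Version: $VERSION
Author: Joseph F. Ryan <ryan.311\@osu.edu>
__VERSION__
    exit(0);
}

# Handle the help options before checking for a filename, so that
# --usage and --man work without one. -verbose => 2 prints the whole
# manual page, as the documentation for --man promises.
pod2usage(-exitval => 0, -verbose => 1) if $OPT{usage};
pod2usage(-exitval => 0, -verbose => 2) if $OPT{man};

unless (@ARGV == 1) {
    print "\nYou must specify one filename.\n";
    pod2usage(1);
}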
my ($created, $used) = find_symbols($ARGV[0], \%OPT);

# Check to see if there are any symbols that
# were used but not created.
$" = ', ';
my $found;

# Loop through the symbols, and report any that don't exist.
while (my ($k,$v) = each %$used) {
    if ($k && ($k !~ /^\d+/)) {
        unless (exists $created->{$k}) {
            print "\n\n";
            print "Unknown symbol '$k' found at lines: @$v.\n";
            print "Possible matches include: \n";
            print possible_matches($k, $created);
            ++$found;
        }
    }
}
print "\n\n";

if ($OPT{unused}) {
    while (my ($k, $v) = each %$created) {
        unless (exists $used->{$k}) {
            print "Symbol '$k' created but not used at line $v.\n";
        }
    }
}

print "\n\nSyntax for $ARGV[0] is correct\n" unless $found;
=head2 possible_matches (symbol, symbol hash)

Find possible matches for a given symbol.

A possible match is a known symbol that starts with the same letter as
the unknown symbol and also contains more than 3 of the unknown symbol's
letters (a repeated letter is counted each time it occurs). Matches are
made case-insensitive. I know this isn't the greatest, but hey, if you
can think of a better way, let me know :)

=cut
sub possible_matches {
    my ($sym, $possible) = @_;
    my @letters = split //, lc $sym;
    my $ret = '';
    while (my ($k,$v) = each %$possible) {
        my $symbol = lc $k;
        my $match = 0;
        foreach (@letters) {
            ++$match if index($symbol,$_) >= 0;
        }
        $ret .= "\t'$k', created at line $v.\n"
            if $match > 3 && index($symbol,$letters[0]) == 0;
    }
    return $ret;
}
=head2 find_symbols (filename, options hash)

C<find_symbols> takes one filename and a reference to the options hash
as arguments, and returns references to 2 hashes of symbols: created
symbols and used symbols. For the created symbols hash, the key is the
symbol name and the value is the line that the symbol was created on.
For the used symbols hash, the key is the symbol name and the value is
a list of all lines that the symbol was used on.

=cut
sub find_symbols {
    my ($file,$OPT) = @_;
    my (%created_symbols, %used_symbols);
    # Three-argument open, so odd characters in the filename are not
    # interpreted as an open mode.
    open (SOURCE, '<', $file) or die "Can't open file '$file': $!";
    my $i=0; # line number

    # Statement patterns.
    my $label    = qr/(\w+):/;
    my $constant = qr/\.(?:set|global)\s+(\w+),/;
    my $branch   = qr/(?:b[^cst]\w{0,2}|call) (\w+)/;
    my $sinst    = qr/\w+\s+(\%?\w+),\s+(\%?\w+)/;
    my $linst    = qr/\w+\s+\%\w+,\s+(\w+),\s+\%\w+/;

    # read in source and make a list of all created symbols
    # and used symbols
    my $nop;
    MAIN: while (<SOURCE>) {
        ++$i;

        # clean out comments and leading/trailing whitespace
        s/!.*//;
        s/^\s*//;
        s/\s*$//;
        next unless $_;

        if ($nop && (!$OPT->{nonop})) {
            print "Expected nop at line $i after call to $nop.\n"
                unless /^nop$/;
        }
        $nop = '';

        # labels
        m/$label/ and do {
            print "Symbol $1 redefined at line $i\n"
                if exists $created_symbols{$1};
            $created_symbols{$1} = $i;
            next MAIN;
        };

        # symbolic constants
        m/$constant/ and do {
            print "Symbol $1 redefined at line $i\n"
                if exists $created_symbols{$1};
            $created_symbols{$1} = $i;
            next MAIN;
        };

        # branch instruction
        m/$branch/ and do {
            push @{$used_symbols{$1}}, $i;
            $nop = $1;
            next MAIN;
        };

        # 2 item instructions
        # Constants can appear on either the left
        # or the right side, so more processing is
        # needed.
        m/$sinst/ and do {
            my $x;
            my ($o,$t) = ($1,$2);
            if ($o && ($o !~ /\%/)) {
                $x = $o;
            }
            elsif ($t && ($t !~ /\%/)) {
                $x = $t;
            }
            else {
                next MAIN;
            }
            push @{$used_symbols{$x}}, $i;
            next MAIN;
        };

        # 3 item instructions
        m/$linst/ and do {
            push @{$used_symbols{$1}}, $i;
            next MAIN;
        };
    }
    close (SOURCE);
    return (\%created_symbols, \%used_symbols);
}
__END__

=head1 NAME

isem_checker - check SPARC assembly source for undefined symbols

=head1 SYNOPSIS

isem_checker filename [options]

 Options:
   --unused     report any created yet unused symbols
   --no_nop     suppress warnings about missing nops
   --usage      display usage
   --man        full documentation
   --version    version info

=head1 DESCRIPTION

C<isem_checker> is a program that checks SPARC assembly language programs
for undefined symbols and labels, since the ISEM assembler obviously
doesn't do it.

=head1 OPTIONS

=over 8

=item B<--unused>

If either the C<--unused> or C<-u> flag is set, then C<isem_checker> will
report any created yet unused symbols in the source text.

=item B<--no_nop>

If either the C<--no_nop> or C<-n> flag is set, then C<isem_checker> will
suppress warnings about branch/call statements that are missing a trailing
C<nop> statement.

=item B<--usage>

Displays command-line information and exits.

=item B<--man>

Prints this entire manual page and exits.

=item B<--version>

Displays version information and exits.

=back

=head1 AUTHOR

Joseph F. Ryan <ryan.311@osu.edu>

=cut
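To see how the statement patterns in find_symbols sort source lines, they can be tried outside the script. The sketch below copies the five patterns and applies them in the same order as the main loop. The classify helper and the sample SPARC-style lines are made up for illustration and are not part of isem_checker.

```perl
use strict;
use warnings;

# Patterns copied from find_symbols.
my $label    = qr/(\w+):/;
my $constant = qr/\.(?:set|global)\s+(\w+),/;
my $branch   = qr/(?:b[^cst]\w{0,2}|call) (\w+)/;
my $sinst    = qr/\w+\s+(\%?\w+),\s+(\%?\w+)/;
my $linst    = qr/\w+\s+\%\w+,\s+(\w+),\s+\%\w+/;

# Hypothetical helper: classify one already-cleaned source line, trying
# the patterns in the same order as the main loop in find_symbols.
# Returns (kind, symbol); kind is 'none' when no symbol is recorded.
sub classify {
    my ($line) = @_;
    return ('label',    $1) if $line =~ $label;
    return ('constant', $1) if $line =~ $constant;
    return ('branch',   $1) if $line =~ $branch;
    if ($line =~ $sinst) {
        my ($o, $t) = ($1, $2);
        # An operand without a '%' is not a register, so treat it as a symbol.
        return ('use', $o) if $o && $o !~ /\%/;
        return ('use', $t) if $t && $t !~ /\%/;
        return ('none', '');
    }
    return ('use', $1) if $line =~ $linst;
    return ('none', '');
}

# Made-up sample lines, including the misspelled call from the intro.
for my $line ('loop:', '.set BUFSIZE, 64', 'call p_str', 'bne loop',
              'mov BUFSIZE, %o1', 'add %l0, %l1, %l2') {
    my ($kind, $sym) = classify($line);
    printf "%-20s %-8s %s\n", $line, $kind, $sym;
}
```

Because the patterns are not anchored, the order in which they are tried matters: a line is claimed by the first pattern that matches anywhere in it.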
Update: I added a few command-line options and documentation.
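The documentation for possible_matches asks for a better way to find near misses. One common option is to rank the known symbols by Levenshtein edit distance, so that p_str is matched to pr_str because a single inserted letter separates them. What follows is only a minimal sketch of that idea, not something the script does: edit_distance, closest_matches, and the default cutoff of 2 edits are all assumptions chosen for illustration.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Levenshtein edit distance between two strings: insertions, deletions
# and substitutions each cost 1. Computed one table row at a time.
sub edit_distance {
    my ($s, $t) = @_;
    my @prev = (0 .. length $t);
    for my $i (1 .. length $s) {
        my @cur = ($i);
        for my $j (1 .. length $t) {
            my $cost = substr($s, $i - 1, 1) eq substr($t, $j - 1, 1) ? 0 : 1;
            push @cur, min(
                $prev[$j] + 1,          # deletion
                $cur[$j - 1] + 1,       # insertion
                $prev[$j - 1] + $cost,  # substitution or match
            );
        }
        @prev = @cur;
    }
    return $prev[-1];
}

# Return the known symbols within $max edits of $sym, closest first.
# $known is a hash reference shaped like the "created" hash returned by
# find_symbols (symbol name => line number). The default cutoff of 2 is
# an arbitrary assumption.
sub closest_matches {
    my ($sym, $known, $max) = @_;
    $max = 2 unless defined $max;
    my %dist;
    $dist{$_} = edit_distance(lc $sym, lc $_) for keys %$known;
    return sort { $dist{$a} <=> $dist{$b} or $a cmp $b }
           grep { $dist{$_} <= $max } keys %dist;
}

# Example with a made-up symbol table.
my %created = (pr_str => 10, pr_int => 20, main => 1);
print "Possible matches: @{[ closest_matches('p_str', \%created) ]}\n";
```

Unlike the letter-counting heuristic, this does not require the candidate to start with the same letter, so a typo in the first character can still be matched. The cost is a quadratic-time comparison per candidate, which is negligible for the symbol counts in a class assignment.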