When I was in grad school, I had to refer to concordances a lot, especially for Shakespeare and Chaucer. The other day I was trying to find a particular line in a Shakespeare play (Othello, to be exact), and I thought it would be an entertaining programming exercise to write a concordance generator. Pass the script a text file and it generates a full concordance, listing the number of times each word appears in the text along with the line numbers. Alternatively, pass it a specific word with -s and a text file, and it returns the line(s) that contain that word.

Now before everyone starts asking why I didn't use strict, my answer is that I did up until the moment I tried to use Getopt::Std. Obviously I'm missing something, but in order to pass strict, I had to declare my $opt variables. But when I did that, it ignored my command line flags. Any help in that regard would be greatly appreciated.
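
For what it's worth, the likely cause is that getopts() sets package variables ($main::opt_h, $main::opt_s), so declaring them with my creates separate lexicals that getopts() never touches. Declaring them with use vars (or our) keeps strict happy. Another option is to hand getopts() a hash reference as its second argument. Below is a minimal sketch of the hash approach; parse_flags and %opts are names made up for illustration, not part of the script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Std;

# Hypothetical helper: parse -h and -s out of a list of arguments.
sub parse_flags {
    my @args = @_;
    local @ARGV = @args;    # getopts() reads its input from @ARGV
    my %opts;
    # With a hash ref, flags land in %opts instead of $opt_* globals.
    getopts( 'hs:', \%opts ) or die "bad flags\n";
    return ( \%opts, [@ARGV] );    # parsed flags, leftover arguments
}

my ( $opts, $rest ) = parse_flags( '-s', 'pearl', 'othello.txt' );
print "search word: $opts->{s}\n" if defined $opts->{s};
print "file: $rest->[0]\n";
```

With the hash form there are no $opt_* variables to declare at all, which sidesteps the strict problem entirely.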

Update: Modified code. Still tweaking..... (btw, the line in Othello I was looking for was the line about throwing away a pearl worth more than the whole tribe. I don't remember why I was looking it up now, but it seemed important at the time.)

#!/usr/bin/perl

#--------------------------------------------------------------------#
# Concordance Generator
#       Date Written:   13-Aug-2001 04:02:11 PM
#       Last Modified:  14-Aug-2001 04:14:00 PM
#       Author:         Kurt Kincaid
#
#         This is free software and may be distributed under the
#         same terms as Perl itself.
#
#  A simple concordance generator, particularly useful for linguistic
#  analysis.
#--------------------------------------------------------------------#

use strict;
use vars qw($opt_h $opt_s);
use Getopt::Std;

my @theseWords;
my @theseLines;
my @found;
my %Count;
my %Line;
my ( $line, $word, $count, $LineNum );
my $VERSION = "1.0";

getopts( "hs:" );

if ( $opt_h ) {
    Usage();
}

my $file = shift || Usage();

open ( IN, $file ) || die "Cannot open $file: $!\n";
@theseLines = <IN>;
close (IN);
chomp @theseLines;

if ( $opt_s ) {
    Word($opt_s);
}

foreach $line ( @theseLines ) {
    $count++;
    $line = lc $line;
    $line =~ s/[.,:;?!]//g;
    while ( $line =~ /\b\w+\b/g ) {
        $word = $&;
        if ( $word =~ /\s/ || $word eq "" ) { next }
        $Count{$word}++;
        if ( defined $Line{$word} ) {
            $Line{$word} =~ m/(\d*?)$/;
            if ( $1 == $count ) {
                next;
            } else {
                $Line{$word} .= ", $count";
            }
        } else {
            $Line{$word} = $count;
        }
#        push @{$Line{$word}}, $count unless exists $Line{$word} && $Line{$word}[-1] == $count;
    }
}

@theseWords = keys %Count;
@theseWords = sort @theseWords;
foreach $word ( @theseWords ) {
#    print ( "$word ($Count{$word}): ", join ', ', @{$Line{$word}}, "\n\n" );
    print ("$word ($Count{$word}): $Line{$word}\n\n");
}

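sub Word {
    my $word = shift;
    # Walk the lines in order so that identical lines keep their own
    # line numbers (a hash keyed on line text would collapse them).
    foreach $line ( @theseLines ) {
        $LineNum++;
        # \Q...\E quotes any regex metacharacters in the search word.
        print ("$LineNum: $line\n") if $line =~ /\Q$word\E/i;
    }
    exit;
}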

sub Usage {
    print <<END;
Concordance Generator v$VERSION
    $0 [-h] [-s word] filename
    -h  Print this screen.
    -s  Perform a search for a specific word with immediate context.

END
    exit;
}
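
The commented-out push line above hints at keeping the line numbers in an array per word instead of a comma-separated string. Here is a minimal sketch of that variant, assuming the same lowercase-and-split approach as the script. build_concordance and the sample lines are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build word counts and per-word line lists.
sub build_concordance {
    my @lines = @_;
    my ( %count, %lines_for );
    my $n = 0;
    foreach my $text (@lines) {
        $n++;
        # A global match in list context returns every word at once.
        foreach my $word ( lc($text) =~ /\b\w+\b/g ) {
            $count{$word}++;
            my $seen = $lines_for{$word} ||= [];
            # Record each line number once, even if the word repeats.
            push @$seen, $n unless @$seen && $seen->[-1] == $n;
        }
    }
    return ( \%count, \%lines_for );
}

# Made-up sample text, not taken from any play.
my ( $count, $lines_for ) =
    build_concordance( 'the cat sat', 'the cat and the dog' );
foreach my $word ( sort keys %$count ) {
    print "$word ($count->{$word}): ",
        join( ', ', @{ $lines_for->{$word} } ), "\n";
}
```

Keeping real arrays means the "same line as last time" check is a plain numeric comparison rather than a regex against the tail of a string.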

In reply to Concordance Generator by sifukurt