Re: Finding word either side of a word match

Updated to use less memory and to make Markov chain analysis easier, should you so desire. Also tweaked the regex for some corner cases.

You may just want to build your index once and get it over with. Note: as a wild guess, this will use at least 3 times as much memory as the file size, so be careful with large files.

use warnings;
use strict;

my %index;

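while ( my $line = <DATA> ) {
    $line = lc $line;
    # Strip leading and trailing non-alphanumerics (this also eats the newline).
    $line =~ s/^\P{Alnum}+|\P{Alnum}+$//g;
    # Split on whitespace together with any punctuation attached to it.
    my @words = split /\P{Alnum}*\s\P{Alnum}*/, $line;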
    for ( 0 .. $#words ) {
        my $word = $words[$_];
        $index{$word}{count}++;
        my ( $pre, $post ) = ( '', '' );
        if ( $_ > 0 ) {
            $pre = $words[ $_ - 1 ];
        }
        if ( $_ < $#words ) {
            $post = $words[ $_ + 1 ];
        }
        # Store [ line number, previous word, next word ]; '' marks a line edge.
        push @{ $index{$word}{lines} }, [ $., $pre, $post ];
    }
}

for my $word ( sort keys %index ) {
    print "$word - $index{$word}{count} time"
      . ( $index{$word}{count} == 1 ? '' : 's' ) . ":\n";
    printf "    Line %4d - %s %s %s\n", $_->[0], $_->[1], $word, $_->[2]
      for @{ $index{$word}{lines} };
    print "\n";
}

__DATA__
Mary had a little lamb,
A little pork, a little jam,
A little fish, some kangaroo,
A pudding and some cookies too,
An ice cream soda topped with fizz,
And boy how sick our Mary is.

Mary had a little lamb,
Her daddy shot it dead.
And now it goes to school with her,
Between two hunks of bread.
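
As for the Markov chain angle: here is a rough sketch of how the index could feed one. It assumes %index has exactly the shape built above (each entry in {lines} is [ line number, previous word, next word ]). The helper name successor_counts is my own invention, and the small %index here is hand-built from the first two lines of the sample data, just to keep the snippet self-contained.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the script above): walk the
# [ line, pre, post ] triples and count how often each word is
# followed by each other word.
sub successor_counts {
    my ($index) = @_;
    my %next;
    for my $word ( keys %$index ) {
        for my $entry ( @{ $index->{$word}{lines} } ) {
            my $post = $entry->[2];
            next if $post eq '';    # last word on a line has no successor
            $next{$word}{$post}++;
        }
    }
    return \%next;
}

# Hand-built excerpt in the same shape the indexing loop produces.
my %index = (
    a => {
        count => 3,
        lines => [ [ 1, 'had', 'little' ], [ 2, '', 'little' ], [ 2, 'pork', 'little' ] ],
    },
    little => {
        count => 3,
        lines => [ [ 1, 'a', 'lamb' ], [ 2, 'a', 'pork' ], [ 2, 'a', 'jam' ] ],
    },
);

# Turn the raw counts into transition probabilities and print them.
my $next = successor_counts( \%index );
for my $word ( sort keys %$next ) {
    my $total = 0;
    $total += $_ for values %{ $next->{$word} };
    for my $succ ( sort keys %{ $next->{$word} } ) {
        printf "%s -> %s: %.2f\n", $word, $succ, $next->{$word}{$succ} / $total;
    }
}
```

Since the triples never cross a line boundary, neither do the transitions. If you want chains that run across lines, you would have to carry the last word of one line over to the next while indexing.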
