in reply to Finding word either side of a word match
Updated to use less memory, and make it easier to do some Markov chain analysis should you so desire. Tweaked regex for some corner cases.
You may just want to build your index and get it over with. Note: this will use, as a wild guess, at least 3 times as much memory as the file size, so be careful with large files.
use warnings; use strict; my %index; while ( my $line = <DATA> ) { $line = lc $line; $line =~ s/^\P{Alnum}+|\P{Alnum}+$//g; my @words = split /\P{Alnum}*\s\P{Alnum}*/, $line; for ( 0 .. $#words ) { my $word = $words[$_]; $index{$word}{count}++; my ( $pre, $post ) = ( '', '' ); if ( $_ > 0 ) { $pre = $words[ $_ - 1 ]; } if ( $_ < $#words ) { $post = $words[ $_ + 1 ]; } push @{ $index{$word}{lines} }, [ $., $pre, $post ]; } } for my $word ( sort keys %index ) { print "$word - $index{$word}{count} time" . ( $index{$word}{count} == 1 ? '' : 's' ) . ":\n"; printf " Line %4d - %s $word %s\n", @$_ for ( @{ $index{$word}{ +lines} } ); print "\n"; } __DATA__ Mary had a little lamb, A little pork, a little jam, A little fish, some kangaroo, A pudding and some cookies too, An ice cream soda topped with fizz, And boy how sick our Mary is. Mary had a little lamb, Her daddy shot it dead. And now it goes to school with her, Between two hunks of bread.
|
|---|