Why not create a dictionary of words with lists of offsets? Store the byte offset of the start of each line, so that when you want to retrieve a line you can seek straight to the right location and print it. Another idea is to store the offset of the n-th previous line so you can print some context.
I've written a function for you that accepts a filename and an optional number of lines of context (the default being 1). You'll probably want to store the index somewhere using Storable so it's convenient to re-use it later.
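my @files = glob "*.txt";
my %file_idx = map {; $_ => index_file( $_, 5 ) } @files;

=pod

    {
        'foobar.txt' => { word => [ 1, 3, 5, 6 ], another => [ 5, 7, 2, ] },
        'barfoo.txt' => { ....... }
    }

=cut

sub index_file {
    my $filename = shift;
    # Optional second argument: number of lines of context (defaults to 1).
    my $lines_of_context = ( defined $_[0] && $_[0] > 0 ) ? shift() : 1;

    open my $fh, "<", $filename or die "Couldn't open $filename: $!";

    my @offsets;    # start offsets of the most recent lines
    my %index;
    my $line_start = tell $fh;    # byte offset where the next line begins
    while ( my $line = <$fh> ) {
        push @offsets, $line_start;
        $line_start = tell $fh;

        # Until the window is full, the context starts at the first line;
        # after that it starts $lines_of_context - 1 lines back.
        my $offset = scalar( @offsets ) < $lines_of_context ? $offsets[0] : shift @offsets;

        for my $word ( split ' ', $line ) {
            push @{ $index{$word} }, $offset;
        }
    }
    close $fh or warn "Couldn't close $filename: $!";

    return \%index;
}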
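
For the retrieval side, here is a rough sketch of how the seek-and-print step might look. The name lines_for_word is my own invention for illustration, not something from a module. It assumes each offset stored for a word is the byte position where that occurrence's context block begins, and that you pass the same number of context lines you indexed with.

```perl
use strict;
use warnings;

# Hypothetical helper: given a file and a per-file index of the shape
# { word => [ byte offsets ] }, return one block of text per occurrence.
# Assumes each offset marks the start of the context block to print.
sub lines_for_word {
    my ( $filename, $index, $word, $lines_of_context ) = @_;
    $lines_of_context = 1
        unless defined $lines_of_context && $lines_of_context > 0;

    open my $fh, "<", $filename or die "Couldn't open $filename: $!";

    my @blocks;
    for my $offset ( @{ $index->{$word} || [] } ) {
        # Jump straight to the stored byte offset (0 = from start of file).
        seek $fh, $offset, 0 or die "Couldn't seek in $filename: $!";

        # Read up to $lines_of_context lines from there.
        my $block = '';
        for ( 1 .. $lines_of_context ) {
            my $line = <$fh>;
            last unless defined $line;
            $block .= $line;
        }
        push @blocks, $block;
    }
    close $fh or warn "Couldn't close $filename: $!";

    return @blocks;
}

# Possible round trip through Storable (file names are placeholders):
#   use Storable qw(store retrieve);
#   store \%file_idx, 'file_idx.sto';
#   my $file_idx = retrieve 'file_idx.sto';
#   print lines_for_word( 'foobar.txt', $file_idx->{'foobar.txt'}, 'word', 5 );
```

Note that a word near the top of the file shares its offset with the first line, so its block may show more text after the match than before it. Offsets are also only valid while the file is unchanged, so rebuild the index whenever the text is edited.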
In reply to Re: Re: Re: Simple Text Indexing by diotalevi, in thread Simple Text Indexing by cyocum