in reply to Re: Re: Simple Text Indexing
in thread Simple Text Indexing
Why not create a dictionary of words with lists of offsets? Store the byte location of the start of each line, so that when you want to retrieve it you can seek straight to the right location and print that line. Another idea is to store the offset of the n-th previous line so you can print some context.
I've written a function for you that accepts a filename and an optional number of lines of context (the default being 1). You'll probably want to store the index somewhere using Storable so it's convenient to reuse it later.
    use strict;
    use warnings;

    my @files = glob "*.txt";
    my %file_idx = map { ; $_ => index_file( $_, 5 ) } @files;

    # %file_idx ends up looking like this:
    #
    # {
    #     'foobar.txt' => {
    #         word    => [ 1, 3, 5, 6 ],
    #         another => [ 5, 7, 2, ]
    #     },
    #     'barfoo.txt' => { ....... }
    # }

    sub index_file {
        my $filename = shift;
        my $lines_of_context = ( defined $_[0] && $_[0] > 0 ) ? shift : 1;

        open my $fh, "<", $filename or die "Couldn't open $filename: $!";

        my @offsets;    # start offsets of the most recent lines, oldest first
        my %index;

        # Record the position *before* reading, so it is the start of the line.
        my $start = tell $fh;
        while ( my $line = <$fh> ) {
            push @offsets, $start;
            shift @offsets if @offsets > $lines_of_context;

            # Oldest remembered line: where the context for this line begins.
            my $offset = $offsets[0];
            for my $word ( split ' ', $line ) {
                push @{ $index{$word} }, $offset;
            }
            $start = tell $fh;
        }
        close $fh or warn "Couldn't close $filename: $!";

        return \%index;
    }
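Reading the lines back and keeping the index around between runs could look roughly like the following. This is only a sketch: `lines_for_word`, `save_index` and `load_index` are names made up for illustration, and the code assumes the index is a hash ref of word => [ byte offsets ] where each offset is the start of a line, as described above. Storable's `store` and `retrieve` do the persistence.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);

# Hypothetical helper: for every offset recorded for $word, seek there and
# read up to $count lines. Returns one string per distinct offset.
sub lines_for_word {
    my ( $filename, $index, $word, $count ) = @_;
    $count = 1 unless defined $count && $count > 0;

    my $offsets = $index->{$word} or return;

    open my $fh, "<", $filename or die "Couldn't open $filename: $!";

    my @hits;
    my %seen;
    for my $offset ( @$offsets ) {
        next if $seen{$offset}++;    # same word twice in one line
        seek $fh, $offset, 0 or die "Couldn't seek in $filename: $!";

        my @context;
        while ( @context < $count ) {
            my $line = <$fh>;
            last unless defined $line;    # ran off the end of the file
            push @context, $line;
        }
        push @hits, join '', @context;
    }
    close $fh or warn "Couldn't close $filename: $!";

    return @hits;
}

# Hypothetical wrappers: write an index to disk and read it back.
sub save_index {
    my ( $index, $path ) = @_;
    store( $index, $path );
    return;
}

sub load_index {
    my ( $path ) = @_;
    return retrieve( $path );
}
```

With the index built by index_file, a call such as `print lines_for_word( 'foobar.txt', $file_idx{'foobar.txt'}, 'word', 5 )` would print each stored context block. Pass the same number of lines you indexed with, otherwise the block may stop short of the line containing the word.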