Re^3: Finding word either side of a word match

If you want to display context, then there's a better solution: for each word, store the position of the word in the file (in bytes) in the DB. When you want to show the context, you just seek to that position (or, let's say, to $position - 20), and read the next few bytes.

That way you have to keep the indexed files at hand, but you avoid storing every word thrice in the DB.
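
A minimal sketch of the lookup side, assuming the byte offset has already been fetched from the DB. The name `context_at` and its parameters are made up for illustration, and the handle is opened in raw mode so that offsets really are bytes:

```perl
use strict;
use warnings;

# Hypothetical helper: return up to $before bytes before and $after bytes
# from the byte offset $position that was stored in the index.
sub context_at {
    my ($file, $position, $before, $after) = @_;

    # Raw mode, so seek offsets and read lengths are counted in bytes.
    open(my $handle, '<:raw', $file) or die "Can't read '$file': $!";

    # Clamp the start so we never seek before the beginning of the file.
    my $start = $position - $before;
    $start = 0 if $start < 0;

    seek($handle, $start, 0) or die "Can't seek in '$file': $!";

    # read() returns the number of bytes read, 0 at EOF, undef on error.
    my $length = ($position - $start) + $after;
    defined(read($handle, my $context, $length))
        or die "Can't read from '$file': $!";
    close $handle;

    return $context;
}
```

Something like `context_at($file, $position, 20, 40)` would then give roughly the "$position - 20" window described above. Note that with a multi-byte encoding a fixed byte window can cut a character in half at either end, so the result may need trimming before it is displayed.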

Re^4: Finding word either side of a word match by Quicksilver (Scribe) on Mar 03, 2008 at 15:43 UTC
That's a far more elegant solution :) What would be the best way of finding the position in bytes? It's not something that I've come across yet.
Re^5: Finding word either side of a word match by moritz (Cardinal) on Mar 04, 2008 at 08:08 UTC
You can slurp the whole file into memory like this:

```perl
open(my $handle, '<', $file) or die "Can't read '$file': $!";
my $contents = do { local $/; <$handle> };   # undef $/ reads the whole file at once
```

And then when you match against that string with `m/.../g`, you can query `pos $contents` to get the offset at which the match ended (`$-[0]` holds the offset at which it started). As long as the string holds raw bytes, that is the same as the position in bytes. (Note that you will run into trouble with multi-byte encodings this way.)

Another way is to read the file line by line, and track the number of characters that have been consumed so far:

```perl
my $pos = 0;
while (<$handle>) {
    my $line_len = length $_;   # do that before chomping
    chomp;
    while (m/(\w+)/g) {
        my $word     = $1;
        my $word_pos = $pos + $-[1];   # start of the word; $pos + pos would be its end
    }
    $pos += $line_len;
}
```

See pos.