in reply to Re^2: Finding word either side of a word match
in thread Finding word either side of a word match

If you want to display context, there's a better solution: for each word, store its position in the file (in bytes) in the DB. When you want to show the context, you just seek to that position (or, say, $position - 20) and read the next few bytes.

That way you have to keep the indexed files at hand, but you avoid storing every word thrice in the DB.
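A minimal sketch of the lookup side, assuming the byte offset has already been fetched from the DB. The helper name context_at and its $before/$after parameters are made up for illustration; seek and read are the standard Perl builtins.

```perl
use strict;
use warnings;

# Hypothetical helper: return the bytes around a stored byte offset,
# starting $before bytes ahead of it and ending $after bytes past it.
sub context_at {
    my ($file, $position, $before, $after) = @_;

    # ':raw' so that offsets are counted in bytes, not decoded characters.
    open(my $handle, '<:raw', $file) or die "Can't read '$file': $!";

    my $start = $position - $before;
    $start = 0 if $start < 0;    # don't seek before the start of the file

    seek($handle, $start, 0) or die "Can't seek in '$file': $!";

    my $length = ($position - $start) + $after;
    my $got = read($handle, my $context, $length);
    die "Can't read from '$file': $!" unless defined $got;

    close $handle;
    return $context;
}
```

Something like context_at($file, $position, 20, 40) would then return the 20 bytes before the word and the 40 bytes from its start onward. With a multi-byte encoding the window can begin or end in the middle of a character, so the result may need trimming before it is decoded.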

Re^4: Finding word either side of a word match
by Quicksilver (Scribe) on Mar 03, 2008 at 15:43 UTC
    That's a far more elegant solution :)
    What would be the best way of finding the position in bytes? It's not something that I've come across yet.
      You can slurp the whole file into memory like this:
      open(my $handle, '<', $file) or die "Can't read '$file': $!";
      my $contents = do { local $/; <$handle> };

      And then when you match against that string with m//g, you can query pos $contents to get the offset at which the match ended (the offset at which it started is in $-[0]). As long as every character is a single byte, that is the same as the position in bytes. (Note that you will run into trouble with multi-byte encodings this way.)
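Put together, the slurp approach might look roughly like this. The helper name word_offsets is invented for the example. The file is opened with ':raw' so that offsets in the string are byte offsets in the file; the price is that \w then only matches ASCII word characters, so words containing multi-byte characters get split.

```perl
use strict;
use warnings;

# Hypothetical helper: return a list of [word, byte offset] pairs
# for every word found in the file.
sub word_offsets {
    my ($file) = @_;

    # ':raw' keeps the data as bytes, so string offsets equal file offsets.
    open(my $handle, '<:raw', $file) or die "Can't read '$file': $!";
    my $contents = do { local $/; <$handle> };
    close $handle;
    $contents = '' unless defined $contents;

    my @found;
    while ($contents =~ m/(\w+)/g) {
        # $-[0] is where the match starts; pos($contents) is where it ends.
        push @found, [ $1, $-[0] ];
    }
    return @found;
}
```

Each pair can then be stored in the DB as it is, for example as one row per word and offset.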

      Another way is to read the file line by line, and track the number of characters that have been consumed so far:

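      my $pos = 0;    # offset of the current line within the file
      while (<$handle>) {
          my $line_len = length $_;    # do that before chomping
          chomp;
          while (m/(\w+)/g) {
              my $word = $1;
              # pos() is where the match ended, so step back to the word's start
              my $word_pos = $pos + pos() - length $word;
          }
          $pos += $line_len;
      }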

      See pos.