in reply to Finding word either side of a word match

It seems from your questions that you are building something like a search engine. You may find the node How to build a Search Engine. interesting (a very good node, IMHO), and perhaps this article on perl.com as well.

If you insist on creating your index manually, here is what you could do: in the while loop create a list of all found words, and then iterate over all of them again:

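# assuming you stored all matches in @words
my %word_context;
# skip the first and last word: they have only one neighbour each
for (1 .. $#words - 1) {
    # record the word before and the word after each occurrence
    push @{ $word_context{ $words[$_] } }, [ $words[$_ - 1], $words[$_ + 1] ];
}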

Though it might be better to store context and line number in the same data structure.

You also have to think about the first and the last word, which are special in that they don't have two words of context each. What do you want to do with them?
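As a rough sketch of one way to combine these points, each occurrence below keeps its line number together with its neighbours, and a missing neighbour at either end of the text is stored as undef. The names build_context, line, before and after are made up for this illustration, and words are assumed to be plain runs of \w characters, folded to lower case.

```perl
use strict;
use warnings;

# Hypothetical helper: takes a list of lines and returns a hash reference
# mapping each word to a list of { line, before, after } records.
sub build_context {
    my @lines = @_;
    my @words;          # each entry: [ word, line number ]
    my $line_no = 0;
    for my $line (@lines) {
        $line_no++;
        while ($line =~ /(\w+)/g) {
            push @words, [ lc($1), $line_no ];
        }
    }

    my %word_context;
    for my $i (0 .. $#words) {
        my ($word, $line) = @{ $words[$i] };
        push @{ $word_context{$word} }, {
            line   => $line,
            # the first word has no predecessor, the last no successor
            before => ($i > 0       ? $words[$i - 1][0] : undef),
            after  => ($i < $#words ? $words[$i + 1][0] : undef),
        };
    }
    return \%word_context;
}

my $ctx = build_context("the quick brown fox", "jumps over the lazy dog");
for my $hit (@{ $ctx->{the} }) {
    printf "line %d: %s [the] %s\n",
        $hit->{line}, $hit->{before} // '(start)', $hit->{after} // '(end)';
}
```

Storing undef is only one option; skipping the first and last word, or padding with an empty string, would work just as well depending on how the results are displayed.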

Re^2: Finding word either side of a word match
by Quicksilver (Scribe) on Mar 03, 2008 at 14:44 UTC
    Thanks for the articles, which I've skimmed and will read fully later. What I'm trying to do is create a concordance that lets the user search a text, find all the occurrences of a word, and see some sample text around each one to work out whether that's the section they are looking for and where it is in the text.

    It's part of a personal project to create some useful tools for textual analysis. It also seemed like a good way to turn my nascent knowledge of Perl into something practical whilst learning. I'll need to think about those two words.
      If you want to display context, then there's a better solution: for each word, store the position of the word in the file (in bytes) in the DB. When you want to show the context, you just seek to that position (or, let's say, $position - 20) and read the next few bytes.

      That way you have to keep the indexed files at hand, but you avoid storing every word thrice in the DB.
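A minimal sketch of that idea, under a few assumptions: the file is read as raw bytes, words are ASCII runs of \w characters, and an in-memory hash stands in for the DB. The helper names index_offsets and context_at are invented for this example. It relies on tell to get the byte position at the start of each line and on $-[1] for the offset of the match within the line.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: map every word in $file to its byte offsets.
sub index_offsets {
    my ($file) = @_;
    my %offsets;    # word => [ byte offsets ]
    open my $fh, '<:raw', $file or die "Cannot open $file: $!";
    while (1) {
        my $line_start = tell $fh;      # byte position before reading the line
        defined(my $line = <$fh>) or last;
        while ($line =~ /(\w+)/g) {
            # $-[1] is where capture group 1 starts within $line
            push @{ $offsets{ lc($1) } }, $line_start + $-[1];
        }
    }
    close $fh;
    return \%offsets;
}

# Hypothetical helper: read up to $width bytes on either side of $offset.
sub context_at {
    my ($file, $offset, $width) = @_;
    my $start = $offset > $width ? $offset - $width : 0;
    open my $fh, '<:raw', $file or die "Cannot open $file: $!";
    seek $fh, $start, 0 or die "Cannot seek in $file: $!";   # 0 = from start of file
    read $fh, my $buffer, ($offset - $start) + $width;
    close $fh;
    return $buffer;
}

# Small demonstration on a temporary file.
my ($tmp, $name) = tempfile(UNLINK => 1);
print {$tmp} "the quick brown fox\njumps over the lazy dog\n";
close $tmp;

my $index = index_offsets($name);
for my $pos (@{ $index->{the} }) {
    (my $snippet = context_at($name, $pos, 10)) =~ s/\s+/ /g;
    printf "%5d: %s\n", $pos, $snippet;
}
```

For text in a multi-byte encoding such as UTF-8, a fixed byte window can cut a character in half, so the window would need adjusting before display.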

        That's a far more elegant solution :)
        What would be the best way of finding the position in bytes? It's not something that I've come across yet.