in reply to extracting search term context

Plucene and KinoSearch build an index (essentially a giant hash mapping each word to the files that contain it), so if you look up "match", you get back a list of the files which contain "match".
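The "giant hash" can be illustrated with a toy in-memory inverted index in plain Perl. This is only a sketch: build_index() and lookup() are hypothetical helpers, not part of the Plucene or KinoSearch API. Documents are passed in as strings rather than read from disk, and words are lowercased before indexing.

```perl
use strict;
use warnings;

# Toy inverted index: maps each lowercased word to the set of documents
# containing it. Hypothetical sketch, NOT the Plucene or KinoSearch API;
# real engines keep this structure on disk.
sub build_index {
    my (%docs) = @_;                  # document name => document text
    my %index;
    while (my ($name, $text) = each %docs) {
        for my $word (map { lc $_ } $text =~ /(\w+)/g) {
            $index{$word}{$name} = 1;
        }
    }
    return \%index;
}

# Return the sorted list of documents containing $term (case-insensitive).
sub lookup {
    my ($index, $term) = @_;
    return sort keys %{ $index->{ lc $term } || {} };
}

my $index = build_index(
    'a.txt' => 'the quick brown fox',
    'b.txt' => 'a perfect match here',
    'c.txt' => 'no Match for you',
);
print join(', ', lookup($index, 'match')), "\n";   # prints "b.txt, c.txt"
```

A real engine answers the same question without rescanning the documents, which is why the lookup returns a file list cheaply before any per-file work is done.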

Then you iterate over that list of files and call extract($file,'match',10) on each one.

extract() then uses a sliding-window technique to do a linear search: you read the file word by word (or line by line, splitting each line into words) while keeping a buffer of the previous N words. When you find your match, you print the N words from that buffer, then read and print the next N words.
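A minimal sketch of such an extract(), assuming words are whitespace-separated and matched exactly (case-sensitive). It returns the snippets instead of printing them so they are easy to inspect, and it also handles a second match that falls inside the trailing window of the first. The helper _join_snip() is hypothetical. Because Perl's open() accepts a reference to a string as an in-memory file, $file can be either a filename or \$text.

```perl
use strict;
use warnings;

# Sliding-window context extraction. Returns one snippet per occurrence
# of $term: up to $n words before it, the match, and up to $n words after.
sub extract {
    my ($file, $term, $n) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my @before;     # sliding window: the last $n words seen
    my @pending;    # matches still collecting their trailing words
    my @snippets;
    while (my $line = <$fh>) {
        for my $word (split ' ', $line) {
            # Feed this word to every match still waiting for context.
            push @{ $_->{after} }, $word for @pending;
            # Start a new snippet, copying the current "before" window.
            push @pending, { before => [@before], match => $word, after => [] }
                if $word eq $term;
            # Emit snippets that now have $n trailing words.
            my @done = grep { @{ $_->{after} } >= $n } @pending;
            @pending = grep { @{ $_->{after} } < $n } @pending;
            push @snippets, map { _join_snip($_) } @done;
            # Slide the window forward, keeping at most $n words.
            push @before, $word;
            shift @before while @before > $n;
        }
    }
    close $fh;
    # Matches near end of file get fewer than $n trailing words.
    push @snippets, map { _join_snip($_) } @pending;
    return @snippets;
}

sub _join_snip {
    my ($p) = @_;
    return join ' ', @{ $p->{before} }, $p->{match}, @{ $p->{after} };
}

# Usage with the file list from the index lookup, e.g.:
#   for my $file (@files) { print "$_\n" for extract($file, 'match', 10) }
my $text = "one two three match four five six\nseven match eight\n";
print "$_\n" for extract(\$text, 'match', 2);
```

Only N words before and N words after each match are ever held in memory, so the file is scanned once no matter how large it is.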

Re^2: extracting search term context
by Anonymous Monk on Jun 09, 2010 at 04:13 UTC
    Ideally you would cache these search results, perhaps invalidating the cache when you add/index more files.
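One way to follow this advice, sketched under assumptions: a hypothetical SearchCache class that memoizes results per (term, N) pair and is cleared wholesale whenever files are added or re-indexed. The real search (index lookup plus extract() on each file) is passed in as a code reference. Clearing everything is the simplest invalidation policy, not the only one.

```perl
use strict;
use warnings;

# Hypothetical result cache for an expensive search (index lookup plus
# extract() over every hit), dropped whenever the index changes.
package SearchCache;

sub new {
    my ($class, $search) = @_;    # $search: code ref doing the real work
    return bless { search => $search, cache => {} }, $class;
}

# Return cached results for ($term, $n), computing them on a miss.
sub get {
    my ($self, $term, $n) = @_;
    my $key = join "\0", $term, $n;
    $self->{cache}{$key} //= [ $self->{search}->($term, $n) ];
    return @{ $self->{cache}{$key} };
}

# Call after adding/indexing more files: cached results may be stale.
sub invalidate {
    my ($self) = @_;
    %{ $self->{cache} } = ();
}

package main;

my $calls = 0;
my $cache = SearchCache->new(sub { $calls++; return "result for $_[0]" });
$cache->get('match', 10);
$cache->get('match', 10);   # served from the cache
$cache->invalidate;         # e.g. after indexing new files
$cache->get('match', 10);   # recomputed
print "$calls\n";           # prints 2
```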