in reply to KinoSearch - is there a way to iterate over all documents in an index?

Something here doesn't add up: you want to iterate through all the documents in an index, but you can't use the most obvious method because it's too slow? What makes you think this operation can be performed in a way that won't be slow? Data structures tend to be optimized for either random access (think hashes) or iteration (think arrays). I bet KinoSearch is more like the former than the latter!

-sam


Re^2: KinoSearch - is there a way to iterate over all documents in an index?
by isync (Hermit) on Sep 12, 2007 at 11:23 UTC
    Documents have one order in the inverted index and a different one in a result set.

    Iterating over the index in result-set order means mapping the order of relevance (from the result set) onto the order in the index (as stored by KinoSearch). That requires repositioning the read pointer again and again (disk seeks), and the seeks are what slow things down.

    Ideally, I would read doc after doc in the order they are stored in the invindex, not in the arbitrary order I get from a result subset. That would cut the seeks and speed things up.
    But it seems nobody here (including me) knows how to access a KinoSearch inverted index directly and iterate over the docs in the order they are stored.
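
    To make the seek argument concrete, here is a rough sketch in Python. It is not KinoSearch code and uses no KinoSearch API. It assumes (and this is only an assumption) that a hit's document number reflects its position in the stored index, so that reading in ascending document number approximates reading in storage order. The names `total_seek_distance`, `fetch_in_storage_order`, and `read_doc` are made up for the illustration.

```python
def total_seek_distance(positions):
    """Sum of the absolute jumps between consecutive read positions."""
    return sum(abs(b - a) for a, b in zip(positions, positions[1:]))


def fetch_in_storage_order(hits, read_doc):
    """Fetch documents in ascending doc number, return them in relevance order.

    hits     -- doc numbers in relevance order (hypothetical result set)
    read_doc -- callable that loads one document by number (hypothetical)
    """
    # Read each distinct document once, walking forward through the index.
    fetched = {doc_num: read_doc(doc_num) for doc_num in sorted(set(hits))}
    # Hand the documents back in the order the caller asked for.
    return [fetched[doc_num] for doc_num in hits]


# Hypothetical result set: doc numbers in relevance order.
hits = [907, 12, 455, 3, 610]

# Record the order in which documents are actually read.
read_log = []

def read_doc(doc_num):
    read_log.append(doc_num)
    return f"doc-{doc_num}"

docs = fetch_in_storage_order(hits, read_doc)

relevance_cost = total_seek_distance(hits)      # jumps when reading in relevance order
storage_cost = total_seek_distance(read_log)    # jumps when reading in ascending order
```

    This only reduces the jumping around for a given result set. It does not answer the original question of how to walk every document in the invindex without going through a search.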