
This sounds very similar to my work, although I used the Glimpse full-text search engine instead of DB_Files to store the text index. This allows for much faster searches (0.1 to 1 second for searches that match less than 10% of all records). Glimpse also supports partial word matching and stemming, and sports a broken regex implementation (woopee!). My system also supported boolean operators in complex searches, along with a peephole optimizer driven by the results of past searches.
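For anyone curious what "driven by the results of past searches" could look like, here is a minimal Python sketch. It is not Glimpse and not the original code: the in-memory `index`, the `past_hits` table and the function name `search_and` are all made up for illustration, and reordering AND terms by past hit counts is only one plausible reading of the optimizer.

```python
from typing import Dict, List, Set


def search_and(index: Dict[str, Set[int]],
               terms: List[str],
               past_hits: Dict[str, int]) -> Set[int]:
    """AND-query over a toy inverted index (hypothetical, not Glimpse).

    Terms that matched the fewest records in earlier searches are
    evaluated first, so the candidate set shrinks as early as possible.
    Terms never seen before are evaluated last.
    """
    ordered = sorted(terms, key=lambda t: past_hits.get(t, float("inf")))
    result = None
    for term in ordered:
        postings = index.get(term, set())
        # Remember how selective this term was for future queries.
        past_hits[term] = len(postings)
        result = set(postings) if result is None else result & postings
        if not result:
            break  # nothing can match any more, skip the remaining terms
    return result or set()
```

The design choice here is just that intersecting the smallest sets first keeps intermediate results small; a real optimizer would likely also rewrite OR and NOT clauses.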

I remember two big problems in the project. First, the .u8 files aren't really in UTF-8: there's a ton of Eastern European 8-bit encodings in there, so I had to do a number of passes just to clean the data. Second, I had to return complex result documents, which meant building an index on the result elements as well as the source elements.
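A cleanup pass of that kind can be sketched roughly as below, in Python. This is an illustration under assumptions, not the original code: the fallback encodings (`cp1250`, `iso-8859-2`) are guesses at what "eastern-euro 8-bit" might mean, and the function name `decode_with_fallback` is hypothetical.

```python
from typing import Sequence, Tuple


def decode_with_fallback(
    raw: bytes,
    fallbacks: Sequence[str] = ("cp1250", "iso-8859-2"),  # assumed encodings
) -> Tuple[str, str]:
    """Return (text, encoding_used) for a record that claims to be UTF-8.

    Strict UTF-8 is tried first. If that fails, each assumed 8-bit
    encoding is tried in order. As a last resort the bytes are decoded
    as UTF-8 with undecodable bytes replaced by U+FFFD.
    """
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        pass
    for encoding in fallbacks:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    return raw.decode("utf-8", errors="replace"), "utf-8-replace"
```

Note that this only detects records that are not valid UTF-8; it cannot tell two 8-bit encodings apart when both decode without error, which is probably why more than one pass was needed in practice.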

End result: the company went out of business without ever managing to sell their product. My code went straight to /dev/null, as far as I know!

-sam
