Hi tachyon-II,
First of all, I have a question for you...
Have you ever tried these things you suggest in practice?
Have you ever tried the intersection you propose on a
table with 2,000,000,000 records?
If you did and the performance was satisfactory, I can't say anything... But before I ended up with this schema, I spent
4 months trying all the schemas you mention with MySQL, and the results were not satisfactory (I read all the
optimization tips for MySQL queries, and believe me, the performance wasn't good).
I can say for sure that the posting-list schema is the best for large document
collections (by the way, Lucene uses much the same index schema too).
Now, about the tables you propose: the first is the very famous LEXICON, and of course I use it in my application.
The same goes for the second one, where I save the path or address of each document I am indexing.
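Just so we are talking about the same layout, here is a toy sketch of the three pieces in plain Perl; all names and sample values are made up for illustration, not my real schema:

# toy illustration only -- names and values are invented for this example
my %lexicon  = ( apple => 1, banana => 2 );                    # term    => term_id
my %document = ( 1 => 'C:/docs/a.txt', 2 => 'C:/docs/b.txt' ); # doc_id  => path/address
my %postings = ( 1 => pack('N*', 1, 2),                        # term_id => packed, sorted doc_ids
                 2 => pack('N*', 2) );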
Now, one thing I want to test is using the file system
to build my index instead of the DBMS (SQL Server 2005),
but I don't know whether it will be efficient to save all the required information in binmode files.
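For what it's worth, this is the kind of thing I have in mind; a minimal sketch, with a made-up file name postings.bin (one file, or one region of a file, per term):

# minimal sketch -- file name and layout are just assumptions for the example
my $file = 'postings.bin';
open my $out, '>', $file or die "open $file: $!";
binmode $out;
print {$out} $compressed;        # write the raw packed posting list
close $out;

open my $in, '<', $file or die "open $file: $!";
binmode $in;
read $in, my $buffer, -s $file;  # slurp the whole packed list back
close $in;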
One last question from me: how can I read from a bit string one bit at a time?
I fetch from the DBMS a binary compressed string with all the docids for one term.
To decode the string I use the code below, which is extremely fast:
my $decode = unpack "b*", $compressed;
But I need to read from the string one bit at a time, and the code below is very slow:
my @bits = split(//, unpack("b*", $compressed));
So I want to ask: is there any way to read one bit at a time from either the
compressed or the decoded string?
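The only candidate I have found so far is Perl's built-in vec(), which can address single bits in place (its bit order matches unpack "b*"), but I have not benchmarked it against my real data yet:

# untested sketch: read the packed string bit by bit with vec()
for my $i (0 .. 8 * length($compressed) - 1) {
    my $bit = vec($compressed, $i, 1);   # i-th bit, ascending order within each byte
    # ... feed $bit to the decoder here ...
}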
Thanks for your time.
Mimis