In the second pass, the keywords (minus stop words) are mapped to categories using a DB_File-tied hash. This takes a pretty long time.
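Here is a minimal sketch of what that second pass might look like. It assumes each page arrives as a category plus some text. The stop-word list, the tab-separated value format, and the names `open_index`, `keywords` and `add_page` are all hypothetical and not taken from the actual code. One detail worth showing is that a DB_File hash can only store flat strings, so a keyword's list of categories has to be serialized somehow. If DB_File isn't installed, the sketch falls back to a plain in-memory hash.

```perl
use strict;
use warnings;
use Fcntl;                      # O_RDWR, O_CREAT for the DB_File tie
use File::Temp qw(tempdir);

# Hypothetical stop-word list; the post does not say which one is used.
my %STOP = map { $_ => 1 } qw(a an and the of to in for on with is are);

# Open the keyword => categories index. With DB_File the hash lives on disk;
# if DB_File is not installed, this sketch falls back to an in-memory hash.
sub open_index {
    my ($file) = @_;
    my %index;
    if (eval { require DB_File; 1 }) {
        no warnings 'once';
        tie %index, 'DB_File', $file, O_RDWR | O_CREAT, 0644, $DB_File::DB_HASH
            or die "Cannot tie $file: $!";
    }
    return \%index;
}

# Lowercased words of a text, minus stop words and duplicates.
sub keywords {
    my ($text) = @_;
    my %seen;
    return grep { !$STOP{$_} && !$seen{$_}++ } map { lc $_ } $text =~ /(\w+)/g;
}

# Point every keyword of a page at the page's category. DB_File values are
# flat strings, so a keyword's categories are stored tab-separated.
sub add_page {
    my ($index, $category, $text) = @_;
    for my $word (keywords($text)) {
        my $old  = $index->{$word};
        my @cats = defined $old ? split(/\t/, $old) : ();
        next if grep { $_ eq $category } @cats;
        $index->{$word} = join "\t", @cats, $category;
    }
}

my $dir   = tempdir(CLEANUP => 1);
my $index = open_index("$dir/keywords.db");
add_page($index, 'Top/Computers/Programming/Languages/Perl', 'The Perl programming language');
add_page($index, 'Top/Arts/Music', 'Music and the language of sound');
print "$_ => $index->{$_}\n" for sort keys %$index;
```

Every keyword write goes through the tie, so on a million pages the per-key read-modify-write is likely a large part of why this pass is slow.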
The third pass is the matching. This takes between one and maybe ten seconds, depending on how much of the hash files is still in the disk cache. It's a pretty naive way of performing the match; ideas and suggestions are welcome :)
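For comparison, here is roughly what a naive match could look like. It assumes the goal is to classify a new piece of text against the index from the second pass: every keyword votes for each category it is indexed under, and the categories with the most votes win. The voting scheme and the `match` helper are hypothetical. A plain hash stands in for the DB_File-tied one so the snippet runs on its own.

```perl
use strict;
use warnings;

# Hypothetical stop-word list, same as in the indexing sketch.
my %STOP = map { $_ => 1 } qw(a an and the of to in for on with is are);

# Lowercased words of a text, minus stop words and duplicates.
sub keywords {
    my ($text) = @_;
    my %seen;
    return grep { !$STOP{$_} && !$seen{$_}++ } map { lc $_ } $text =~ /(\w+)/g;
}

# Naive match: each keyword votes once for every category it is indexed
# under; categories are ranked by votes, ties broken alphabetically.
sub match {
    my ($index, $text, $limit) = @_;
    my %score;
    for my $word (keywords($text)) {
        my $cats = $index->{$word};
        next unless defined $cats;
        $score{$_}++ for split /\t/, $cats;
    }
    my @ranked = sort { $score{$b} <=> $score{$a} || $a cmp $b } keys %score;
    splice @ranked, $limit if @ranked > $limit;
    return map { [ $_, $score{$_} ] } @ranked;
}

# Stand-in for the DB_File-tied index (values are tab-separated categories).
my %index = (
    perl        => "Top/Computers/Programming/Languages/Perl",
    programming => "Top/Computers/Programming/Languages/Perl",
    language    => "Top/Computers/Programming/Languages/Perl\tTop/Arts/Music",
    music       => "Top/Arts/Music",
);

my @best = match(\%index, 'Learning the Perl language', 2);
printf "%-45s %d\n", @$_ for @best;
```

One obvious refinement would be to weight each vote by how rare the keyword is, since a word that points at many categories says little about any single one of them.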
The XML parsing is home-grown, although there are modules for doing RDF stuff. It's not that difficult to get right anyway. The worst problems are a) spidering a million links takes time, and b) the dmoz editors keep changing the category structure all the time.
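A home-grown parser can stay quite small if it works line by line. The sketch below assumes that each `<Topic r:id="...">`, `<link r:resource="..."/>` and `</Topic>` tag sits on its own line of the dump. Treat that layout, and the `parse_topics`/`unescape` names, as assumptions to check against the real file. Reading line by line also keeps memory use flat, since the dump is never built into a full tree.

```perl
use strict;
use warnings;

# Collect category => [links] from a dmoz-style RDF dump, one line at a time.
# Assumes (hypothetically) that each relevant tag sits on its own line.
sub parse_topics {
    my ($fh) = @_;
    my (%links, $topic);
    while (my $line = <$fh>) {
        if ($line =~ m{<Topic\s+r:id="([^"]*)"}) {
            $topic = $1;
            $links{$topic} ||= [];
        }
        elsif (defined $topic && $line =~ m{<link\s+r:resource="([^"]*)"}) {
            push @{ $links{$topic} }, unescape($1);
        }
        elsif ($line =~ m{</Topic>}) {
            undef $topic;
        }
    }
    return \%links;
}

# Undo the predefined XML entities in an attribute value, in a single pass.
sub unescape {
    my ($s) = @_;
    my %ent = (amp => '&', lt => '<', gt => '>', quot => '"', apos => "'");
    $s =~ s/&(amp|lt|gt|quot|apos);/$ent{$1}/g;
    return $s;
}

# Tiny made-up sample in the assumed layout.
my $rdf = <<'RDF';
<Topic r:id="Top/Computers/Programming/Languages/Perl">
  <link r:resource="http://www.example.com/"/>
  <link r:resource="http://www.example.org/?a=1&amp;b=2"/>
</Topic>
<Topic r:id="Top/Arts/Music">
</Topic>
RDF

open my $fh, '<', \$rdf or die "Cannot open in-memory dump: $!";
my $links = parse_topics($fh);
print "$_: @{ $links->{$_} }\n" for sort keys %$links;
```

If the editors move categories around, rerunning this over a fresh dump and diffing the topic keys is one cheap way to spot what changed.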
/J
Re: Re: Re: Module namespace: OpenDirectory something?
by samtregar (Abbot) on May 11, 2002 at 20:38 UTC