in reply to Data indexing in BerkeleyDB hashes
am now trying to decide upon the best aspect of the web pages being indexed to store as the hash key.
...depends what you're going to know when you want to get the value out again, no? If you're just going to be looking up pages then I'd say an escaped URL is ideal, though you'd need to fold synonyms like / and /index.s?html? into a single key so the same page doesn't end up stored twice.
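To make the synonym point concrete, here is a rough sketch in Python of collapsing a URL to one canonical, escaped key before it goes into the hash. The function name `canonical_key` is made up for illustration, the plain dict only stands in for the real BerkeleyDB hash, and the normalisation rules (lowercase scheme and host, fold index pages into their directory, drop the fragment) are my assumptions about what counts as "the same page" on your site:

```python
import re
from urllib.parse import urlsplit, urlunsplit, quote, unquote

# Trailing index page: index.htm, index.html, index.shtm, index.shtml
_INDEX_RE = re.compile(r"/index\.s?html?$")

def canonical_key(url):
    """Hypothetical helper: reduce a URL to one escaped form usable as a hash key."""
    parts = urlsplit(url)
    path = parts.path or "/"
    # Fold /index.html and friends into the bare directory URL
    path = _INDEX_RE.sub("/", path)
    # Unescape then re-escape so raw and already-escaped spellings match
    path = quote(unquote(path), safe="/")
    # Scheme and host are case-insensitive; the fragment never reaches the server
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))

# A plain dict standing in for the BerkeleyDB hash
index = {}
index[canonical_key("http://Example.com/docs/index.shtml")] = "page data"
```

Looking up `canonical_key("http://example.com/docs/")` then finds the same record, so the URL itself is the only thing you need to know to get the value back.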
I've been in roughly this situation before and settled on using id numbers, out of the misguided conviction that indexing and sorting would be more efficient. I ended up performing far more lookups than were really necessary, mostly just to get the URL back :(