in reply to Hit tracking optimization...

If you want really fast lookups, are only dealing with IPv4 addresses, and have half a gig of disk space to spare, then you can have your lookups run at 22,000/second for IPs that have already been seen and 120/second for those that haven't.

This is done by using a single binary file, 512MB in size, that uses 1 bit to represent each of the 2^32 (roughly 4.3 billion) possible IPv4 addresses. Write contention only occurs if two or more new IPs whose bits fall in the same byte (the same 8-address range) connect at exactly the same time. This is probably a very rare occurrence and can be dealt with by a re-read, back-off and retry scheme (think Nagle).
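For illustration, here is a minimal sketch of the bitmap idea in Python. It is not the code behind the timings above; the names `open_bitmap`, `ip_to_int` and `seen_before` are made up for this example, and it assumes a Unix-like system where `os.pread` and `os.pwrite` are available. It does no locking and no retry, so it is only safe with a single writer.

```python
import os
import socket
import struct

BITMAP_BYTES = 2**32 // 8  # 512 MiB: one bit per possible IPv4 address


def open_bitmap(path):
    """Open (or create) the bitmap file and size it to cover every IPv4 address."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    os.ftruncate(fd, BITMAP_BYTES)  # sparse where the filesystem supports it
    return fd


def ip_to_int(ip):
    """Convert a dotted-quad string to its 32-bit integer value."""
    return struct.unpack("!I", socket.inet_aton(ip))[0]


def seen_before(fd, ip):
    """Return True if the IP's bit was already set; otherwise set it and return False."""
    n = ip_to_int(ip)
    offset, mask = n >> 3, 1 << (n & 7)  # byte holding the bit, and the bit within it
    data = os.pread(fd, 1, offset)
    current = data[0] if data else 0
    if current & mask:
        return True
    # New IP: write the byte back with the bit set (unprotected read-modify-write).
    os.pwrite(fd, bytes([current | mask]), offset)
    return False
```

An already-seen IP costs one single-byte read; a new IP costs a read plus a write, which fits the gap between the two rates quoted above.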

However, the contention risk can be reduced to zero if you are using an OS that supports byte-range file locking. This does slow things down a little, but not hugely.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

Re^2: Hit tracking optimization...
by BrowserUk (Patriarch) on Jan 25, 2008 at 01:16 UTC

    It's worth pointing out that if your system supports sparse files, the disk space requirement drops considerably. I added 1e6 random hits and the on-disk usage was only 64 MB.

