in reply to Help update the Phalanx 100

Would it be possible to regenerate this data including a datestamp? That would facilitate filtering out mass revision downloads, mirroring, etc. As a rough algorithm, I'd guess that counting each distribution (regardless of revision) once per IP per day would be a reasonable first-approximation.
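The once-per-IP-per-day tally could be sketched roughly like this (the log lines and the revision-stripping regex are made-up illustrations, not the real Phalanx data format):

```perl
use strict;
use warnings;

# Count each distribution once per (date, IP) pair, ignoring the revision.
# Log records are assumed already parsed into "date ip filename" triples;
# a real run would need a proper log-parsing step first.
my @log = (
    '2004-12-20 1.2.3.4 Test-Simple-0.54.tar.gz',
    '2004-12-20 1.2.3.4 Test-Simple-0.55.tar.gz',  # new revision, same IP+day: not recounted
    '2004-12-20 5.6.7.8 Test-Simple-0.55.tar.gz',  # different IP: counted
    '2004-12-21 1.2.3.4 Test-Simple-0.55.tar.gz',  # next day: counted again
);

my ( %seen, %count );
for my $line (@log) {
    my ( $date, $ip, $file ) = split ' ', $line;
    ( my $dist = $file ) =~ s/-\d[\w.]*$//;   # Test-Simple-0.55.tar.gz -> Test-Simple
    $count{$dist}++ unless $seen{"$date $ip $dist"}++;
}

print "$_: $count{$_}\n" for sort keys %count;
```

With the sample lines above, Test-Simple comes out as 3: the second revision on the same day from the same IP is folded into the first.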

-xdg


Re^2: Help update the Phalanx 100
by stvn (Monsignor) on Dec 21, 2004 at 05:14 UTC
    As a rough algorithm, I'd guess that counting each distribution (regardless of revision) once per IP per day would be a reasonable first-approximation.

    Part of me agrees that revisions should probably be ignored, but then the counts will likely skew upwards if there are many revisions. However, that skew could be viewed as an indicator that the module's development is being followed regularly and not just downloaded as a curiosity. It's a difficult one.

    As for the "once per IP" idea, I have found with log analysis that IPs can be deceiving because of proxy servers. So I think that we should be wary of weighting the IPs too much.

    -stvn
Re^2: Help update the Phalanx 100
by petdance (Parson) on Dec 21, 2004 at 06:36 UTC
    OK, you've now got the timestamp and the user agent. Lotsa Googlebot action there.

    xoxo,
    Andy
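Filtering that Googlebot traffic out by user agent before tallying could look something like this (the pattern list is a guess for illustration, not a vetted bot inventory):

```perl
use strict;
use warnings;

# Drop obvious robot traffic by user-agent string before counting downloads.
# These patterns are illustrative only; a real list would be longer.
my @bot_patterns = ( qr/Googlebot/i, qr/Slurp/i, qr/\bspider\b/i );

sub is_bot {
    my ($ua) = @_;
    return grep { $ua =~ $_ } @bot_patterns;
}

print is_bot('Googlebot/2.1 (+http://www.google.com/bot.html)') ? "bot\n" : "human\n";
print is_bot('LWP::Simple/5.79')                                ? "bot\n" : "human\n";
```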