The best algorithm for this is to go through the log files keeping track of how many bytes each user downloaded. If they add up to at least the size of the file, then the user probably completed a download.
Unfortunately, you will not be able to get a truly accurate count of how many downloads completed successfully and how many were only partial.
I see two problems you will be faced with given the structure of your logs:
1. Your logs do not show the starting position for a 206 partial download (most log formats don't). Without this, you won't know if a user completed the whole download or just started it twice, downloading the first half each time.
2. There does not seem to be any good way of uniquely identifying a user in your logs. Without this, it will be difficult to match up multiple 206 returns to add up the sizes to see if an individual user probably did or did not complete the full download.
You may be able to get a better estimate than your current algorithm by assuming there is one user per IP address and adding up the bytes downloaded from each IP address. This can be improved by looking at the time between requests. If there is a half hour (you decide how long) with no request from an IP address, then further 206 responses are probably a new download attempt.
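As a rough sketch of the per-IP approach with an idle timeout, something like the following could work. It assumes your access log is in the Apache common or combined format (IP, timestamp, request line, status, byte count); the function name `count_probable_downloads` and the regular expression are made up for illustration, so adjust them to your actual log layout.

```python
import re
from datetime import datetime, timedelta

# Assumed layout: Apache common/combined log format.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?:GET|POST) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)

def count_probable_downloads(lines, path, file_size,
                             idle_gap=timedelta(minutes=30)):
    """Return (probably_complete, probably_partial) attempt counts for one file."""
    open_attempts = {}  # ip -> [bytes in current attempt, time of last request]
    complete = partial = 0

    def close(total):
        # An attempt counts as complete once its bytes reach the file size.
        nonlocal complete, partial
        if total >= file_size:
            complete += 1
        else:
            partial += 1

    for line in lines:
        m = LOG_LINE.match(line)
        if not m or m["status"] not in ("200", "206"):
            continue
        if m["path"].split("?", 1)[0] != path:
            continue
        when = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z")
        size = 0 if m["size"] == "-" else int(m["size"])
        attempt = open_attempts.get(m["ip"])
        # A long silence from this IP ends the previous attempt.
        if attempt is not None and when - attempt[1] > idle_gap:
            close(attempt[0])
            attempt = None
        if attempt is None:
            attempt = open_attempts[m["ip"]] = [0, when]
        attempt[0] += size
        attempt[1] = when

    # Attempts still open at the end of the log are judged as they stand.
    for total, _ in open_attempts.values():
        close(total)
    return complete, partial
```

This is still only an estimate: two full downloads from the same IP within the idle gap are counted as one, and several users behind one proxy are lumped together.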
One more hint: Your 206 sizes may add up to a bit more than the original file size for a simple, successful download. This will happen for browsers that don't start the next segment right where the previous one left off, but rather ask for the tail end of the previous segment (presumably to make sure that it matches what they got back from the previous request).
If you have control over more than just the log file parser, you might insert a random parameter into each download URL so that you can track users better than by IP address. For example, instead of
href="/MyFoo-file.zip"
you could set it to
href="/MyFoo-file.zip?p=RANDOMNUMBER&ext=.zip"
where "RANDOMNUMBER" is something likely to be unique, generated at page load time by your preferred page generation technique.
Note that the web server will ignore the parameters on this URL when it serves a static file, but they will still be written to the web server access log which you are parsing.
The "&ext=.zip" is a trick to get some broken browser versions to download and save the file with the right extension. Just make sure the complete URL ends with the extension of the original file.
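To make the random-parameter idea concrete, here is a small sketch of both halves: building the tagged link when the page is generated, and pulling the token back out of a logged request path. The helper names `tagged_download_url` and `token_from_logged_path` are hypothetical, and using a 16-character hex token is just one reasonable choice for "likely to be unique".

```python
import secrets
from urllib.parse import urlencode, urlsplit, parse_qs

def tagged_download_url(path):
    """Append a random token plus an extension hint to a download path."""
    # Assumes the path ends in an extension such as ".zip".
    ext = path[path.rfind("."):]
    # "ext" goes last so the complete URL still ends with the extension.
    query = urlencode({"p": secrets.token_hex(8), "ext": ext})
    return f"{path}?{query}"

def token_from_logged_path(logged_path):
    """Return the p= token from a logged request path, or None if absent."""
    values = parse_qs(urlsplit(logged_path).query).get("p")
    return values[0] if values else None
```

In the log parser you could then group 206 responses by this token instead of by IP address, falling back to the IP address for requests that carry no token. When writing the URL into an HTML attribute, remember to escape the "&".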
In reply to Re: Re: Re: File download statistics parsing
by esh
in thread File download statistics parsing
by Anonymous Monk