in reply to Re: Summarizing Mail Activity (Long Tail Problem)
in thread Summarizing Mail Activity (Long Tail Problem)
If an entire day is processed at once and only the top 10 counts are required, then your approach could be optimized further if you were to
An obvious limitation of this restriction is that if the number of results required per day increased (to the top 20, for example), the processing script would have to be re-run. The OP didn't indicate whether that was a possibility. This risk could be mitigated by storing data for the top 10%, the top 50, etc., which would still represent a significant saving in storage space compared to storing all of the summary data.
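To illustrate the "store a bit more than you report" idea, here is a minimal Python sketch. It is not the OP's script: the function names (`top_entries`, `top_fraction`), the per-day dictionary of counts, and the sample addresses are all assumptions made up for the example.

```python
import math
from collections import Counter


def top_entries(day_counts, keep=50):
    """Return the `keep` highest-count (key, count) pairs for one day.

    `day_counts` is assumed to map a key (e.g. a sender address) to its
    message count for that day.
    """
    return Counter(day_counts).most_common(keep)


def top_fraction(day_counts, fraction=0.10):
    """Return the top `fraction` of entries, always keeping at least one."""
    keep = max(1, math.ceil(len(day_counts) * fraction))
    return top_entries(day_counts, keep)


# Hypothetical counts for a single day.
day_counts = {
    "a@example.com": 120,
    "b@example.com": 45,
    "c@example.com": 30,
    "d@example.com": 3,
    "e@example.com": 1,
}

stored = top_entries(day_counts, keep=3)  # what would be written to storage
report = stored[:2]                       # a smaller report built later from the stored data
```

Because the stored list is already sorted by count, any report that needs no more entries than were stored is just a slice of it, so the raw data only has to be reprocessed if the report size grows beyond the stored margin.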
Update: I may have misunderstood the requirements, as thezip mentions below.
Re^3: Summarizing Mail Activity (Long Tail Problem)
by thezip (Vicar) on Mar 24, 2007 at 04:18 UTC