in reply to Re: Summarizing Mail Activity (Long Tail Problem)
in thread Summarizing Mail Activity (Long Tail Problem)

If an entire day is processed at once and only the top 10 counts are required, then your approach could be optimized further if you were to store only the top 10 results for each day, rather than all of the summary data.

An obvious limitation of this restriction is that if the number of results needed per day increased (to the top 20, for example), the processing script would have to be re-run. The OP didn't indicate whether that was a possibility. This risk could be mitigated by storing data for the top 10%, the top 50, etc., and doing so would still save significant storage space compared to storing all of the summary data.
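
For what it's worth, here is a rough sketch of the "keep only the top N per day" idea. It is in Python rather than Perl, and it is not the OP's script: the function name top_counts, the keep parameter, and the sample addresses are all made up for illustration. It assumes one day's addresses fit in memory as a list.

```python
from collections import Counter

def top_counts(addresses, keep=10):
    """Count one day's addresses and return only the `keep` most frequent.

    Hypothetical helper: everything below the cutoff is discarded,
    which is where the storage saving comes from.
    """
    return Counter(addresses).most_common(keep)

# Hypothetical sample standing in for one day's mail log.
day = ["a@example.com"] * 5 + ["b@example.com"] * 3 + ["c@example.com"]

# Keep a deliberately small top list for the example.
summary = top_counts(day, keep=2)
```

Passing a larger keep (50, say) gives the safety margin mentioned above: a later request for the top 20 can then be answered from the stored summaries without re-running the script.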

Update: I may have misunderstood the requirements, as thezip mentions below.

Re^3: Summarizing Mail Activity (Long Tail Problem)
by thezip (Vicar) on Mar 24, 2007 at 04:18 UTC
    Yes, but I understood there to be a requirement to display *all* historic data for "Yesterday's Top 10", regardless of whether they were in the Top 10 on any previous day.

    Where do you want *them* to go today?