in reply to Re: Summarizing Mail Activity (Long Tail Problem)
in thread Summarizing Mail Activity (Long Tail Problem)

As an extension to the excellent approach mentioned by thezip and bobf I would recommend a slightly adapted aggregate table setup:

1: table VOLUMES with columns (date,id,size), index on (date)
2: table EMAILS with columns (id,emailaddress), index on (id)

The lookup table will reduce space requirements because you don't repeat the long email addresses for each day in the VOLUMES table.
Indexes take up space as well, from your note it appears that queries will be done by date/period only so the VOLUMES table index only needs indexed on the date to give you the id's for that day.

You mentioned the need to keep track of total email volume on a given day, this can be entered in the VOLUMES table as an extra entry with id=0, add corresponding entry in the EMAILS table with emailaddress='all'.

  • Comment on Re^2: Summarizing Mail Activity (Long Tail Problem) - lookup table

Replies are listed 'Best First'.
Re^3: Summarizing Mail Activity (Long Tail Problem) - lookup table
by roboticus (Chancellor) on Mar 24, 2007 at 12:24 UTC
    A slight modification to varian's adaptation would be to allow multiple email addresses to map to a single ID. That way, if/when people's email addresses change (marriage, company is bought out, ...) you can still tie their email history together. If ya want....

    --roboticus