Hello Monks,
I'm not new to perl, but I'm trying to solve a problem with perl. :-)
Can anyone suggest a strategy for showing a 'top 10' graph over time, where the items on the top 10 list vary dramatically?
I've been asked to plot the top 10 senders of messages through our mail server by date. We've got a database of message headers going back almost 8 years and comprising more than 30 million records. I need to determine the top 10 senders for the most recent full day (aka 'yesterday'), and then, for every day in the entire time period, plot each of those senders' mail volume as a percentage of that day's total mail volume.
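To pin down exactly what is being computed, here is a minimal in-memory Perl sketch. It assumes the header rows can be reduced to hypothetical `[date, sender]` pairs, and the sample data is invented for illustration. It counts messages per day and per sender per day, takes the top 10 senders for the latest date in the data, and then expresses each of those senders' counts as a percentage of every day's total. It only illustrates the calculation. At 30 million rows the counting would more likely be pushed into the database than done in Perl.

```perl
use strict;
use warnings;

# Hypothetical input: one [date, sender] pair per message header.
# In practice these rows would come from the header database.
my @messages = (
    [ '2024-01-01', 'alice@example.com' ],
    [ '2024-01-01', 'bob@example.com'   ],
    [ '2024-01-01', 'alice@example.com' ],
    [ '2024-01-02', 'carol@example.com' ],
    [ '2024-01-02', 'alice@example.com' ],
    [ '2024-01-02', 'carol@example.com' ],
    [ '2024-01-02', 'carol@example.com' ],
);

my $TOP_N = 10;

# Count messages per day and per (day, sender).
my ( %day_total, %count );
for my $msg (@messages) {
    my ( $date, $sender ) = @$msg;
    $day_total{$date}++;
    $count{$date}{$sender}++;
}

# "Yesterday" is taken to be the latest date present (ISO dates sort lexically).
my ($latest) = sort { $b cmp $a } keys %day_total;

# Top N senders on that day, highest volume first, ties broken by address.
my $day = $count{$latest};
my @top = sort { $day->{$b} <=> $day->{$a} || $a cmp $b } keys %$day;
splice @top, $TOP_N if @top > $TOP_N;

# Each top sender's share of every day's total; no messages that day => 0%.
my %share;
for my $date ( sort keys %day_total ) {
    for my $sender (@top) {
        my $n = $count{$date}{$sender} // 0;
        $share{$sender}{$date} = 100 * $n / $day_total{$date};
    }
}

for my $sender (@top) {
    printf "%-20s %s\n", $sender,
        join ' ',
        map { sprintf '%s=%.1f%%', $_, $share{$sender}{$_} } sort keys %day_total;
}
```

Restricting the per-sender counts to the ten selected addresses keeps the historical part of the work small. The daily totals do not depend on which senders make the list, so they can be computed once and reused.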
The brute force approach is taking more than an hour to complete, and I've been asked to not hog the database server so much.
I've thought of (and tried) building an aggregation table that I could update daily, but the top 10 list is fairly chaotic and changes significantly from day to day (especially on weekends). We have almost 1.5 million distinct sender addresses, so there's a substantial long tail, and most of those addresses will never show up in a top 10 list. I've already tried aggregating counts for all senders for all dates, but that table ended up significantly larger than the original table, partly because I'd recorded zeros for addresses that hadn't sent any messages on a given day.
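The size blow-up comes from the zeros. An aggregate that keeps only the (date, sender) pairs with at least one message can never have more rows than the message table, because each such pair accounts for at least one message. A dense table, by contrast, needs one row for every day times every distinct sender. In SQL terms, a `GROUP BY date, sender` over the header table yields exactly the sparse rows. Below is a minimal Perl sketch of that sparse layout, using hypothetical sample data. A missing pair is read back as 0 instead of being stored. The `count_for` helper is an illustrative name, not an existing API.

```perl
use strict;
use warnings;

# Hypothetical [date, sender] rows standing in for the message-header table.
my @messages = (
    [ '2024-01-01', 'alice@example.com' ],
    [ '2024-01-01', 'bob@example.com'   ],
    [ '2024-01-01', 'alice@example.com' ],
    [ '2024-01-02', 'carol@example.com' ],
    [ '2024-01-02', 'alice@example.com' ],
    [ '2024-01-02', 'carol@example.com' ],
    [ '2024-01-02', 'carol@example.com' ],
);

# Sparse aggregate: an entry exists only for (date, sender) pairs that
# actually sent mail, so no zero counts are ever stored.
my %agg;
$agg{ $_->[0] }{ $_->[1] }++ for @messages;

# Row counts for the sparse layout versus a dense days x senders layout.
my %senders;
$senders{ $_->[1] } = 1 for @messages;
my $sparse_rows = 0;
$sparse_rows += scalar keys %{ $agg{$_} } for keys %agg;
my $dense_rows = scalar( keys %agg ) * scalar( keys %senders );

# Read a count back; a missing (date, sender) pair means 0 messages.
# The exists() check avoids autovivifying an empty entry for an unknown date.
sub count_for {
    my ( $agg, $date, $sender ) = @_;
    return 0 unless exists $agg->{$date};
    return $agg->{$date}{$sender} // 0;
}

printf "%d messages, %d sparse rows, %d dense rows\n",
    scalar @messages, $sparse_rows, $dense_rows;
```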
I'm stumped trying to figure out a strategy that will keep an aggregate table down to a reasonable size, while minimizing the amount of querying required to create records for new addresses with enough mail volume to make the top 10 list for a given date.
Thoughts or suggestions greatly appreciated.