oltranzista has asked for the wisdom of the Perl Monks concerning the following question:

Hey,

I’m pretty new to Perl and Linux, but I have been given the task of migrating my company's email list server from Lyris to Sympa.

My latest task is creating summary reports for each list. I found a package called pflogsumm: feed it any mail log file and it outputs a statistical summary.

My plan is to write a pre-processing script that will split up the main mail log file into several files based on the sender email address. The thing is, some of our email lists run to 100,000+ emails, and I’m worried about the performance of my pre-processing script.

The email statistics don’t have to be ready immediately after an email is sent; I can schedule a cron job to run this script every 4 hours or so. But it would be nice if I could run it as a daemon that scrapes just the last few lines added to the file (not sure if this is possible) and allocates them to the appropriate file, so people could see the progress of their latest email on the fly.
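Picking up only the lines added since the last run is possible with plain seek and tell. What follows is a minimal sketch, not a finished tool: the sub name read_new_lines is made up for illustration, and it assumes the caller remembers the returned byte offset between runs (for a cron job, in a small state file).

```perl
use strict;
use warnings;

# Hypothetical helper: return the byte offset to resume from, followed by
# the complete lines appended to $path since byte offset $offset.
sub read_new_lines {
    my ($path, $offset) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";

    # If the file is now shorter than our offset, assume it was rotated
    # and start again from the beginning.
    my $size = -s $fh;
    $offset = 0 if $offset > $size;

    seek $fh, $offset, 0 or die "Cannot seek in $path: $!";
    my @lines;
    while (my $line = <$fh>) {
        # A line without a trailing newline is still being written;
        # leave it for the next call.
        last unless $line =~ /\n\z/;
        push @lines, $line;
        $offset = tell $fh;
    }
    close $fh;
    return ($offset, @lines);
}
```

Called as `my ($next, @new) = read_new_lines($log, $last);`, this touches only the new tail of the file, so a 100,000-message log is read once in total rather than once per run. For a long-running daemon, the File::Tail module on CPAN may be worth a look instead of rolling your own.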

I guess I just need a point in the right direction, or a suggestion of a particular function or tutorial, so I don’t spend hours trying to do something that isn’t possible in the first place.

Thanks!
Jason
  • Comment on Most efficient (or fast) way to split a postfix mail log file?

Re: Most efficient (or fast) way to split a postfix mail log file?
by shmem (Chancellor) on Mar 10, 2008 at 23:24 UTC
    I guess I just need a point in the right direction or a suggestion of a particular function or tutorial so I don’t spend hours trying to do something that isn’t possible to do in the first place.
    A big ++ for that alone. I tend to do exactly what you are avoiding: I seek advice only after everything else has failed. Being a complete autodidact, it's sort of a tradition. But seeking advice beforehand, apart from making the learning much lighter, also has a big social effect.
    My plan is to write a pre-processing script that will split up the main mail log file into several files based on the sender email address.

    Why? Did you hit any boundaries with pflogsumm? Is splitting even necessary?

    Splitting up a mail log file is tricky, since the sender (envelope) address isn't the reference for log entries, afaik. Each mail gets assigned a message ID and, depending on the mailer, a processing identifier (the queue ID in Postfix) which might change along the delivery chain, depending on setup. If you have, e.g., a sandwich setup, with postfix delivering to spamassassin and a virus checker and a second instance of postfix then picking up the message again, the internal IDs will likely change, and you would have to take that into account. Once you have done all that and produced a set of output files suitable for pflogsumm, you will have either rewritten or duplicated a large part of pflogsumm itself.
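To make the point concrete, here is a rough sketch of what the split involves when Postfix's queue ID (the processing identifier mentioned above) is used as the join key. Several things here are assumptions, not facts about the poster's logs: the line shape "postfix/<daemon>[<pid>]: <QUEUEID>: ...", the short hexadecimal queue ID format, and the sub name split_by_sender. It does not link the two queue IDs a message gets in a sandwich setup.

```perl
use strict;
use warnings;

# Hypothetical helper: group Postfix log lines by envelope sender.
# Returns a hash ref mapping sender address => array ref of log lines.
sub split_by_sender {
    my (@lines) = @_;
    my (%sender_of, %pending, %by_sender);

    for my $line (@lines) {
        # Assumed shape: "... postfix/<daemon>[<pid>]: <QUEUEID>: <rest>".
        # Lines without a queue ID (connect, disconnect, ...) are skipped.
        my ($qid, $rest) =
            $line =~ m{postfix/[\w-]+\[\d+\]:\s+([0-9A-F]{6,}):\s+(.*)}
            or next;

        # The "from=<...>" line is the first place the sender appears.
        if (!defined $sender_of{$qid} && $rest =~ /^from=<([^>]*)>/) {
            my $addr = $1;
            $sender_of{$qid} = $addr eq '' ? '<>' : lc $addr;  # '<>' = bounce
            # Flush lines logged before the sender was known.
            push @{ $by_sender{ $sender_of{$qid} } },
                @{ delete $pending{$qid} || [] };
        }

        if (defined(my $sender = $sender_of{$qid})) {
            push @{ $by_sender{$sender} }, $line;
        }
        else {
            push @{ $pending{$qid} }, $line;   # sender not seen yet
        }

        # Queue IDs can be reused, so forget the mapping once removed.
        delete $sender_of{$qid} if $rest eq 'removed';
    }
    return \%by_sender;
}
```

Each bucket could then be written to its own file and fed to pflogsumm. Holding everything in memory is only for illustration; a real script would append to per-sender files as it goes, and lines that carry no queue ID would be missing from every per-sender report.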

    Since you didn't mention any constraints of your setup yet (processing speed? memory? disk space?) but only spoke of your plan, more information is needed to give good advice. What has been the outcome of using pflogsumm "as-is" so far?

    --shmem

    _($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                                  /\_¯/(q    /
    ----------------------------  \__(m.====·.(_("always off the crowd"))."·
    ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
      Thanks for the reply, shmem. I looked over the docs/man page for pflogsumm and didn't see any feature to split by "from" address or message ID, so I thought I would just feed it pre-split files.

      The reason I need to split these is that we provide email list server services to multiple clients, and on Lyris they are used to getting SMTP reports for their individual lists. So I need to either write a program to parse this info out from scratch or find a third-party program to bolt onto Sympa to provide this data.

      I suppose I could patch pflogsumm, since it is also written in Perl, but I wanted to get this done in a less obtrusive way.

      We have a dedicated server running Debian Linux on a T1 line. Hard disk space isn't something my company is worried about me using with this new email server, so that shouldn't be an issue either.

      Thanks!
      Jason