matze77 has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks!
I want to convert a bash script to Perl; I know the script is very CPU- and time-consuming.
(It runs sa-learn: it gunzips old quarantined items and feeds them to SpamAssassin for spam relearning; see the end for details.)

I read about Devel::DProf in another post, but since I already know which part of the script is the worst (it's the whole thing :-)), that won't be useful here.

I am wondering if I could do the following:
Insert a sleep of a few seconds in the loop?
Stop the script (and continue it later, if that is possible) when CPU consumption is above e.g. 30%?
(So it would only run when the machine is nearly idle and wouldn't interrupt the other users.)
How could I determine the CPU consumption, like the Linux command sar (package sysstat) does?
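(One way I can imagine getting that number on Linux without sar would be to compare two samples of /proc/stat; here is a rough sketch, the one-second sampling interval is an arbitrary choice. Would something like this be reasonable?)

#!/usr/bin/perl
use strict;
use warnings;

# Approximate the overall CPU utilisation (similar to what sar reports)
# from two samples of the aggregate "cpu" line in /proc/stat.
sub cpu_times {
    open my $fh, '<', '/proc/stat' or die "Cannot read /proc/stat: $!";
    my $line = <$fh>;                      # first line: "cpu user nice system idle iowait ..."
    my (undef, @t) = split ' ', $line;
    my $idle  = $t[3] + ($t[4] || 0);      # idle + iowait ticks
    my $total = 0;
    $total += $_ for @t;
    return ($idle, $total);
}

my ($idle1, $total1) = cpu_times();
sleep 1;
my ($idle2, $total2) = cpu_times();

my $busy = 100 * (1 - ($idle2 - $idle1) / ($total2 - $total1));
printf "CPU busy: %.1f%%\n", $busy;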

What would you suggest? What are your thoughts?
Thanks in advance for your ideas.

Code of the original bash script:
cat spamlearn_old.sh
#!/bin/sh
for f in `ls /var/lib/amavis/virusmails/spam*.gz`; do
    echo Learn Spam-Mails from File $f ...
    gzip -cd $f | sudo -u amavis -H sa-learn --spam --showdots
done
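A minimal Perl sketch of the same loop, for comparison (it reuses the paths and the sudo/sa-learn call from the script above; the sleep length is an arbitrary placeholder):

#!/usr/bin/perl
use strict;
use warnings;

# Sketch of a Perl version of spamlearn_old.sh; the paths and the
# sudo/sa-learn invocation are taken from the bash script above.
for my $f (glob '/var/lib/amavis/virusmails/spam*.gz') {
    print "Learn Spam-Mails from File $f ...\n";
    system("gzip -cd '$f' | sudo -u amavis -H sa-learn --spam --showdots") == 0
        or warn "sa-learn failed for $f (exit status $?)\n";
    sleep 5;    # optional breather between files
}

Note that the heavy work still happens inside sa-learn itself, so the rewrite alone does not make anything faster.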
Update: I tried starting the script with a low nice priority and reniced it afterwards, but it doesn't help much; I forgot to mention this earlier. /Update

Replies are listed 'Best First'.
Re: Performance issues
by tirwhan (Abbot) on Nov 28, 2008 at 10:12 UTC

    The script sa-learn (which your above bash script calls) is written in Perl and is part of the spamassassin distribution. You won't achieve anything by converting your little bash script to Perl. You should simply "nice" the sa-learn process and let your operating system do the rest.

    cat spamlearn_old.sh
    #!/bin/sh
    for f in `ls /var/lib/amavis/virusmails/spam*.gz`; do
        echo Learn Spam-Mails from File $f ...
        gzip -cd $f | sudo -u amavis -H nice sa-learn --spam --showdots
    done

    All dogma is stupid.
      Thanks.
      I already tried starting it with nice (and reniced it after I noticed); it doesn't help much, it is very "consuming" anyway.
      As an exercise I wanted to convert the bash script and insert the "sleep" logic combined with watching the CPU load factor. I know I can't convert sa-learn itself, but maybe something could be gained by making the whole thing a bit more "intelligent". I forgot to mention that I had tried nice earlier.
      Sorry if I didn't describe my goals accurately and didn't give all the information.
      So I will give something "beyond nice" a try.
        I already tried starting it with nice (and reniced it after I noticed); it doesn't help much, it is very "consuming" anyway.

        Then you're either a) on a crap operating system or b) not CPU-bound. I'd guess b); especially with sa-learn and a huge Bayesian database, it's quite likely that you're either running out of RAM (and thus starting to swap) or just using up all the system's IO. Try doing a system call trace on the process (strace on Linux), and also look at your general resource usage (vmstat). Stopping and restarting the process is an extremely crude tool and likely to do more harm than good IMO (especially since, AFAICR, sa-learn actually locks the Bayesian database while inserting, which means SpamAssassin has to wait for the process to restart if you stop and start it while the lock is held).


        All dogma is stupid.
Re: Performance issues
by Anonymous Monk on Nov 28, 2008 at 09:22 UTC
      Oh yes. That's "nice" :-). Thank you a lot.
Re: Performance issues
by cdarke (Prior) on Nov 28, 2008 at 10:37 UTC
    The problem with using sar is that it measures the whole system, and what it produces probably won't help too much. strace -f -c scriptname might help to identify the nature of the bottleneck, in particular the number of forks, but (again) that may be down to the called script.

    Off subject: ls(1) is not needed:
    for f in `ls /var/lib/amavis/virusmails/spam*.gz`
    is better written as:
    for f in /var/lib/amavis/virusmails/spam*.gz
    but that won't help your performance problem.
      You can use this idea:

      First convert your bash script into a Perl script, and have that script write its process ID to a file.

      Then write another Perl script using the modules Proc::ProcessTable::Process & Sys::Info::Device::CPU.

      In that script you can get the CPU consumption of the first script using the process ID from the file.

      Based on the CPU load, the monitoring script can then send stop and continue signals to the first script.
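      Taken together, a very rough sketch of such a monitor might look like the following (the PID file path, the load thresholds and the polling interval are made-up examples, and it assumes $cpu->load returns the recent load average):

      #!/usr/bin/perl
      use strict;
      use warnings;
      use Proc::ProcessTable;
      use Sys::Info;

      # Hypothetical PID file written by the converted learning script.
      my $pidfile = '/var/run/spamlearn.pid';
      open my $fh, '<', $pidfile or die "Cannot read $pidfile: $!";
      chomp( my $pid = <$fh> );
      close $fh;

      my $cpu     = Sys::Info->new->device('CPU');
      my $stopped = 0;

      while (1) {
          # Is the learning script still running?
          my ($proc) = grep { $_->pid == $pid } @{ Proc::ProcessTable->new->table };
          last unless $proc;

          my $load = $cpu->load;          # recent load average
          if ( $load > 1.0 and not $stopped ) {
              kill 'STOP', $pid;          # pause the worker while the box is busy
              $stopped = 1;
          }
          elsif ( $load < 0.3 and $stopped ) {
              kill 'CONT', $pid;          # resume it once things calm down
              $stopped = 0;
          }
          sleep 10;
      }

      (As tirwhan notes elsewhere in this thread, pausing sa-learn while it holds the Bayes lock can make SpamAssassin wait, so treat this as an experiment.)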

Re: Performance issues
by Krambambuli (Curate) on Nov 28, 2008 at 09:41 UTC
    Take a look at Sys::Info::Device::CPU. It might be what you want, for example to insert random sleeps if the CPU load passes a given threshold.
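    For example, a minimal sketch of that idea (the threshold of 1.0 and the sleep lengths are arbitrary, and it assumes that load() without arguments returns the recent load average):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Sys::Info;

    my $cpu = Sys::Info->new->device('CPU');

    for my $f (glob '/var/lib/amavis/virusmails/spam*.gz') {
        # Back off with random sleeps while the machine is busy.
        while ( $cpu->load > 1.0 ) {
            sleep( 10 + int( rand(20) ) );
        }
        system("gzip -cd '$f' | sudo -u amavis -H sa-learn --spam --showdots");
    }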

    Krambambuli
    ---
Re: Performance issues
by weismat (Friar) on Nov 28, 2008 at 10:05 UTC
    From my point of view you should consider running the script with a lower priority. Assigning priorities is a typical task for the operating system and the operator.
    Furthermore, depending on the size of the files, a significant amount of time will be spent in the gzip part, which is outside your control anyway.