Martin90 has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I am writing a Perl script that does a few things one after another. Each step is quite big and CPU-costly, and the script will run every hour. I haven't tested anything so far; I just wonder whether executing the steps described below may overload the system, and what I can do to avoid that. Script steps:
1. Open a huge text file:

    open(my $fh, "<", "my_file.log") or die "cannot open, ($!)\n";
    while (<$fh>) {
        chomp;
        next unless /Key/i;
        ...
    }

2. Store the "key" lines in a hash (approximately 1,000 lines)
3. Analyze the stored keys against different values
4. Update a MySQL database

So, what I want to avoid is the program crashing in the middle of the work. Let's say it has analyzed the text file but hasn't updated the database yet. What can I do? Maybe there is a way to slow things down a little (sleep?) and gain stable operation?
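The steps above can be sketched roughly as follows (a hedged sketch: the file name and key pattern come from the question, the sample data is invented, and the analysis and MySQL steps are stubbed out):

```perl
use strict;
use warnings;

# Build a small stand-in for the real log file (placeholder data).
my $log = 'my_file.log';
open my $out, '>', $log or die "cannot create $log: $!\n";
print {$out} "Key: first\n", "noise\n", "KEY: second\n";
close $out;

# Steps 1 and 2: read the file line by line and keep matching lines.
my %keys;
open my $fh, '<', $log or die "cannot open $log: $!\n";
while (my $line = <$fh>) {
    chomp $line;
    next unless $line =~ /Key/i;
    $keys{$line}++;            # step 2: store "key" lines in a hash
}
close $fh;

# Steps 3 and 4 (analysis, MySQL update) would follow here.
printf "stored %d key lines\n", scalar keys %keys;
unlink $log;
```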

Replies are listed 'Best First'.
Re: Overload problem
by kcott (Archbishop) on Jan 17, 2014 at 21:32 UTC

    G'day Martin90,

    There's not really enough information to say whether those steps will overload your system; for instance, you don't provide any information about your system, nor do you indicate what "Analyze stored keys" involves. I also don't understand why you suggest "slow things down a little (sleep?)" when you've already indicated a set timeframe ("every hour") for processing.

    Having said that, here are some pointers that might help:

    • Keep track of each stage (possibly a simple text file indicating progress); include code to allow restarting from some known good point.
    • Include "Step 2" in the 'while (<$fh>) {...}' loop.
    • Use Storable to serialise the hash prior to performing whatever analysis is required.
    • It's a decade since I last used MySQL — I don't know what facilities are available but you should make the database update all or nothing.
    • You can speed up your regex by anchoring the pattern to one, or both, ends of the string if possible (e.g. with '^', '$', etc.) and not using the 'i' modifier.
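    As a hedged illustration of the first and third pointers (the file names and the progress-marker format are invented for this example), the hash can be checkpointed with the core Storable module alongside a simple progress file, so a restart can resume from the last known good point:

```perl
use strict;
use warnings;
use Storable qw(store retrieve);

my $state_file    = 'keys.storable';   # hypothetical checkpoint file
my $progress_file = 'progress.txt';    # hypothetical stage marker

# After the parse stage, checkpoint the hash and record progress.
my %keys = ('Key: first' => 1, 'Key: second' => 1);
store \%keys, $state_file or die "cannot store state: $!\n";

open my $pf, '>', $progress_file or die "cannot write progress: $!\n";
print {$pf} "parsed\n";
close $pf;

# On restart, read the marker and resume instead of re-parsing.
open $pf, '<', $progress_file or die "cannot read progress: $!\n";
chomp(my $stage = <$pf>);
close $pf;

my $restored = $stage eq 'parsed' ? retrieve($state_file) : {};
printf "resumed at stage '%s' with %d keys\n",
    $stage, scalar keys %$restored;

unlink $state_file, $progress_file;
```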

    -- Ken

Re: Overload problem
by Old_Gray_Bear (Bishop) on Jan 18, 2014 at 01:46 UTC
    Donald Knuth said:
    Premature optimization is the root of all evil.
    Test, benchmark, and analyze the data. You don't know whether you have a problem until you do.

    See Devel::NYTProf (the "New York Times" profiler) for more data about your processes and subroutines.
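    For quick measurements short of a full profiler run, the core Benchmark module can compare candidate approaches directly (a sketch with made-up sample data; the patterns echo the regex from the question and kcott's anchoring tip):

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample lines standing in for the real log.
my @lines = ('Key: alpha', 'some noise', 'key: beta') x 1_000;

# Compare the original case-insensitive match with an anchored,
# case-sensitive one; cmpthese prints a rate-comparison table.
cmpthese(-1, {
    'm/Key/i' => sub { my $n = grep { /Key/i } @lines },
    'm/^Key/' => sub { my $n = grep { /^Key/ } @lines },
});
```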

    ----
    I Go Back to Sleep, Now.

    OGB

Re: Overload problem
by hippo (Archbishop) on Jan 17, 2014 at 20:56 UTC
    I haven't tested anything so far; I just wonder whether executing the steps described below may overload the system ... What can I do?

    Don't wonder, just test. You haven't said how big your file is, how much RAM your system has, what else might be using that RAM, and so on, so there is nothing on which to base any assumption about whether your code will run to completion.

Re: Overload problem
by Laurent_R (Canon) on Jan 17, 2014 at 22:44 UTC
    You don't give enough details, but if your worry is whether you can store about 1,000 lines in your hash, then don't worry: that is really no problem (unless your lines are several hundred kB long, of course). Even storing a million 100-byte lines is usually not a problem on most systems, although that is getting closer to the limit; above that, the outcome will depend on your system's memory, the average line length and other similar factors.
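    As a back-of-the-envelope check (core Perl only; the figures mirror the 1,000-line estimate above, and real memory use will be several times higher once Perl's per-scalar and hash overhead is counted):

```perl
use strict;
use warnings;

# Simulate storing 1,000 lines of roughly 100 bytes each in a hash.
my %keys;
for my $i (1 .. 1_000) {
    my $line = sprintf "Key %04d: %s", $i, 'x' x 90;   # 100 chars each
    $keys{$line} = 1;
}

# Rough payload size: sum of key lengths only.  Actual memory use is
# larger because each hash entry carries its own internal overhead.
my $bytes = 0;
$bytes += length for keys %keys;
printf "%d lines, ~%d bytes of raw key data\n",
    scalar keys %keys, $bytes;
```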