PyroX has asked for the wisdom of the Perl Monks concerning the following question:

Hello All,

I still consider myself to be a newbie when dealing with perl. I have below a script I wrote, but it is not very clean at all. If fact after a couple of days, it was hogging 74MB of ram on my server. It starts at around 3 or 4 MB. I obviously have some problems somewhere. I am sure the code can be a heck of a lot simpler as well. But I do not have the experiance or knowledge to make it better. I am looking for help locating poor code techniques and to make it cleaner. It all works fine as it is:

#!/usr/bin/perl system("clear"); use DBI; use Mail::Sendmail; $red="\033[31m"; $green="\033[32m"; $yellow="\033[33m"; $blue="\033[34m"; $cyan="\033[36m"; $ct=0; $cnct="$green YES $white"; $vxd=0; @denies =("deny","TCPZSPconnectionZSPdenied","was2","real","probe +"); @ibm =("configured","rp01","ibmproxy","failed","login","timeZS +Pout","images","http"); @forbids =("forbid","caution"); @pix =("PIX","warning","IP","from","disconnect","connect","ids +n","interface","serial","0x14"); @errors =("link","error","fast","ethernet"); @interfc =("console","changedZSPstateZSPto","up","down","failed"," +operational"); @trans =("snort","noZSPtranslationZSPgroup","noZSPxlate","portsc +an","udp","tcp"); @snort =("large","packet","bad","traffic","inbound","attempt","w +eb","IIS","cmd.exe","cgi","access","multiple","decode"); @prior =("Priority:ZSP1","Priority:ZSP2","Priority:ZSP3","Priori +ty:ZSP4","Priority:ZSP5","Priority:ZSP6","Priority:ZSP7","Priority:ZS +P8","Priority:ZSP9","Priority:ZSP10",); @watcher=("@pix","@denies","@ibm","@errors","@forbids","@interfc", +"@trans","@snort","@prior"); my $database = DBI->connect("DBI:mysql:pixter",'root','') or $ +cnct="$red NO $white"; print "$cyan VERSION 1.4b \n"; print "AUTHOR: Justin R Carlson\n"; print "Database Linked with Application -$cnct \n"; if($cnct=~/NO/i){ print "\n $yellow CAUTION: $red THE DATABASE IS NOT LINKED, NO +THING FOR THIS SESSION WILL BE SAVED $white \n"; print "\n $yellow CAUTION: $red DATABASE NOT READY, NOT ACTIVE +, or Dead. $white \n"; } for ($i=0; $i<$#watcher+1; $i++){ $watch="$watcher[$i]"; $global=0; @global=split(/ /,$watch); for ($x=0;$x<$#global+1;$x++){ $vxd++; if($global[$x]=~/ZSP/i){ @ret=split(/ZSP/,$global[$x]); $global[$x]=""; $global[$x]=join(' ',@ret); } $spy[$ct]=$global[$x]; $ct++; } } print "\n $vxd total tags \n $white"; while(<>){ $input = $_; @format=split (/:/,$input); @cellmail=split(/ /,$input); $findings=""; for ($r=0; $r<$#spy+1; $r++){ if($input=~/$spy[$r]/i){ $findings="$findings $spy[$r],"; $fn++; } } if($fn>0){ $findings=~s/,//g; $findings=~s/ /_/g; $findings=~s/\///g; $log="$findings"; $log=~s/:/_/g; $log=~s/\./_/g; $log=~s/___/_/g; $log=~s/__/_/g; $script = "CREATE TABLE IF NOT EXISTS `$log` (id INT (100) not + null AUTO_INCREMENT, log1 CHAR (255) not null, log2 CHAR (255) not n +ull, log3 CHAR (255) not null, log4 CHAR (255) not null, log5 CHAR (2 +55) not null, log6 CHAR (255) not null, log7 CHAR (255) not null, log +8 CHAR (255) not null, log9 CHAR (255) not null, log10 CHAR (255) not + null, log11 CHAR (255) not null, log12 CHAR (255) not null, log13 CH +AR (255) not null, log14 CHAR (255) not null, log15 CHAR (255) not nu +ll, log16 CHAR (255) not null, log17 CHAR (255) not null, log18 CHAR +(255) not null, log19 CHAR (255) not null, log20 CHAR (255) not null, + log21 CHAR (255) not null, log22 CHAR (255) not null, log23 CHAR (25 +5) not null, log24 CHAR (255) not null, log25 CHAR (255) not null, PR +IMARY KEY (id))"; my $sql = $database->prepare_cached($script); $sql->execute(); $sql->finish(); $fn=0; @writer=split(" ",$input); $t=$#writer; if($t<2){ $script="INSERT INTO $log (id,log1) VALUES ('','$_')"; } else { $script="INSERT INTO $log (id,log1,log2,log3,log4,log5,log6,log7,log8, +log9,log10,log11,log12,log13,log14,log15,log16,log17,log18,log19,log2 +0,log21,log22,log23,log24,log25) VALUES ('','$writer[0]','$writer[1]' +,'$writer[2]','$writer[3]','$writer[4]','$writer[5]','$writer[6]','$w +riter[7]','$writer[8]','$writer[9]','$writer[10]','$writer[11]','$wri +ter[12]','$writer[13]','$writer[14]','$writer[15]','$writer[16]','$wr +iter[17]','$writer[18]','$writer[19]','$writer[20]','$writer[21]','$w +riter[22]','$writer[23]','$writer[24]')"; } my $sql = $database->prepare_cached($script); $sql->execute(); $sql->finish(); } }


What does it do? It gets info piped to it from tail -f, and then creates / updates mysql tables to dump the data into. It uses all those arrays to make up its own table names. The log that tail is piping data from is written to from about 12 network devices.

Replies are listed 'Best First'.
Re: Clean Code - What a mess...
by suaveant (Parson) on Nov 29, 2001 at 02:15 UTC
    First of all, strict and warnings could help you a lot... making sure all of your variables are defined. Whenever you have variables used only in a loop, it is good to define them with my in that loop, so they get cleaned up when they go out of scope...

    looking at your code I am guessing your leak is in

    $spy[$ct]=$global[$x];
    $ct just keeps getting incremented, so @spy just keeps getting bigger and bigger, and never clears out. There may be other things, but that is what I see from a quick look.

                    - Ant
                    - Some of my best work - (1 2 3)

Re: Clean Code - What a mess...
by runrig (Abbot) on Nov 29, 2001 at 02:22 UTC
    First, I strongly recommend strict. See use strict and warnings. It might seem like a hassle now that the script is mostly written, but it would've been easier if you started with it, and it'll be easier to maintain in the future.

    Second, using prepare_cached WITHOUT placeholders (with variable arguments) is a very bad idea. You WILL use alot of memory if there is alot of unique input. If your insert statement will have a limited number of table names, then use prepare_cached, but use it WITH placeholders (search this site and see the DBI docs). If you're going to have an unlimited number of table names per script, then use prepare instead of prepare_cached, but DO still use placeholders (with most databases, the table name can not be a placeholder argument, it might be different with mysql, though not portable).

    Last, you're not checking the status of any of your DBI calls (except the connect). You might want to look into RaiseError (see the DBI docs), and wrap the whole thing in an eval{}, then check $@, or check the status of each DBI call (prepare, execute, etc.).

Re: Clean Code - What a mess...
by dragonchild (Archbishop) on Nov 29, 2001 at 02:37 UTC
    1. You have a variable @global. This is a bad thing. Name it something useful.
    2. You do an insert explicating $writer[0] through $writer[24]. join would make this cleaner.
    3. qw is your friend. For example,
      @snort =("large","packet","bad","traffic","inbound","attempt","web", +"IIS","cmd.exe","cgi","access","multiple","decode");
      becomes
      @snort = qw(large packet bad traffic inbound attempt web IIS cmd.exe cgi access multiple decode);
    4. You should die after you cannot connect to the database.
    5. Use multi-dimensional arrays, or, better, hashes of arrays. That way, your use of @watcher doesn't require split. Something like
      my @watcher = ( \@pix, \@denies); ... my @global = @{$watcher[$i]};
    6. Where do you use $global? (Using warnings would've caught this...)
    7. if($global[$x]=~/ZSP/i){ @ret=split(/ZSP/,$global[$x]); $global[$x]=""; $global[$x]=join(' ',@ret); }
      should be (before using references correctly...)
      if (my @ret = split /ZSP/, $global[$x]) { $global[$x] = join ' ', @ret; }
    8. Something like the following might help ...
      foreach my $r (@spy) { if($input =~ /$r/i){ $findings .= ' ' . $r; $fn++; } }
    And, there's plenty more where that came from. The most important thing I can say is that your data structures drive your logic, especially in an application like this. Improve your understanding of data structures and your code will be that much easier. Every hour you spend on learning and mastering data structures will translate to a week saved in coding, if you code seriously for a year. That's not an exagerration, either.

    ------
    We are the carpenters and bricklayers of the Information Age.

    Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.

Re: Clean Code - What a mess...
by IlyaM (Parson) on Nov 29, 2001 at 02:47 UTC
    • First of all start with this node: How to get the most of your question from the monks. It has a lot of good info for newbies. Pay attention for tips about 'use strict' and '-w'. It is must have!
    • Well, next step separate your code on several subroutines. Rule of thumb that no subroutine show exceed one screen. That meens 25 lines. All rules have exceptions but usually long subroutine means bad style.
    • Use DBI's placeholders. See perldoc DBI and What are placeholders in DBI, and why would I want to use them?.
    • There is no need for very long completly unreadable lines like in chunk of your code which creates new table. It can be rewritten using heredocs (see Here, doccy doccy. nice doccy. heredoc, treat.). Something like
      $sql->do(<<SQL); CREATE TABLE IF NOT EXISTS %log ( id INT (100) not null AUTO_INCREMENT, log1 CHAR (255) not null, .... .... ) SQL
      It looks much more readable. Note that I've replaced several method calls prepare, excute. finish with single do.
    • Loops like
      for($$i=0; $i<$#watcher+1; $i++) { $watch="$watcher[$i]"; ... ... }
      Can be rewritten as
      for my $watch (@watcher) { }
    • Well, and finally memory leaks. I think they are caused by using prepare_cached without thinking. prepare_cached caches statement handler for all queries so for same queries one statement handler can be reused. First of all it doesn't makes too much sense for MySQL because it cannot actually prepare queries for later reuse like Oracle (and probably don't need to do it since it is much more lightweight SQL server). Second of all all your queries are different since you do not use placeholders. For now replace prepare_cached with just prepare and reread docs for DBI. Probably your problems will go away.
random DBI suggestion
by boo_radley (Parson) on Nov 29, 2001 at 03:48 UTC
    I see no database disconnect here. This error will be more noticable when you (and you really should :) ) turn on warnings.
    I see no use of sendmail objects here. (although it might be importing something; I'm not so familiar with it)
    Your SQL statements could benefit from a for loop, here's one way (among many) for your INSERT statement:
    my @ins_fields= ("log"); my @ins_values= ("''"); for (1..25) { $writer[$_ -1] = "value of writer". ($_-1) ; push @ins_fields, "log$_"; push @ins_values, $writer[$_ -1]; } my $ins_fields = join ",", @ins_fields; my $ins_values = join ",", @ins_values; print "INSERT INTO $log ($ins_fields)\nVALUES ($ins_values)";


    There are other ways of doing this, of course, and they're all fun to discover ;-)
    for your series of regexen :
    $findings=~s/,//g; $findings=~s/ /_/g; $findings=~s/\///g; $log="$findings"; $log=~s/:/_/g; $log=~s/\./_/g; $log=~s/___/_/g; $log=~s/__/_/g;
    some of these can be 'optimized' by using tr///; always consider this when you have a direct 1-to-1 relationship between characters in and characters out, or when deleting chars. Your final 2 can be reduced to one regex, whose form depends on the answer to "are series of 4 or more underscores possible? would they be relevant in ways that 2 and 3 aren't?" Take a look at this snippet :
    for (1..10) { my $foo = "_" x $_; $foo =~ s/_+/_/; print "method one : $foo\n"; my $foo = "__" x $_; $foo =~ s/___?/_/; print "method two : $foo\n"; }
    The first one will take up any number of underscores and replace with a single underscore. The second will look for only two or three in a row, if there's 4 or more, it doesn't care. Whether this is relevant depends on your data.
    Be sure to check the results of $sql->execute() -- if it's dying silently, what are the ramifications? use something like $sql->execute() || warn "SQL Execution failed! $DBI::errstr\nscript is $script";
    Ok?
      Sendmail is there due to the fact that there is a function commented out, #mailman() and that code I did not include, thanks for all your help though! Everyone is helping so much I can't believe it, all great tips!
      I see no use of sendmail objects here. (although it might be importing something; I'm not so familiar with it)

      sendmail is exported by the Mail::Sendmail module. It is somewhat unfortunately named, as it is a pure-Perl module, and does not require an external program (such as /usr/bin/sendmail). The author himself realises that it is poorly named, and were he to do it all over again, he would choose the name Mail::Simple.

      --
      g r i n d e r
Re: Clean Code - What a mess...
by perrin (Chancellor) on Nov 29, 2001 at 02:19 UTC
    Please run it through perltidy to make it easier to read.
      unless someone silently added code tags, it looks like the code is really well indented... The code tags are breaking up the long lines...
        Maybe I'm just picky, but I think it's easier to read after a run through perltidy.
Re: Clean Code - What a mess...
by buckaduck (Chaplain) on Nov 29, 2001 at 02:49 UTC
    This is a longshot...

    Maybe you're actually bloating the cache that's created when you use $database->prepare_cached(). I suppose that would depend on how many lines of input this script sees.

    I wonder why you would even want to cache the results of SQL statements such as CREATE TABLE IF NOT EXISTS. Does this help? I would have guessed that the cache was more useful more SELECT queries.

    Anyway, as a longshot, try turning those into $database->prepare() instead, and see if it helps any.

    Even better, substitute placeholders for the variables. That way you can prepare() the statements once and execute() them with the correct parameters each timne through the loop. It may not affect the memory leak, but it's useful in many ways.

    buckaduck

      This application sees about 200,000 lines of input per day, so you think that is it? Tha cache is just getting to be too much?
        The way that prepare_cached works is that it uses your entire SQL statement as the key to a hash (the value is the statement handle). So if statements differ in any way (even whitespace), then you are storing another key/value pair in the statement cache (and the database creates another cursor if the database supports cursors). With 200,000 lines of input and no placeholders in sight, you are abusing prepare_cached and are wasting memory. If you use placeholders, then the only difference in your cached statements will be the table names, and you shouldn't run out of memory (assuming there aren't too many table names).
Re: Clean Code - What a mess...
by PyroX (Pilgrim) on Nov 29, 2001 at 02:22 UTC
    The loop constant loop does not start until the while(<>){ so $ct is not always getting bigger, it only gets alrger each time a new keyword is added to the $spy array at the execution right? I am unsure. , ok so I should use strict, is the warnings a -w after perl on the first line? What does that do exactly? Thanks so far, keep them coming people :)
      If your perl is 5.6 or later then put in 'use warnings;', if earlier (or if you need to be backward compatible), then '-w' after the 'perl' on the first line. see perlrun for -w and perlvar for the related $^W, or 'perldoc warnings' and 'perldoc perllexwarn' for 'use warnings;'.
      You are right, sorry... it is hard to do a quick review of someone else's code :) In a daemonized program, though, you really have to watch for arrays and hashes and what not that just keep getting added to, they are the most common cause of memory leaks... that's why using my in the tightest loop possible is good... then things get garbage collected when they go out of scope...

                      - Ant
                      - Some of my best work - (1 2 3)

Re: Clean Code - What a mess...
by PyroX (Pilgrim) on Nov 30, 2001 at 02:35 UTC
    Ok guys, It appears as though the memory leak is fixed, and with some of your suggestions its starting at 3mb instead of 4, for the past 6 hours it has held steady at 3060 k, so its not going up, or even changing at all. It looks like it is stable at last, thanks to everyone for the tips and advice, as always another excellent experiance here at perlmonks.