
Background: A central syslog server receives syslog messages from servers and filters messages of a certain type to a Perl script, which reads them from STDIN. The script's task is to flatten the multi-line messages into a single line and write them out to disk. Because related lines may arrive interleaved with unrelated ones, I use a hash index to tie the associated lines together. Some events end with an "end of event" marker, which triggers the write; others may not, so I time those out after 5 seconds.

Issue: The script is currently single-threaded, and I anticipate that this will be the first limit reached, since rsyslog is multithreaded. I've read a bit about threading in Perl, which could help, but what I've found seems aimed at workloads that do substantial work per task, whereas this script spends most of its time on very small tasks: 1) looking for a couple of patterns in a single stream; 2) writing data out; 3) occasionally stepping through a couple of loops to write out expired data.

Question: Given the nature of this script, would threading help, or do things such as the single input stream and the need to relate the data before writing hinder that?

The main code loop:

while(<>){
    chomp;  # strip only a trailing newline; chop would remove any last character
    if (/^node=(\S+).*audit\((\d+\....):(\d+)\)/){
        if (! $time{$2}{$1}{$3}){
            $time{$2}{$1}{$3}=1;
        }
        if (/^node=(\S+) type=EOE msg=audit\((\d+\....):(\d+)\)/){
            print_data($1,$2,$3);
            $totalevents++;
        }else{
            push(@{$data{$1}{"$2:$3"}},$_);
        }
    }
    $cnt++;
    if ($cnt > $agecheck){
        # see if entries have aged off and should be written out
        $date=&dateonly;
        while (my ($t)=each(%time)){
            if ($t < (time() - $age)){
                foreach my $host (keys(%{$time{$t}})){
                    foreach my $event (keys(%{$time{$t}{$host}})){
                        logit("Aged: node=$host $t:$event");
                        print_data($host,$t,$event);
                        update_stats();
                    }
                }
            }
        }
        $cnt=0;
    }
    $totallines++;
}
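For readers following the age-off step, here is a minimal, self-contained sketch of the same idea. It assumes the index has the shape timestamp => host => event, as %time does above; the helper name sweep_expired, the sample data, and the fixed "current time" are hypothetical and only there to make the example repeatable. It also assumes that writing an event out removes it from the index, which the post does not show.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the %time index:
# timestamp => host => event id => 1
my %time = (
    '100.000' => { hostA => { 1 => 1 } },
    '103.500' => { hostA => { 2 => 1 }, hostB => { 7 => 1 } },
    '109.250' => { hostB => { 8 => 1 } },
);

# Return [host, timestamp, event] triples older than $cutoff and remove
# them from the index. The matching keys are snapshotted with keys()
# first, so nothing is deleted from a hash that each() is still walking.
sub sweep_expired {
    my ($index, $cutoff) = @_;
    my @expired;
    for my $t (sort { $a <=> $b } grep { $_ < $cutoff } keys %$index) {
        for my $host (sort keys %{ $index->{$t} }) {
            for my $event (sort { $a <=> $b } keys %{ $index->{$t}{$host} }) {
                push @expired, [ $host, $t, $event ];
            }
        }
        delete $index->{$t};    # safe: we iterate over a copy of the keys
    }
    return @expired;
}

my $now = 110;   # fixed "current time" so the example is repeatable
my $age = 5;     # seconds, matching the timeout described in the post
for my $e (sweep_expired(\%time, $now - $age)) {
    print "Aged: node=$e->[0] $e->[1]:$e->[2]\n";
}
```

The snapshot is a design choice rather than a requirement of the original loop: perldoc -f each notes that adding or deleting hash elements during iteration can cause entries to be skipped or duplicated, so taking the key list up front sidesteps that question.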

sub print_data {
...
    # Dedup the data based on the tokens in the string
    # Parent: http://www.perlmonks.org/bare/?node_id=104565
    # specific post: http://www.perlmonks.org/bare/?node_id=104602
    my $singleline=join(" ",@{$data{$host}{"$time:$event"}});
    $databefore+=length($singleline);
    $singleline=~s/((\S+)\s?)/$count{$2}++ ? '' : $1/eg;
    $dataafter+=length($singleline);

    print ${$fh} "$singleline\n";

...
}
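To make the flatten-and-dedup step concrete, here is a small runnable sketch built around the same substitution. The helper name flatten_and_dedup and the two sample audit lines are hypothetical; the sketch also assumes the seen-token hash is reset for every event, which the elided parts of print_data may or may not do.

```perl
use strict;
use warnings;

# Hypothetical helper: join the buffered lines of one event and drop every
# whitespace-delimited token that already appeared earlier in the string.
sub flatten_and_dedup {
    my @lines = @_;
    my %seen;                        # fresh per call, so counts never leak
    my $single = join ' ', @lines;   # between events
    $single =~ s/((\S+)\s?)/$seen{$2}++ ? '' : $1/eg;
    return $single;
}

# Made-up sample lines in the general shape of audit records
my @event = (
    'node=web1 type=SYSCALL msg=audit(1364481363.243:24287): syscall=2',
    'node=web1 type=PATH msg=audit(1364481363.243:24287): name="/etc/passwd"',
);
print flatten_and_dedup(@event), "\n";
```

In this sample the repeated node= and msg=audit(...) tokens from the second line are dropped, while type=PATH and name= are kept because they have not been seen before. Note that when the last token of a string is a duplicate, the preceding token keeps its trailing space.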

In reply to Multi-CPU when reading STDIN and small tasks by bspencer
