Hi All,
Thanks for looking.
I am hoping to optimize (read: speed up) a script that produces statistics from call log files. The files are flat files in which each line describes one leg of a SIP phone call. Each call has a unique ID, but a call can have multiple legs (and therefore multiple occurrences of that ID in a file). A short summary of the method:
1: Loop through the log file and create a hashref, keyed on the unique call ID, that captures certain columns from each line, e.g.:
    my @this_line      = split( /;/, $_ );
    my $session_id     = $this_line[0];
    my $call_leg_index = $this_line[1];
    my $dur            = $this_line[2];
    my $pdd            = $this_line[3];
    $call_ids->{$session_id}->{$call_leg_index}->{duration}        = $dur;
    $call_ids->{$session_id}->{$call_leg_index}->{post_dial_delay} = $pdd;
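For context, here is a minimal, self-contained sketch of that parse loop; the log path is made up and the strict/warnings lines are just for completeness, but the column order is the one assumed above:

    use strict;
    use warnings;

    my $call_ids = {};

    # Hypothetical log path; each line is one leg of a call, fields separated by ';'
    open my $fh, '<', '/var/log/sip/calls.log' or die "Cannot open log: $!";
    while ( my $line = <$fh> ) {
        chomp $line;
        my ( $session_id, $call_leg_index, $dur, $pdd ) = split /;/, $line;
        next unless defined $session_id && length $session_id;
        $call_ids->{$session_id}{$call_leg_index} = {
            duration        => $dur,
            post_dial_delay => $pdd,
        };
    }
    close $fh;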
2: Here's where the slowness comes in. I loop through the resulting hash (keyed on session_id) and use an inner loop to run a few eval blocks that calculate my statistics. The strings I pass to eval are pulled from a hashref built from a "config table" in MySQL (via fetchall_hashref, so I only hit the database once):
    for my $this_call_id ( sort keys %$call_ids ) {
        $count++;
        next if !$this_call_id;
        my $route_attempts = 0;
        for my $this_index ( sort keys %{ $call_ids->{$this_call_id} } ) {
            $route_attempts++;
            foreach my $aggregate_name ( keys %{$agg_snippets} ) {
                my $this_group_data = eval $grouping_data_eval;
                my $snippet = $agg_snippets->{$aggregate_name}->{'snippet'};
                $summary_data->{$this_group_data}->{$aggregate_name} = 0
                    if !$summary_data->{$this_group_data}->{$aggregate_name};
                $summary_data->{$this_group_data}->{$aggregate_name} += eval $snippet;
            }
        }
    }
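For reference, the snippet lookup described above is a single fetch at startup. A rough sketch of what that fetchall_hashref call might look like (connection details, table and column names are made up, not my real schema):

    use DBI;

    # Hypothetical DSN and credentials
    my $dbh = DBI->connect( 'dbi:mysql:database=callstats;host=localhost',
                            'user', 'password', { RaiseError => 1 } );

    # Assumed config table: agg_config(aggregate_name, snippet)
    my $sth = $dbh->prepare('SELECT aggregate_name, snippet FROM agg_config');
    $sth->execute;

    # Keyed on aggregate_name, so $agg_snippets->{$name}{snippet} holds the eval string
    my $agg_snippets = $sth->fetchall_hashref('aggregate_name');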
3: Loop through the newly created summary hash and push the stats to a MySQL database. This part runs very quickly.
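That final load is essentially a prepared INSERT executed once per group/aggregate pair; a sketch, reusing the hypothetical $dbh handle above and an assumed call_stats table:

    my $ins = $dbh->prepare(
        'INSERT INTO call_stats (group_key, aggregate_name, value) VALUES (?, ?, ?)'
    );
    for my $group ( keys %$summary_data ) {
        for my $agg ( keys %{ $summary_data->{$group} } ) {
            $ins->execute( $group, $agg, $summary_data->{$group}{$agg} );
        }
    }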
The idea behind the eval blocks is to add a layer of abstraction, so that when I need to add additional statistical analyses I can just add entries to MySQL with the proper eval string. I know this could be done via a flat file or XML, but I don't believe it costs any extra time to dip into the DB once to get my eval strings.
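To make the mechanism concrete, a hypothetical config row might look like the following; the real snippet strings live in the MySQL table and are eval'd inside the inner loop, where $this_call_id and $this_index are in scope:

    # Hypothetical example row, not one of my actual entries
    my %example_row = (
        aggregate_name => 'total_duration',
        snippet        => q{ $call_ids->{$this_call_id}{$this_index}{duration} },
    );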
I am looking for statistics that are as close to real time as I can get, but the process really bogs down during the loop that groups data into the second hashref. Since my goal is to push this data into a database that can be queried and graphed, I don't believe I'll be able to keep up: the logs roll over every 5 minutes, and the process takes about that long on a large file (100k rows). When I look at top it shows Perl using about a full processor but not much memory (~4%).
I'm hoping that someone might have an idea that could help speed the process up. I'm not looking for anyone to write code for me. Just point me in the right direction or drop a few cryptic terms that I can research.
This is done in Perl 5.10.1 on CentOS 6, FYI.