
Let me first say that I'm writing this Meditation as a guide for when you have to write a Perl script that works on a big set of data.

To start with, I've found that the sooner you decide to use Inline C, the better. If you have a complex data structure, you may want to use Inline C++ and classes to represent it. Not only will you get a speed boost (most of the time), but the memory footprint should also be smaller.

The next thing to look at is accessing data directly rather than through function calls. Every sub call carries real CPU overhead in Perl, so store data where you can and read that data directly.
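To make that concrete, here is a minimal sketch of the difference. The Counter class and its count field are made up for illustration; they are not from my actual script. Reading the field directly skips one sub call per element, at the cost of encapsulation, so I would only do it in hot loops.

```perl
use strict;
use warnings;

package Counter;    # hypothetical class, for illustration only

sub new   { my ($class, $n) = @_; return bless { count => $n }, $class }
sub count { return $_[0]{count} }    # accessor: one sub call per read

package main;

my @objects = map { Counter->new($_) } 1 .. 5;

# Accessor version: a method call for every element.
my $sum_via_method = 0;
$sum_via_method += $_->count for @objects;

# Direct version: read the stored field without any call.
my $sum_direct = 0;
$sum_direct += $_->{count} for @objects;

print "$sum_via_method $sum_direct\n";    # prints "15 15"
```

Both loops give the same answer; only the second avoids the call overhead.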

Be very careful with recursion. Remember, what works in a reasonable amount of time with 10 records may not with 90,000. So if you get a value from a recursive call, store it rather than calling the function again. I’m sure right now you’re saying, "well duh, I knew that," but here’s where you may forget it:

.....
if($object->some_recursive_call > $value){
    $value = $object->some_recursive_call;
}
.....

It is better to do this:

.....
my $temp = $object->some_recursive_call;
if($temp > $value){
    $value = $temp;
}
.....

Or this:

.....
my $temp = $object->some_recursive_call;
$value = $temp if $temp > $value;
.....
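If you want to see what the double call costs, a small sketch like the following counts how often the recursive sub actually runs. The depth sub and the tiny hash-based tree are stand-ins I made up for the example, not code from my script.

```perl
use strict;
use warnings;

my $calls = 0;    # counts how often the recursive sub actually runs

# Hypothetical stand-in for an expensive recursive method.
sub depth {
    my ($node) = @_;
    $calls++;
    my $max = 0;
    for my $kid (@{ $node->{kids} }) {
        my $d = depth($kid);
        $max = $d if $d > $max;
    }
    return 1 + $max;
}

# Four nodes in total: a root, two children, one grandchild.
my $tree = { kids => [ { kids => [ { kids => [] } ] }, { kids => [] } ] };

# Calling in both the test and the assignment walks the tree twice.
my $value = 0;
if (depth($tree) > $value) { $value = depth($tree) }
my $calls_twice = $calls;

# Storing the result walks it once.
$calls = 0;
$value = 0;
my $temp = depth($tree);
$value = $temp if $temp > $value;
my $calls_once = $calls;

print "twice: $calls_twice, once: $calls_once\n";    # prints "twice: 8, once: 4"
```

On four nodes the difference is nothing; on 90,000 records it is a second full walk of the tree.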

Do not use Class::Struct to create your objects; it will not give you the performance you need. Instead, use real Perl classes. They aren’t hard to write and are nothing to fear.

If you find out too late to switch to C objects, change your Perl objects to use arrays rather than hashes. Hashes cost you some speed and quite a bit of memory. If you have hashes right now, create some constants to represent the keys of your hash, so depth becomes $node::depth. This makes the change easy: all you have to edit is the brackets and the prefix on the string you were using before.
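As a rough sketch of that conversion, assuming a node package with depth, child and leaves slots (the slot numbers and the constructor are my own choices for the example):

```perl
use strict;
use warnings;

package node;    # lower-case name only to match $node::depth in the text

# Slot numbers for the array-based object (assumed layout, for illustration).
our ($depth, $child, $leaves) = (0, 1, 2);

sub new {
    my ($class, %args) = @_;
    my $self = [];
    $self->[$depth]  = $args{depth} // 0;
    $self->[$child]  = [];
    $self->[$leaves] = 0;
    return bless $self, $class;
}

package main;

my $n = node->new(depth => 3);

# Before: $n->{depth}    After: $n->[$node::depth]
print $n->[$node::depth], "\n";                # prints 3

push @{ $n->[$node::child] }, node->new(depth => 4);
print scalar @{ $n->[$node::child] }, "\n";    # prints 1
```

The slots here are plain package variables rather than use constant, which keeps the $node::depth spelling used above.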

Use C wherever you can to increase speed and reduce memory usage. A C function call takes less CPU than a Perl call. Recursing over your Perl data structure from C is possible, but it takes some real voodoo C programming (through XS). Here is an example of using XS and recursion on a Perl data structure that is already built from arrays:

# perl
# node:
sub countLeaves{
    my $self = shift;
    my $ret = 0;
    foreach my $item (@{$self->[$node::child]}){
        next if ! defined($item);
        $ret += $item->countLeaves;
    }
    $self->[$node::leaves]=$ret;
    return $ret;
}
#leaf:
sub countLeaves{
    return 1;
}

# calling the Perl function:
$tree->countLeaves;

# C code:
I32 getLeavesInt( SV* root, I32 childPlace, I32 numKeys,
        I32 lPlace, I32 wPlace){
    I32 i, n, curr;
    AV* arr1;
    AV* arr2;
    SV* child;
    if( !SvROK(root)){
        return 0;
    }
    arr1 = (AV*)SvRV(root);
    n = av_len(arr1);
    curr = 0;
    if(n>1){
        if(!SvROK(*av_fetch (arr1,childPlace,0))){
            return 0;
        }
        arr2 = (AV*)SvRV(*av_fetch (arr1,childPlace,0));
        for(i = 0; i <= numKeys; i++){
            if(!(av_fetch(arr2,i,0) == NULL)){
                child = (*av_fetch(arr2,i,0));
                if(SvROK(child)){
                    curr += getLeavesInt(child,childPlace,numKeys,lPlace,wPlace);
                }
            }
        }
        av_store(arr1,lPlace,(SV*) newSViv(curr));
        root=((SV*)arr1);
    }else{
        if(SvIV(*av_fetch(arr1,wPlace,0))!=0){
            curr = 1;
        }
    }
    return curr;
}

SV* getLeaves( SV* root, SV* childPlace, SV* numKeys,
        SV* lPlace, SV* wPlace){
    I32 ret;
    ret = getLeavesInt(root,SvIV(childPlace),SvIV(numKeys),
            SvIV(lPlace),SvIV(wPlace));
    return newSViv(ret);    
}

# calling the c function is:
getLeaves($tree,$node::child,$numKeys,$node::leaves,$leaf::weight);

Though there are more lines of C, the C runs considerably faster than the Perl version. By using recursion like this (not only in this function, but in others as well), I have halved both my running time and my memory usage.

One final note: optimize everything you can. There is no reason to carry extra variables in your data structure that aren’t needed (as determined by some command-line options, for example).

In reply to Faster Perl, Good and Bad News by abitkin
