Optimizing a Text Canvas: An inner loop and a cloner

JosiahBryan has asked for the wisdom of the Perl Monks concerning the following question:

Greetings, Monks --

Working on a text canvas (attributes, characters, etc.) - PCI::Telnet::GUI::Canvas is the name. (Can post entire module if needed.) Here's a simple example of how it's used:

my $can = PCI::Telnet::GUI::Canvas->new(10,40);
$can->fill([ON_YELLOW,BOLD,WHITE],0,0,$can->height,$can->width,HOR);

my $c2 = PCI::Telnet::Canvas->new(5,10);
$c2->box(0,0,$c2->height-2,$c2->width-2,ON_WHITE,BLACK,1);

$can->render_canvas(2,2,$c2);
print $can->to_string(20,10), "\n\n";
[download]

In a recent dprofpp (clear;perl -d:DProf -MPCI::Telnet::GUI::Canvas -e 1; dprofpp), it was reported that the two functions below use ~80% (83.6 3.122 3.243 1496 0.0021 0.0022 PCI::Telnet::GUI::Canvas::iloop) of the program processing time (under two different usages. In one test case, it was _atlayer, in another test case, it was iloop.)

Any advice from the Monks? Rewrite in C using Inline::C? Any perl tricks to optimize the iloop to make it go faster? Is there something I should do to clean up the code? My apologies for asking what seems to me to be a rather seemingly-simple question - but I'm at my wits end as far as optimizing the code.

Need to optimize:

use constant AS_BG    => 0;
use constant AS_FG    => 1;
use constant AS_U     => 2;
use constant AS_BLINK => 3;
use constant AS_BOLD  => 4;

# clone is from the Clone module on CPAN. Tried using Clone::Fast, but
+ it added .05 sec to running time when this function was called ~6000
+x in .25 seconds.

sub _atlayer
{
    #$_[0] = shift;
    local $_ = clone($_[1]);
    #$_->{$_} = $_[1]->{$_} foreach keys %{$_[1]};
    $_->[AS_BG]    = $_[0]->[AS_BG]    if !$_->[AS_BG];
    $_->[AS_FG]    = $_[0]->[AS_FG]    if !$_->[AS_FG];
    $_->[AS_U]    = $_[0]->[AS_U]        if  $_->[AS_U] < 0;
    $_->[AS_BOLD]    = $_[0]->[AS_BOLD]    if  $_->[AS_BOLD] < 0;
    $_->[AS_BLINK]    = $_[0]->[AS_BLINK]    if  $_->[AS_BLINK] < 0;
    return $_;
}
[download]

and

# iloop is the inner loop of the canvas rendering function. 
# I put it in a separate sub so that DProf could show it
# apart from the main loop - and it gets 79% of the 
# processing time under one scenario.
sub iloop($$$)
{
    my ($line,$arow,$sw,$lat,$blank) = @_;
    my ($atr,$chr,$col,@a);
    for $col (0..$sw)
    {
        $chr = $line->[$col];
        $atr = $arow->[$col];
        
        push @a, _attr($atr) if #!_ateq($lat,$atr);
                $lat->[AS_BG]        ne $atr->[AS_BG] ||
                $lat->[AS_FG]        ne $atr->[AS_FG] ||
                $lat->[AS_U]        != $atr->[AS_U]  ||
                $lat->[AS_BOLD]        != $atr->[AS_BOLD] ||
                $lat->[AS_BLINK]    != $atr->[AS_BLINK];

        push @a, defined $chr ? $chr : $blank;

        $lat = $atr;

    }
    return join '', @a;
}
[download]

Comment on Optimizing a Text Canvas: An inner loop and a cloner Select or Download Code

Replies are listed 'Best First'.
Re: Optimizing a Text Canvas: An inner loop and a cloner by dsheroh (Monsignor) on Jul 06, 2007 at 15:36 UTC
I'm not really much of a Perl optimization expert, but a couple things I noticed: You can get in and out of the sub a little faster by passing and returning refs instead of lists or strings. This minimizes the amount of data that needs to be copied onto the stack and then pulled back off. It would be a very minor gain per call, but, if 6000 calls in 0.25 second is normal, it may add up enough to be noticable. `push` can be surprisingly expensive due to the memory allocation overhead. You can mitigate this by setting the size of your array up front (`$#a = $sw`) and then inserting elements by index. You could also pre-size the array by initializing it with `my @a = $blank x $sw`. Try `$chr \|\| $blank` instead of `?:` - logical OR tends to be faster than doing a comparison, although they're not equivalent if `$chr == 0`. Another way around `push` would be to build the array contents directly in the `join` statement, presumably with `map`. It could get a bit convoluted in this case, since different bytes are determined by different methods, but, based on articles I've read, it sounds like this should be the fastest way to go if you can get an efficient `map`ping worked out. Instead of testing each attribute separately, try packing all of them into a single value, then just compare that from one iteration to the next and you'll save yourself half the dereferences/array lookups (since each structure will only need to be inspected once) and up to 80% of your comparisons (since it'll just be a single `ne` instead of 2 `ne`s and 3 `!=`s). Something like: `$chr = $line->[$col] \|\| $blank; $atr = $arow->[$col]; $array_position++; $curr_attribs = $atr->[AS_BG] . $atr->[AS_FG] . $atr->[AS_U] . $atr->[ +AS_BOLD] . $atr->[AS_BLINK]; if ($curr_attribs ne $last_attribs) { $a[$array_position] = _attr($atr) . $chr; } else { $a[$array_position] = $chr; } $last_attribs = $curr_attribs;` [download] If it's possible to render AS_BG and AS_FG into small enough integers (i.e., if you're dealing with 16 or 256 colors instead of full-bore 24-bit color), then $curr_attribs/$last_attribs could be bitfields instead of strings, which would allow a numeric comparison and probably speed things up even more. OK, I think I've run out of ideas... Hopefully these will improve things enough to spare you the horrors of C.	[reply] [d/l] [select]
Re^2: Optimizing a Text Canvas: An inner loop and a cloner by JosiahBryan (Novice) on Jul 06, 2007 at 15:45 UTC
Wow - great ideas...I'll work in trying them out ASAP. BTW, the AS_BG and AS_FG fields hold standard ANSI FG/BG color codes - e.g. ^[Xm, where X is 30-37 for FG and 40-47 for BG. So, yes, the BG and FG fields could be converted to an integer 0-7 or 10-17, then just add 30 to the resulting output. (Or just use 30-37, 40-47 for the values themselves - doesn't save any bytes to use 0-17.) However, I've never been very good with bitfields - could you be so kind as to offer a code example of how using bitfields would work? (My apologies for the simplistic question.)	[reply]
Re^3: Optimizing a Text Canvas: An inner loop and a cloner by dsheroh (Monsignor) on Jul 06, 2007 at 16:00 UTC
Sure. Assuming that the U/BOLD/BLINK attributes are 0/1 and that you've already converted your colors into integers (0-255) and copied all of your attribute settings into scalars (which you won't really want to do, but it simplifies the example), I would use: `$current_attribs = $as_bg + $as_fg << 8 + $as_u << 16 + $as_bold << 17 + + $as_blink << 18;` [download] This turns the full set of attributes into a 19-bit integer, allowing the CPU to compare two sets of attributes in a single operation when you check whether `$current_attribs == $last_attribs`. (They could be rendered into only 9 bits by taking the colors down into the 0-7 range (3 bits each instead of 8), but it would add a little time to do the conversion without saving any work later on.)	[reply] [d/l] [select]
Re^4: Optimizing a Text Canvas: An inner loop and a cloner by dsheroh (Monsignor) on Jul 06, 2007 at 17:51 UTC
Re^4: Optimizing a Text Canvas: An inner loop and a cloner by JosiahBryan (Novice) on Jul 06, 2007 at 16:10 UTC
Re^3: Optimizing a Text Canvas: An inner loop and a cloner by JosiahBryan (Novice) on Jul 06, 2007 at 17:27 UTC
Right, I think I've got a bit figured out on the bitfields. However, this doesn't work as expected: `use constant AS_BG => 0; use constant AS_FG => 8; use constant AS_U => 16; use constant AS_BLINK => 17; use constant AS_BOLD => 18; my $test = 0; #$test \|= ($v << $k); # set $k to $v #$test &= ~($v << $k); # print "Test0:[$test]\n"; $test \|= 41 << AS_BG; $test \|= 31 << AS_FG; $test \|= 0 << AS_U; $test \|= 0 << AS_BLINK; $test \|= 1 << AS_BOLD; print "Test1:[$test]\n"; print "Test parts: ".($test >> AS_BG).", ".($test>>AS_FG).",".($test>> +AS_U).",".($test>>AS_BLINK).",".($test>>AS_BOLD)."\n";` [download] Output: `Test0:[0] Test1:[270121] Test parts: 270121, 1055,4,2,1` [download] Now, I'm sure I'm missing something here - but what? :-)	[reply] [d/l] [select]
Re: Optimizing a Text Canvas: An inner loop and a cloner by BrowserUk (Patriarch) on Jul 06, 2007 at 15:50 UTC
I think the best way to optimise your application would involve a substantial redesign from what you currently have. The subs/loops you are seeking to speed up deal are looping over arrays of attributes. Those attributes could be more efficiently stored using bytes (or maybe just bits) in a string. Besides saving some space (A scalar of 5 bytes requires ~17 bytes of ram, versus an array holding 5 scalars needing something like 400), and being more efficient to duplicate, some or all of the loops you are currently using comparing, copying and 'defaulting' attributes maybe replacable by boolean operations. For example, if your attributes were stored as a string: `"FBubk"` where F/B are the foreground and background colors; u/o/k are the underline bold and blink attributes; then this `push @a, _attr($atr) if #!_ateq($lat,$atr); $lat->[AS_BG] ne $atr->[AS_BG] \|\| $lat->[AS_FG] ne $atr->[AS_FG] \|\| $lat->[AS_U] != $atr->[AS_U] \|\| $lat->[AS_BOLD] != $atr->[AS_BOLD] \|\| $lat->[AS_BLINK] != $atr->[AS_BLINK];` [download] Could become `push @a, $atr if $lat ne $atr;` [download] This `local $_ = clone($_[1]);` would simply be `local $_ = $_[1];` avoiding a function call. Even with your current setup, it's not clear to me why you need to use `clone()` to replicate and array? A simple `local $_ = [ @$_[1] ];` would work just as well and (I think) be more efficient. It's also possible (but I haven't fully digested your code so I might be wrong), that by using strings instead of arrays, that: `sub _atlayer { #$_[0] = shift; local $_ = clone($_[1]); #$_->{$_} = $_[1]->{$_} foreach keys %{$_[1]}; $_->[AS_BG] = $_[0]->[AS_BG] if !$_->[AS_BG]; $_->[AS_FG] = $_[0]->[AS_FG] if !$_->[AS_FG]; $_->[AS_U] = $_[0]->[AS_U] if $_->[AS_U] < 0; $_->[AS_BOLD] = $_[0]->[AS_BOLD] if $_->[AS_BOLD] < 0; $_->[AS_BLINK] = $_[0]->[AS_BLINK] if $_->[AS_BLINK] < 0; return $_; }` [download] might become something like `sub _atlayer { #$_[0] = shift; local $_ = $_[1]; $_ &= $_[ 0 ]; return $_; }` [download] Though you might have to tweak how you represent the absence of attributes. Given your current setup, that sub might be slightly more efficient coded as `sub _atlayer { local $_ = [ @$_[1] ]; $_->[AS_BG] \|\|= $_[0]->[AS_BG]; $_->[AS_FG] \|\|= $_[0]->[AS_FG]; $_->[AS_U] \|\|= $_[0]->[AS_U]; $_->[AS_BOLD] \|\|= $_[0]->[AS_BOLD]; $_->[AS_BLINK] \|\|= $_[0]->[AS_BLINK]; return $_; }` [download] if you could represent the absence of underline, bold and blink as 0 or undef rather than a negative number (-1?). From the bits commented out in your code, it looks like you already made the transition from using hashes to arrays to store your attributes. I think that moving to use strings would be even better. Not PC perhaps, but certainly more efficient. Sorry this is all ifs and maybes. I'm not able to run your code here. Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. "Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."	[reply] [d/l] [select]
Re^2: Optimizing a Text Canvas: An inner loop and a cloner by JosiahBryan (Novice) on Jul 06, 2007 at 16:00 UTC
Much appreciated reply - I didn't even think of the `[ @$_[1] ]` construct - that should save a lot of clone() calls - I'll test asap. BTW, as way of explanation, the _atlayer sub basically 'layers' the attributes of the current "character" onto the already-existing attributes at the position. First arg to _atlayer is existing attributes, second arg is 'new' attributes. The clone/copy of the new atrributes is necessary because the 'new' attributes are stored as an array ref in the object - a la the current 'brush' in a Qt painter. I can't just modify the "brush" array ref - must copy first, else the master "brush" would be modified for all subsequent calls. The 'layering' mentioned basically says that if there are no attributes in the "brush" (e.g. the -1 or the undef), then apply attributes from the existing attribute ref, but if the brush has any attributes for that category (even an 'off' flag, such as for bold - which would make the final output have a DIM ansi code) the value sent in the second argument will take precedents over the existing attribute at that location. Whew..takes a breath ... sorry, complicated explanation for what is actually kinda simple. Does it make sense?	[reply] [d/l]
Re: Optimizing a Text Canvas: An inner loop and a cloner by vkon (Curate) on Jul 06, 2007 at 12:37 UTC
if you want faster perl/Tk, use Tcl::Tk. Tcl::Tk is far more powerful in other aspects, also.	[reply]
Re^2: Optimizing a Text Canvas: An inner loop and a cloner by JosiahBryan (Novice) on Jul 06, 2007 at 14:03 UTC
I'd love to use Tk - but I'm writing a UI that needs to run over a simple socket protocal - doesn't have to be true telnet, but that's the idea. (Auto plant where I work wants to use putty to telnet into the VLS cluster which runs our MRP system (also written in perl - and runs on a four-node MySQL cluster)) Anyway, the point of making our own canvas is to run it over a socket in text mode - unless Tk does some funky text mode that I don't know about.	[reply]
Re: Optimizing a Text Canvas: An inner loop and a cloner by JosiahBryan (Novice) on Jul 06, 2007 at 14:06 UTC
Source for the PCI::Telnet::GUI::Canvas module: Read more... (32 kB)	[reply] [d/l]
Re^2: Optimizing a Text Canvas: An inner loop and a cloner by JosiahBryan (Novice) on Jul 06, 2007 at 16:08 UTC
See link below for a sample of the Canvas output for the sub print_demo1(): http://www.josiahbryan.com/Canvas-print_demo1-snapshot.png Let me know if more snapshots of the Canvas are desired.	[reply]
Re^2: Optimizing a Text Canvas: An inner loop and a cloner by JosiahBryan (Novice) on Jul 06, 2007 at 18:23 UTC
UPDATED: Source for the PCI::Telnet::GUI::Canvas module: I've updated the iloop at _atlayer code - removed a lot of the push() calls from the code (seemed to help iloop - some.) _atlayer hasn't been helped much - most I've done is been able to remove the clone() call - thanks BrowserUk! Read more... (35 kB)	[reply] [d/l]