Re: How to speed up a nested loop?
by GrandFather (Saint) on Sep 17, 2008 at 22:33 UTC
|
I can't imagine the following refactoring will make much difference, but it may help a little:
my $bxScaled = $bx * 16;
my $byScaled = $by * 16;
for my $x (0 .. 15) {
my $xScaled = 16 * $x;
for my $y (0 .. 15) {
# this calculates the tile index we are currently at,
# using the x and y coords in this block
my $tile_index = $y + $xScaled;
# this calculates the real x and y values of this tile
my $rx = $bxScaled + $x;
my $ry = $byScaled + $y;
# insert type data into the internal map array and
# note that there was change in this block if applicable
my $tile = $tiles[$rx][$ry][$bz][type];
if (!defined $tile || $$tile != $type_data[$tile_index])
{
$changed = 1;
$$tile = $type_data[$tile_index];
}
}
}
Main changes are to reduce the number of multiplies performed in the inner loop and to reduce the number of array indexing operations.
Update: A quick benchmark indicates about a 40% speed improvement. I'm surprised!
Update: a further small improvement (about 10%) is realized by moving the $rx calculation to the outer loop.
Adding my $xTiles = $tiles[$rx] to the outer loop and using my $xTiles in place of $tiles[$rx] in the inner loop gives another 2% improvement (although that's getting to be in the benchmark noise and may be bogus).
Perl reduces RSI - it saves typing
| [reply] [d/l] [select] |
|
|
You can factor more work out of the inner loop, but I would expect the win to be very minor:
my $bxScaled = $bx * 16;
my $byScaled = $by * 16;
for my $x (0 .. 15) {
my $xScaled = 16 * $x;
# These calculations were factored out of the inner loop
my $rx = $bxScaled + $x;
my $tiles_for_rx = $tiles[$rx];
for my $y (0 .. 15) {
# this calculates the tile index we are currently at,
# using the x and y coords in this block
my $tile_index = $y + $xScaled;
# this calculates the real y value of this tile
my $ry = $byScaled + $y;
# insert type data into the internal map array and
# note that there was change in this block if applicable
my $tile = $tiles_for_rx->[$ry][$bz][type];
if (!defined $tile || $$tile != $type_data[$tile_index])
{
$changed = 1;
$$tile = $type_data[$tile_index];
}
}
}
Incidentally I'm puzzled at the array access of type. Should that be a $type? Or perhaps a hash access? Either way the current version looks like a bug.
Incidentally another cause of your win over the original is that you're using lexical variables consistently, which I believe are slightly faster than package variables. | [reply] [d/l] |
|
|
I assumed type was a constant. It may be a transcription error of some sort. I can't see it making any difference to the results of refactoring however.
Don't we always use lexical variables and strictures? ;) The OP may be interested in reading The strictures, according to Seuss.
Perl reduces RSI - it saves typing
| [reply] |
|
|
|
|
Your hint has some truth. I should probably refactor my tiles array so i can get a reference to $tiles$bz$rx and go on from there. However you should probably read up about references a bit, as your code would never work as it is there. :)
As for type, that is indeed a constant. Hashes are too slow, thus i used constants to build pseudo-hashes.
| [reply] |
|
|
|
|
|
|
|
I played with this for entirely too long and got about a 8% improvement on tilly's according to Benchmark (on linux with an old Pentium M). This was mostly by changing $tile_index to a counter instead of finding it by addition each time. I also made a lookup table for bx/byScaled but this did little.
While doing this I got to wondering about the code snippet at the end:
my $tile = $tiles[$rx][$ry][$bz][type];
if (!defined $tile || $$tile != $type_data[$tile_index])
{
$changed = 1;
$$tile = $type_data[$tile_index];
}
which I don't understand. If $tile is undefined where does $$tile go to? If $$tile goes nowhere how come there's no warning or anything?
I combined everyone's code together in one large benchmark:
Thanks for the introduction to Benchmark and to the OP for an interesting node.
Update: after looking at the code I posted I may have screwed up the order of comparisons and I'm too tired to fix it... but the point was... using an iterator.
You could even have the x,y,z stored linearly in an array which would reduce the loops and have the same results. Then you could have an array of arrays (because you need a sub-array for the type element right?) and just iterate over with one loop instead of two.
IE a matrix like this:
(1 2 3)
(4 5 6) ===> ( 1 2 3 4 5 6 7 8 9 )
(7 8 9) how to do in three dimensions, I dunno...
| [reply] [d/l] [select] |
|
|
Tilly's code is missing a few \s and ${}s to be able to work.
Also, thank you, the counter for the tile actually got it down to .046 over the .048 from GF. :)
As for storing it linearly ... Honestly i don't think it's worth the effort. This stuff is already difficult enough to understand and keep in my mind as it is and i'm not *that* desperate for performance. ^^
If you're curious, here's a flow diagram.
*Update*: These things actually reduced performance:
my $rx = $bxScaled[$x];
my $ry = $byScaled[$y];
I guess the simple additions there are faster than array lookups. | [reply] [d/l] |
|
|
With your changes i'm at the following code and have a real life application improvement from 0.055 seconds for a whole "syncToDF" loop down to 0.049, which is already pretty respectable. Additionally you've helped show me where i need to keep my eyes open. Thanks a lot! :D
my ($tile_index,$rx,$ry,$y,$x,$xScaled);
my $bxScaled = $bx * 16;
my $byScaled = $by * 16;
for $x ( 0..15 ) {
$rx = $bxScaled+$x;
$xScaled = 16 * $x;
for $y ( 0..15 ) {
$tile_index = $y+$xScaled;
$ry = $byScaled+$y;
if (
!defined $tiles[$rx][$ry][$bz][type] ||
$tiles[$rx][$ry][$bz][type] != $type_data[$tile_index]
) {
$changed = 1;
$tiles[$rx][$ry][$bz][type] = $type_data[$tile_index];
}
}
}
Now to see whether there's some more good hints. :) | [reply] [d/l] |
Re: How to speed up a nested loop?
by ikegami (Patriarch) on Sep 18, 2008 at 06:30 UTC
|
You'd be better off if you'd rearrange your data structure so that you could use strings of 256 bytes (or whatever is big enough for what you are assigning) and string bit ops.
$changed = ~$defined | ($type ^ $new);
$type = $new;
$defined = "\xFF" x 256;
But since you since you provided no information about your data structure — What are $bz, tile and the data you are assining, and why do you add $bx+16 and $by+16 — can't help you beyond that.
Update: Oops, the sample code I provided worked at the bit level, not the byte level. And I can't think of a good way to do it at the byte level at the moment.
Why aren't you using PDL anyway? | [reply] [d/l] |
|
|
Ok, short explanation about what's going on:
There's this game that looks like this: http://www.tigsource.com/features/images/dwarffortress-big.png And i'm writing this to get something that looks like this: http://www.videogames.net.au/images/dwarf-fortress-3d-visualizer-beautiful-fortress1.jpg
The game internally stores the data as 16x16 blocks of single tiles. My script grabs the current cursor position and then grabs the data around the cursor and converts it into a big unified multi-dimensional by extracting the tile type and storing it in said array. That array is then later used to render images in OpenGL.
$bx and $by are the coordinates of each 16x16 block on the 2d plane, while $bz is the z dimension that's equal for tiles and blocks. I get the data for each block as an array of 256 values, through which i cycle, and which i then insert into the "real" coordinates it would have, which is why i'm doing the ($bx*16)+$x stuff.
A diagram of the refresh loop is available here: http://dwarvis.googlecode.com/svn/trunk/lifevis/lifevis.dia
As for your suggestions: I can't even figure out what you're doing with the binary operations there and i'd like to strike a balance between readability and performance. PDL, i have never heard about that and from glancing at the linked page i also can't figure out how it would help me. Sorry if i'm being dumb here and thanks for trying.
| [reply] |
|
|
PDL, i have never heard about that and from glancing at the linked page i also can't figure out how it would help me.
You seem to be doing quite a lot of array calculations, and if I understand it correctly, that's what PDL (pdl.perl.org) is particularly good at:
PDL ("Perl Data Language") gives standard Perl the ability to compactly store and speedily manipulate the large N-dimensional data arrays.
Please do check out the demos, screenshots and success stories.
--
No matter how great and destructive your problems may seem now, remember, you've probably only seen the tip of them. [1]
| [reply] |
|
|
|
|
Re: How to speed up a nested loop?
by BrowserUk (Patriarch) on Sep 18, 2008 at 12:18 UTC
|
tiles[ 0][ 0][constant][type] = type_data[ 0]
tiles[ 1][ 0][constant][type] = type_data[ 16]
tiles[ 2][ 0][constant][type] = type_data[ 32]
tiles[ 3][ 0][constant][type] = type_data[ 48]
tiles[ 4][ 0][constant][type] = type_data[ 64]
tiles[ 5][ 0][constant][type] = type_data[ 80]
tiles[ 6][ 0][constant][type] = type_data[ 96]
tiles[ 7][ 0][constant][type] = type_data[112]
tiles[ 8][ 0][constant][type] = type_data[128]
tiles[ 9][ 0][constant][type] = type_data[144]
tiles[10][ 0][constant][type] = type_data[160]
tiles[11][ 0][constant][type] = type_data[176]
tiles[12][ 0][constant][type] = type_data[192]
tiles[13][ 0][constant][type] = type_data[208]
tiles[14][ 0][constant][type] = type_data[224]
tiles[15][ 0][constant][type] = type_data[240]
...
tiles[ 0][15][constant][type] = type_data[ 15]
tiles[ 1][15][constant][type] = type_data[ 31]
tiles[ 2][15][constant][type] = type_data[ 47]
tiles[ 3][15][constant][type] = type_data[ 63]
tiles[ 4][15][constant][type] = type_data[ 79]
tiles[ 5][15][constant][type] = type_data[ 95]
tiles[ 6][15][constant][type] = type_data[111]
tiles[ 7][15][constant][type] = type_data[127]
tiles[ 8][15][constant][type] = type_data[143]
tiles[ 9][15][constant][type] = type_data[159]
tiles[10][15][constant][type] = type_data[175]
tiles[11][15][constant][type] = type_data[191]
tiles[12][15][constant][type] = type_data[207]
tiles[13][15][constant][type] = type_data[223]
tiles[14][15][constant][type] = type_data[239]
tiles[15][15][constant][type] = type_data[255]
Which means that you are taking a one dimension array of 256 values, treating it as a 2D 16x16 array, and mapping vertical slices of that, onto horizontal slices of your display:
@type_data
[00,01,02,03]
[10,11,12,13]
[20,21,22,23]
[30,31,32,33]
@tiles
bx
|
+--+--+--+--+--+--+--+--+--+--+--
| | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--
| | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--
| | | | | | | | | | |
+--+--+--+--+--+--+--+--+--+--+--
| | | | | | | | | | |
by-+--+--+--+--+--+--+--+--+--+--+--
| | | | |00|10|20|30| | |
+--+--+--+--+--+--+--+--+--+--+--
| | | | |01|11|21|31| | |
+--+--+--+--+--+--+--+--+--+--+--
| | | | |02|12|22|32| | |
+--+--+--+--+--+--+--+--+--+--+--
| | | | |03|13|23|33| | |
+--+--+--+--+--+--+--+--+--+--+--
| | | | | | | | | | |
And it is actually much worse than that, because you have two extra layers of indirection at each of those points: $bz & type(???):
As there is no OpenGL premative that can use this 4-deep array structure directly, then at some point later you must be unpacking chunks of it to pass to OpenGL primitives.
At the very least, you should re-order your tiles structure to: $tiles[$bz][$by][$bx][type] which would allow for a far more direct mapping of the elements with far less indirect for any given value of $bz which appears to be a constant during these loops. If you could move the type constant lower in the structure, then you could do the assignments using array slices 16 values at a time.
Basically, to improve your performance much beyond where you are now, you need to seriously re-consider the way you are structuring your data. First the ordering to allow more direct mapping.
Secondly, the complexity. If type is a constant, why is it there? It seems like you're just overstructuring your data and paying for it in time.
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
| [reply] [d/l] [select] |
|
|
Thanks for visualizing that. I never thought of doing that and it helped a good bit to show me that i really need to reverse the tiles array.
As for OpenGL primitives, they won't be much help to me anyhow. Each tile can be a variety of things, either a full cube, a ramp, stairs, a door, a tree, etc. As such i'll need to construct objects for each tile anyhow, and what makes it worse: The exact visual of each tile is highly dependant on the tiles around it, for example in the case of ramps.
As for the type constant: Each tile actually has 3 datasets. The type, which consists of two bytes, and binary flags for designation and occupation, which are stored in 4 byte values. Right now i'm ignoring those, but later on i'll need to add them in, and i figured there won't be much of a difference between doing it as pseudo-hash built by using constants or by creating 3 arrays for each of these datasets.
"then you could do the assignments using array slices 16 values at a time."
I'm not 100% sure what you mean by this, but it denotes some kind of bulk operation, right? I'm not too sure that that would be very useful, given that i don't only need to transfer the data, but also need to check whether the input data is different than the already existing data.
This may help a bit in understanding it: http://i34.tinypic.com/20j2pep.jpg
Update: Argh, changing around got it to $tiles[$bz][type][$rx][$ry] now, but can't figure out a good way how to swap x and y, as that turns the cells by 90° and would need correction ... somewhere else too.
Update 2: I was being stupid and shouldn't have changed the order of parameters in the function that actually draws the tile. The code looks as follows now:
Next step will be to play around with getting the reference to either $tiles[$bz][type] or $tiles[$bz][type][$by] at the appropiate places and see if that gives some kind of speed-up.
Update 3: Damn, references actually give me a speed boost of 40% over what we already head. Right now i'm only wondering if there is a visually more pleasing way to dereference that.
| [reply] [d/l] [select] |
|
|
I'm not 100% sure what you mean by this, but it denotes some kind of bulk operation, right? I'm not too sure that that would be very useful, given that i don't only need to transfer the data, but also need to check whether the input data is different than the already existing data.
I assume that you are doing some kind of billboarding for the backdrops of your scenes. And you are attempting to optimise that by onlu updating the cells or slices if they have changed. But, often as not with these things, is frequently quicker to avoid the conditionals and simple update everything. When the screen is redrawn, because the player moved or rotated his viewpoint, everything needs to be scaled/slewed and redrawn anyway.
So, if you could do @{ $tiles[z][y] } = @tile_data[ $y*16 .. ($y+1) * 16 - 1 ];-- that is, assign a complete X-slice in one go--then you've both avoided 16*2 index calculations, but also 16 condition tests.
If you could arrange for the input data to be pre-formatted as a 2d array, then you can also avoid the slice operation and just assign pointers to arrays of 16 values, with a significant further saving.
I appreciate it might require a re-think on some of your logic to make use of this, and that may be further than you wish to go, but it can be effective.
It's similar to some descriptions of the Fischer-Yates shuffle, that test each pairing and avoid the swap if the values are the same. In most cases, it is cheaper to go ahead and swap them unconditionally, than to test the condition for every pairing.
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
| [reply] [d/l] |
|
|
Re: How to speed up a nested loop?
by apl (Monsignor) on Sep 17, 2008 at 22:28 UTC
|
You could always
foreach $tile_index ( 0 .. 255 )
and then calculate $rx and $ry... | [reply] [d/l] |
|
|
Wouldn't i have to get my $y value by dividing through 16 then? I'd figure that'd royally mess up the performance.
| [reply] |
|
|
| [reply] [d/l] |
|
|
|
|