How to speed up a nested loop?

Xenofur has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: How to speed up a nested loop? by GrandFather (Saint) on Sep 17, 2008 at 22:33 UTC
I can't imagine the following refactoring will make much difference, but it may help a little: my $bxScaled = $bx * 16; my $byScaled = $by * 16; for my $x (0 .. 15) { my $xScaled = 16 * $x; for my $y (0 .. 15) { # this calculates the tile index we are currently at, # using the x and y coords in this block my $tile_index = $y + $xScaled; # this calculates the real x and y values of this tile my $rx = $bxScaled + $x; my $ry = $byScaled + $y; # insert type data into the internal map array and # note that there was change in this block if applicable my $tile = $tiles[$rx][$ry][$bz][type]; if (!defined $tile \|\| $$tile != $type_data[$tile_index]) { $changed = 1; $$tile = $type_data[$tile_index]; } } } [download] Main changes are to reduce the number of multiplies performed in the inner loop and to reduce the number of array indexing operations. Update: A quick benchmark indicates about a 40% speed improvement. I'm surprised! Read more... Benchmark code and result (2 kB) Update: a further small improvement (about 10%) is realized by moving the $rx calculation to the outer loop. Adding `my $xTiles = $tiles[$rx]` to the outer loop and using my $xTiles in place of `$tiles[$rx]` in the inner loop gives another 2% improvement (although that's getting to be in the benchmark noise and may be bogus). Perl reduces RSI - it saves typing	[reply] [d/l] [select]
Re^2: How to speed up a nested loop? by tilly (Archbishop) on Sep 17, 2008 at 23:04 UTC
You can factor more work out of the inner loop, but I would expect the win to be very minor: my $bxScaled = $bx * 16; my $byScaled = $by * 16; for my $x (0 .. 15) { my $xScaled = 16 * $x; # These calculations were factored out of the inner loop my $rx = $bxScaled + $x; my $tiles_for_rx = $tiles[$rx]; for my $y (0 .. 15) { # this calculates the tile index we are currently at, # using the x and y coords in this block my $tile_index = $y + $xScaled; # this calculates the real y value of this tile my $ry = $byScaled + $y; # insert type data into the internal map array and # note that there was change in this block if applicable my $tile = $tiles_for_rx->[$ry][$bz][type]; if (!defined $tile \|\| $$tile != $type_data[$tile_index]) { $changed = 1; $$tile = $type_data[$tile_index]; } } } [download] Incidentally I'm puzzled at the array access of type. Should that be a $type? Or perhaps a hash access? Either way the current version looks like a bug. Incidentally another cause of your win over the original is that you're using lexical variables consistently, which I believe are slightly faster than package variables.	[reply] [d/l]
Re^3: How to speed up a nested loop? by GrandFather (Saint) on Sep 17, 2008 at 23:14 UTC
I assumed type was a constant. It may be a transcription error of some sort. I can't see it making any difference to the results of refactoring however. Don't we always use lexical variables and strictures? ;) The OP may be interested in reading The strictures, according to Seuss. Perl reduces RSI - it saves typing	[reply]
Re^4: How to speed up a nested loop? by Xenofur (Monk) on Sep 18, 2008 at 08:28 UTC
Re^3: How to speed up a nested loop? by Xenofur (Monk) on Sep 18, 2008 at 08:26 UTC
Your hint has some truth. I should probably refactor my tiles array so i can get a reference to $tiles$bz $rx and go on from there. However you should probably read up about references a bit, as your code would never work as it is there. :) As for type, that is indeed a constant. Hashes are too slow, thus i used constants to build pseudo-hashes.	[reply]
Re^4: How to speed up a nested loop? by tilly (Archbishop) on Sep 18, 2008 at 15:59 UTC
Re^5: How to speed up a nested loop? by Xenofur (Monk) on Sep 18, 2008 at 16:19 UTC
Some notes below your chosen depth have not been shown here
Re^2: How to speed up a nested loop? by juster (Friar) on Sep 18, 2008 at 05:47 UTC
I played with this for entirely too long and got about a 8% improvement on tilly's according to Benchmark (on linux with an old Pentium M). This was mostly by changing `$tile_index` to a counter instead of finding it by addition each time. I also made a lookup table for bx/byScaled but this did little. While doing this I got to wondering about the code snippet at the end: `my $tile = $tiles[$rx][$ry][$bz][type]; if (!defined $tile \|\| $$tile != $type_data[$tile_index]) { $changed = 1; $$tile = $type_data[$tile_index]; }` [download] which I don't understand. If `$tile` is undefined where does `$$tile` go to? If `$$tile` goes nowhere how come there's no warning or anything? I combined everyone's code together in one large benchmark: Read more... (5 kB) Thanks for the introduction to Benchmark and to the OP for an interesting node. Update: after looking at the code I posted I may have screwed up the order of comparisons and I'm too tired to fix it... but the point was... using an iterator. You could even have the x,y,z stored linearly in an array which would reduce the loops and have the same results. Then you could have an array of arrays (because you need a sub-array for the `type` element right?) and just iterate over with one loop instead of two. IE a matrix like this: `(1 2 3) (4 5 6) ===> ( 1 2 3 4 5 6 7 8 9 ) (7 8 9) how to do in three dimensions, I dunno...` [download]	[reply] [d/l] [select]
Re^3: How to speed up a nested loop? by Xenofur (Monk) on Sep 18, 2008 at 08:52 UTC
Tilly's code is missing a few \s and ${}s to be able to work. Also, thank you, the counter for the tile actually got it down to .046 over the .048 from GF. :) As for storing it linearly ... Honestly i don't think it's worth the effort. This stuff is already difficult enough to understand and keep in my mind as it is and i'm not that desperate for performance. ^^ If you're curious, here's a flow diagram. Update: These things actually reduced performance: `my $rx = $bxScaled[$x]; my $ry = $byScaled[$y];` [download] I guess the simple additions there are faster than array lookups.	[reply] [d/l]
Re^2: How to speed up a nested loop? by Xenofur (Monk) on Sep 18, 2008 at 08:10 UTC
With your changes i'm at the following code and have a real life application improvement from 0.055 seconds for a whole "syncToDF" loop down to 0.049, which is already pretty respectable. Additionally you've helped show me where i need to keep my eyes open. Thanks a lot! :D `my ($tile_index,$rx,$ry,$y,$x,$xScaled); my $bxScaled = $bx * 16; my $byScaled = $by * 16; for $x ( 0..15 ) { $rx = $bxScaled+$x; $xScaled = 16 * $x; for $y ( 0..15 ) { $tile_index = $y+$xScaled; $ry = $byScaled+$y; if ( !defined $tiles[$rx][$ry][$bz][type] \|\| $tiles[$rx][$ry][$bz][type] != $type_data[$tile_index] ) { $changed = 1; $tiles[$rx][$ry][$bz][type] = $type_data[$tile_index]; } } }` [download] Now to see whether there's some more good hints. :)	[reply] [d/l]
Re: How to speed up a nested loop? by ikegami (Patriarch) on Sep 18, 2008 at 06:30 UTC
You'd be better off if you'd rearrange your data structure so that you could use strings of 256 bytes (or whatever is big enough for what you are assigning) and string bit ops. `$changed = ~$defined \| ($type ^ $new); $type = $new; $defined = "\xFF" x 256;` [download] But since you since you provided no information about your data structure — What are $bz, tile and the data you are assining, and why do you add $bx+16 and $by+16 — can't help you beyond that. Update: Oops, the sample code I provided worked at the bit level, not the byte level. And I can't think of a good way to do it at the byte level at the moment. Why aren't you using PDL anyway?	[reply] [d/l]
Re^2: How to speed up a nested loop? by Xenofur (Monk) on Sep 18, 2008 at 09:26 UTC
Ok, short explanation about what's going on: There's this game that looks like this: http://www.tigsource.com/features/images/dwarffortress-big.png And i'm writing this to get something that looks like this: http://www.videogames.net.au/images/dwarf-fortress-3d-visualizer-beautiful-fortress1.jpg The game internally stores the data as 16x16 blocks of single tiles. My script grabs the current cursor position and then grabs the data around the cursor and converts it into a big unified multi-dimensional by extracting the tile type and storing it in said array. That array is then later used to render images in OpenGL. $bx and $by are the coordinates of each 16x16 block on the 2d plane, while $bz is the z dimension that's equal for tiles and blocks. I get the data for each block as an array of 256 values, through which i cycle, and which i then insert into the "real" coordinates it would have, which is why i'm doing the ($bx*16)+$x stuff. A diagram of the refresh loop is available here: http://dwarvis.googlecode.com/svn/trunk/lifevis/lifevis.dia As for your suggestions: I can't even figure out what you're doing with the binary operations there and i'd like to strike a balance between readability and performance. PDL, i have never heard about that and from glancing at the linked page i also can't figure out how it would help me. Sorry if i'm being dumb here and thanks for trying.	[reply]
Re^3: How to speed up a nested loop? by andreas1234567 (Vicar) on Sep 18, 2008 at 10:59 UTC
PDL, i have never heard about that and from glancing at the linked page i also can't figure out how it would help me. You seem to be doing quite a lot of array calculations, and if I understand it correctly, that's what PDL (pdl.perl.org) is particularly good at: PDL ("Perl Data Language") gives standard Perl the ability to compactly store and speedily manipulate the large N-dimensional data arrays. Please do check out the demos, screenshots and success stories. -- No matter how great and destructive your problems may seem now, remember, you've probably only seen the tip of them. [1]	[reply]
Re^4: How to speed up a nested loop? by etj (Priest) on Jun 06, 2022 at 00:53 UTC
Re^4: How to speed up a nested loop? by Xenofur (Monk) on Sep 18, 2008 at 18:32 UTC
Re: How to speed up a nested loop? by BrowserUk (Patriarch) on Sep 18, 2008 at 12:18 UTC
The index mappings produced (fixing $bx & $by as 0) are: tiles[ 0][ 0][constant][type] = type_data[ 0] tiles[ 1][ 0][constant][type] = type_data[ 16] tiles[ 2][ 0][constant][type] = type_data[ 32] tiles[ 3][ 0][constant][type] = type_data[ 48] tiles[ 4][ 0][constant][type] = type_data[ 64] tiles[ 5][ 0][constant][type] = type_data[ 80] tiles[ 6][ 0][constant][type] = type_data[ 96] tiles[ 7][ 0][constant][type] = type_data[112] tiles[ 8][ 0][constant][type] = type_data[128] tiles[ 9][ 0][constant][type] = type_data[144] tiles[10][ 0][constant][type] = type_data[160] tiles[11][ 0][constant][type] = type_data[176] tiles[12][ 0][constant][type] = type_data[192] tiles[13][ 0][constant][type] = type_data[208] tiles[14][ 0][constant][type] = type_data[224] tiles[15][ 0][constant][type] = type_data[240] ... tiles[ 0][15][constant][type] = type_data[ 15] tiles[ 1][15][constant][type] = type_data[ 31] tiles[ 2][15][constant][type] = type_data[ 47] tiles[ 3][15][constant][type] = type_data[ 63] tiles[ 4][15][constant][type] = type_data[ 79] tiles[ 5][15][constant][type] = type_data[ 95] tiles[ 6][15][constant][type] = type_data[111] tiles[ 7][15][constant][type] = type_data[127] tiles[ 8][15][constant][type] = type_data[143] tiles[ 9][15][constant][type] = type_data[159] tiles[10][15][constant][type] = type_data[175] tiles[11][15][constant][type] = type_data[191] tiles[12][15][constant][type] = type_data[207] tiles[13][15][constant][type] = type_data[223] tiles[14][15][constant][type] = type_data[239] tiles[15][15][constant][type] = type_data[255] [download] Which means that you are taking a one dimension array of 256 values, treating it as a 2D 16x16 array, and mapping vertical slices of that, onto horizontal slices of your display: @type_data [00,01,02,03] [10,11,12,13] [20,21,22,23] [30,31,32,33] @tiles bx \| +--+--+--+--+--+--+--+--+--+--+-- \| \| \| \| \| \| \| \| \| \| \| +--+--+--+--+--+--+--+--+--+--+-- \| \| \| \| \| \| \| \| \| \| \| +--+--+--+--+--+--+--+--+--+--+-- \| \| \| \| \| \| \| \| \| \| \| +--+--+--+--+--+--+--+--+--+--+-- \| \| \| \| \| \| \| \| \| \| \| by-+--+--+--+--+--+--+--+--+--+--+-- \| \| \| \| \|00\|10\|20\|30\| \| \| +--+--+--+--+--+--+--+--+--+--+-- \| \| \| \| \|01\|11\|21\|31\| \| \| +--+--+--+--+--+--+--+--+--+--+-- \| \| \| \| \|02\|12\|22\|32\| \| \| +--+--+--+--+--+--+--+--+--+--+-- \| \| \| \| \|03\|13\|23\|33\| \| \| +--+--+--+--+--+--+--+--+--+--+-- \| \| \| \| \| \| \| \| \| \| \| [download] And it is actually much worse than that, because you have two extra layers of indirection at each of those points: `$bz` & `type`(???): As there is no OpenGL premative that can use this 4-deep array structure directly, then at some point later you must be unpacking chunks of it to pass to OpenGL primitives. At the very least, you should re-order your tiles structure to: `$tiles[$bz][$by][$bx][type]` which would allow for a far more direct mapping of the elements with far less indirect for any given value of `$bz` which appears to be a constant during these loops. If you could move the type constant lower in the structure, then you could do the assignments using array slices 16 values at a time. Basically, to improve your performance much beyond where you are now, you need to seriously re-consider the way you are structuring your data. First the ordering to allow more direct mapping. Secondly, the complexity. If `type` is a constant, why is it there? It seems like you're just overstructuring your data and paying for it in time. Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. "Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."	[reply] [d/l] [select]
Re^2: How to speed up a nested loop? by Xenofur (Monk) on Sep 18, 2008 at 14:00 UTC
Thanks for visualizing that. I never thought of doing that and it helped a good bit to show me that i really need to reverse the tiles array. As for OpenGL primitives, they won't be much help to me anyhow. Each tile can be a variety of things, either a full cube, a ramp, stairs, a door, a tree, etc. As such i'll need to construct objects for each tile anyhow, and what makes it worse: The exact visual of each tile is highly dependant on the tiles around it, for example in the case of ramps. As for the type constant: Each tile actually has 3 datasets. The type, which consists of two bytes, and binary flags for designation and occupation, which are stored in 4 byte values. Right now i'm ignoring those, but later on i'll need to add them in, and i figured there won't be much of a difference between doing it as pseudo-hash built by using constants or by creating 3 arrays for each of these datasets. "then you could do the assignments using array slices 16 values at a time." I'm not 100% sure what you mean by this, but it denotes some kind of bulk operation, right? I'm not too sure that that would be very useful, given that i don't only need to transfer the data, but also need to check whether the input data is different than the already existing data. This may help a bit in understanding it: http://i34.tinypic.com/20j2pep.jpg Update: Argh, changing around got it to `$tiles[$bz][type][$rx][$ry]` now, but can't figure out a good way how to swap x and y, as that turns the cells by 90� and would need correction ... somewhere else too. Update 2: I was being stupid and shouldn't have changed the order of parameters in the function that actually draws the tile. The code looks as follows now: Read more... (907 Bytes) Next step will be to play around with getting the reference to either `$tiles[$bz][type]` or `$tiles[$bz][type][$by]` at the appropiate places and see if that gives some kind of speed-up. Update 3: Damn, references actually give me a speed boost of 40% over what we already head. Right now i'm only wondering if there is a visually more pleasing way to dereference that. Read more... (894 Bytes)	[reply] [d/l] [select]
Re^3: How to speed up a nested loop? by BrowserUk (Patriarch) on Sep 18, 2008 at 14:28 UTC
I'm not 100% sure what you mean by this, but it denotes some kind of bulk operation, right? I'm not too sure that that would be very useful, given that i don't only need to transfer the data, but also need to check whether the input data is different than the already existing data. I assume that you are doing some kind of billboarding for the backdrops of your scenes. And you are attempting to optimise that by onlu updating the cells or slices if they have changed. But, often as not with these things, is frequently quicker to avoid the conditionals and simple update everything. When the screen is redrawn, because the player moved or rotated his viewpoint, everything needs to be scaled/slewed and redrawn anyway. So, if you could do `@{ $tiles[z][y] } = @tile_data[ $y16 .. ($y+1) 16 - 1 ];`-- that is, assign a complete X-slice in one go--then you've both avoided 16*2 index calculations, but also 16 condition tests. If you could arrange for the input data to be pre-formatted as a 2d array, then you can also avoid the slice operation and just assign pointers to arrays of 16 values, with a significant further saving. I appreciate it might require a re-think on some of your logic to make use of this, and that may be further than you wish to go, but it can be effective. It's similar to some descriptions of the Fischer-Yates shuffle, that test each pairing and avoid the swap if the values are the same. In most cases, it is cheaper to go ahead and swap them unconditionally, than to test the condition for every pairing. Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. "Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."	[reply] [d/l]
Re^4: How to speed up a nested loop? by Xenofur (Monk) on Sep 18, 2008 at 14:49 UTC
Re: How to speed up a nested loop? by apl (Monsignor) on Sep 17, 2008 at 22:28 UTC
You could always `foreach $tile_index ( 0 .. 255 )` and then calculate $rx and $ry...	[reply] [d/l]
Re^2: How to speed up a nested loop? by Xenofur (Monk) on Sep 18, 2008 at 08:02 UTC
Wouldn't i have to get my $y value by dividing through 16 then? I'd figure that'd royally mess up the performance.	[reply]
Re^3: How to speed up a nested loop? by apl (Monsignor) on Sep 18, 2008 at 09:44 UTC
16 == 2^4, so I'd use `$ry = $tile_index >> 4;` Bit-shifts are a lot less expensive than divison.	[reply] [d/l]
Re^4: How to speed up a nested loop? by GrandFather (Saint) on Sep 18, 2008 at 09:54 UTC
Re^4: How to speed up a nested loop? by Xenofur (Monk) on Sep 18, 2008 at 10:21 UTC