in reply to Re: How to speed up a nested loop?
in thread How to speed up a nested loop?

Thanks for visualizing that. I never thought of doing that and it helped a good bit to show me that i really need to reverse the tiles array.

As for OpenGL primitives, they won't be much help to me anyhow. Each tile can be a variety of things, either a full cube, a ramp, stairs, a door, a tree, etc. As such i'll need to construct objects for each tile anyhow, and what makes it worse: The exact visual of each tile is highly dependant on the tiles around it, for example in the case of ramps.

As for the type constant: Each tile actually has 3 datasets. The type, which consists of two bytes, and binary flags for designation and occupation, which are stored in 4 byte values. Right now i'm ignoring those, but later on i'll need to add them in, and i figured there won't be much of a difference between doing it as pseudo-hash built by using constants or by creating 3 arrays for each of these datasets.

"then you could do the assignments using array slices 16 values at a time."
I'm not 100% sure what you mean by this, but it denotes some kind of bulk operation, right? I'm not too sure that that would be very useful, given that i don't only need to transfer the data, but also need to check whether the input data is different than the already existing data.

This may help a bit in understanding it: http://i34.tinypic.com/20j2pep.jpg

Update: Argh, changing around got it to $tiles[$bz][type][$rx][$ry] now, but can't figure out a good way how to swap x and y, as that turns the cells by 90° and would need correction ... somewhere else too.

Update 2: I was being stupid and shouldn't have changed the order of parameters in the function that actually draws the tile. The code looks as follows now:
my ($rx,$ry,$y,$x,$xScaled); my $bxScaled = $bx * 16; my $byScaled = $by * 16; my $tile_index=0; my (@realx,@realy); for $x ( 0..15 ) { $rx = $bxScaled+$x; for $y ( 0..15 ) { $ry = $byScaled+$y; if ( !defined $tiles[$bz][type][$ry][$rx] || $tiles[$bz][type][$ry][$rx] != $type_data[$tile_index] ) { $changed = 1; $tiles[$bz][type][$ry][$rx] = $type_data[$tile_index]; } ++$tile_index; } }
Next step will be to play around with getting the reference to either $tiles[$bz][type] or $tiles[$bz][type][$by] at the appropiate places and see if that gives some kind of speed-up.

Update 3: Damn, references actually give me a speed boost of 40% over what we already head. Right now i'm only wondering if there is a visually more pleasing way to dereference that.
my ($rx,$ry,$y,$x,$xScaled); my $bxScaled = $bx * 16; my $byScaled = $by * 16; my $tile_index=0; my (@realx,@realy); my $tile = \$tiles[$bz][type]; for $x ( 0..15 ) { $rx = $bxScaled+$x; for $y ( 0..15 ) { $ry = $byScaled+$y; if ( !defined $$tile->[$ry][$rx] || $$tile->[$ry][$rx] != $type_data[$tile_index] ) { $changed = 1; $$tile->[$ry][$rx] = $type_data[$tile_index]; } ++$tile_index; } }

Replies are listed 'Best First'.
Re^3: How to speed up a nested loop?
by BrowserUk (Patriarch) on Sep 18, 2008 at 14:28 UTC
    I'm not 100% sure what you mean by this, but it denotes some kind of bulk operation, right? I'm not too sure that that would be very useful, given that i don't only need to transfer the data, but also need to check whether the input data is different than the already existing data.

    I assume that you are doing some kind of billboarding for the backdrops of your scenes. And you are attempting to optimise that by onlu updating the cells or slices if they have changed. But, often as not with these things, is frequently quicker to avoid the conditionals and simple update everything. When the screen is redrawn, because the player moved or rotated his viewpoint, everything needs to be scaled/slewed and redrawn anyway.

    So, if you could do @{ $tiles[z][y] } = @tile_data[ $y*16 .. ($y+1) * 16 - 1 ];-- that is, assign a complete X-slice in one go--then you've both avoided 16*2 index calculations, but also 16 condition tests.

    If you could arrange for the input data to be pre-formatted as a 2d array, then you can also avoid the slice operation and just assign pointers to arrays of 16 values, with a significant further saving.

    I appreciate it might require a re-think on some of your logic to make use of this, and that may be further than you wish to go, but it can be effective.

    It's similar to some descriptions of the Fischer-Yates shuffle, that test each pairing and avoid the swap if the values are the same. In most cases, it is cheaper to go ahead and swap them unconditionally, than to test the condition for every pairing.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Ah, i see where you're coming from. I need to explain however that that the analysis for change is necessary. You see, in order to actually have things be decently fast in OpenGL, i compile the contents of one 16x16 block into one display list, that can be called as one single line in perl and gets drawn directly from gfx card ram. Without those this would be pretty much impossible.

      However the act of actually constructing those is pretty costly timewise. To give a comparison, the following is from one refresh cycle where ALL display lists (98 in this instance) get generated:
      the code took: 0.467388 wallclock secs ( 0.47 usr +  0.06 sys =  0.53 CPU)
      And this one is from when NO display lists are generated:
      the code took: 0.025486 wallclock secs ( 0.03 usr +  0.00 sys =  0.03 CPU)
      As such, making the assignments would shave time off of an "idle refresh", but greatly increase the general costs by increasing the number of "rebuild refreshes".

      The increases i'm looking for right now are necessary to ensure the ability to refresh at a constant rate while no game changes are occuring, without having any skipping in the operation. This goal is pretty much reached, as i can now fix the FPS at 30 and updates should fit inbetween frame generation.