Can you help profile this?

Fighter2 has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Can you help profile this? by Tanktalus (Canon) on Dec 08, 2011 at 23:34 UTC
First things first. If you have sets embedded in your list, keep them as sets. What you're doing is nearly the same thing as pointer arithmetic in C, and is bad for many of the same reasons. ww mentioned an AoA, but as a question. I'll make it a statement: it should be an array of references. Personally, I'm a bit more partial to an AoH, as it allows me to use names for keys to make sense of the inner data. An AoA would look like:

    @in = (
        [  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14 ],
        [ 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34 ],
        # ...
    );

but we still don't know what any of the numbers mean without more context. An AoH would usually take up a lot more memory, but would give the context implicitly:

    @in = (
        { user => 'Fighter2', id => 936570, xp => 34, writeups => 11, ... },
    );

You can get hashes to take a lot less memory if you can define "default" values that cover most cases, reducing most of the hashes down to only unique values, at a slight cost to code complexity. I'll assume that's not an option (it rarely is, in my experience).

Once you do this, you would have `$in[$i][12]` or `$in[$i]{attn}` instead of `$in[$i+12]`. Your outer for loop would be a simple `for my $i (0..$#in)` (you want to go through the indexes, and the last one is the last index, which is what `$#` gives you). You wouldn't be creating an @Attn the length of your original array, but one 1/14th the size, which would help both memory usage and speed. Also, your inner arrays could be of arbitrary lengths, so if you regularly have fewer than 13 items in a set, you simply would not allocate the extra memory. And, should you need more than 14 items in a set, you can now handle that, kind of. Here's a first crack:

    #!/usr/bin/perl

    # Initialization
    my @in = ( [...], [...], );
    my $MaxAgc0 = "1.15503129132678e-005";   # MaxAgc0 is a constant
    my (@Attn, @powers);

    # Interpolation
    if (@in) {
        for my $i (0..$#in) {
            my $ptr = $in[$i];   # convenience.
            # we don't need the first item to be a count of sectors anymore.
            # we also don't need this loop at all.
            #for my $j (0..$#$ptr)
            #{
            # so we also reduce the numbers here.
            my $num = $ptr->[11] * $MaxAgc0;
            push @Attn, $num / $ptr->[9];
            #}
        }
        for my $i (0..$#in) {
            push @powers, ($in[$i][$_] - $Attn[$i]) / 100 for (1..8);
        }
    }
    else {   # @in is empty
        print "\n No data to map for Panda machine. Check if input logfile is empty";
        exit 1;   # was `kill;`, which does not exit the script
    }

Now, what I'd do from here is pull apart the sets even more. Say something like this:

    (
        {
            attn    => 5,   # attenuation?
            agc     => 6,   # the MaxAgc0 multiplier
            sectors => [ all the sectors for this data point in this array ],
            # etc.
        },
        # repeat for each item.
    )

I'd also change the output to match everything done above: return a list of arrays:

    #!/usr/bin/perl
    use List::MoreUtils qw(pairwise);

    # Initialization
    my @in = ( {...}, {...}, );
    my $MaxAgc0 = "1.15503129132678e-005";   # MaxAgc0 is a constant
    my @powers;

    # Interpolation
    if (@in) {
        my @Attn = map {
            my $num = $_->{agc} * $MaxAgc0;
            $num / $_->{attn};
        } @in;
        # pairwise sets $a to the record and $b to its attenuation;
        # the inner map walks the sector values themselves via $_.
        @powers = pairwise {
            [ map { ( $_ - $b ) / 100 } @{ $a->{sectors} } ]
        } @in, @Attn;
    }
    else {   # @in is empty
        print "\n No data to map for Panda machine. Check if input logfile is empty";
        exit 1;   # was `kill;`, which does not exit the script
    }

Now the return will keep the sectors grouped, and there will be no extra data in the output. We can still improve this by getting rid of @Attn altogether:

    # Initialization
    my @in = ( {...}, {...}, );
    my $MaxAgc0 = "1.15503129132678e-005";   # MaxAgc0 is a constant
    my @powers;

    # Interpolation
    if (@in) {
        @powers = map {
            my $rec  = $_;   # the inner map rebinds $_, so keep the record in a named variable
            my $attn = $rec->{agc} * $MaxAgc0 / $rec->{attn};
            [ map { ( $_ - $attn ) / 100 } @{ $rec->{sectors} } ]
        } @in;
    }
    else {   # @in is empty
        print "\n No data to map for Panda machine. Check if input logfile is empty";
        exit 1;   # was `kill;`, which does not exit the script
    }

We got rid of a huge amount of wasted memory. That @Attn list was humongous in your code, and we've pared it down to 1/14th and now to a single scalar (which we calculate as we need it, then discard once it is no longer needed). And we got rid of unnecessary items in your sets, though, in the interests of readability, I increased the memory usage of the input by making it an AoH. That's probably a wash, if not slightly bigger now than it was. Hope that helps.
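The "default values" idea mentioned early in the reply above is not shown in code there. A minimal sketch of one way it might look follows; the `%defaults` hash, the `field` helper, and all sample values are hypothetical and only illustrate storing the keys that differ from a shared default.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical defaults that cover most records (names and values are made up).
my %defaults = ( attn => 5, agc => 6 );

# Each record stores only the keys that differ from the defaults.
my @in = (
    { sectors => [ 10, 20, 30 ] },           # relies on both defaults
    { sectors => [ 40, 50 ], attn => 8 },    # overrides attn only
);

# Look up a field, falling back to the default when the record lacks it.
sub field {
    my ($rec, $key) = @_;
    return exists $rec->{$key} ? $rec->{$key} : $defaults{$key};
}

for my $rec (@in) {
    printf "attn=%s agc=%s sectors=%s\n",
        field($rec, 'attn'), field($rec, 'agc'),
        join(',', @{ $rec->{sectors} });
}
```

The cost is the extra lookup function, which is the "slight cost to code complexity" the reply refers to; whether the memory saving is worth it depends on how many records actually share the defaults.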
Re: Can you help profile this? by Eliya (Vicar) on Dec 08, 2011 at 20:58 UTC
    for (my $j = 1; $j <= $in[$i]; $j++) {   # $j is the actual no. of sectors.
        my $num = ($in[$i+12] * $MaxAgc0);
        push @Attn, $num / $in[$i+10];
    }

It's not clear to me what the `@Attn` array is supposed to be used for. As the number of iterations of the loop depends on the data (i.e. `$in[$i]`, which you haven't specified), it's not clear how many items are actually pushed onto the array. Also, all values pushed in one run of the loop seem to be the same, because `$j` is never used in the computation. And then later, you index `@Attn` with multiples of 14 ... Questions upon questions, at least for me :) Could you elaborate a bit more?
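The point about identical values can be checked with a small sketch. The 14-element record below is made up, and treating element 0 as the sector count is an assumption taken from the loop's comment, not something the original poster confirmed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical record: element 0 is assumed to be the sector count (3 here).
my @in = (3, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14);
my $MaxAgc0 = 1.15503129132678e-05;

my @Attn;
my $i = 0;
for (my $j = 1; $j <= $in[$i]; $j++) {
    # $j never appears in the computation, so every pass pushes the same value.
    my $num = $in[$i+12] * $MaxAgc0;
    push @Attn, $num / $in[$i+10];
}

my %distinct = map { $_ => 1 } @Attn;
print scalar(@Attn), " values pushed\n";                 # prints "3 values pushed"
print scalar(keys %distinct), " distinct value(s)\n";    # prints "1 distinct value(s)"
```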
Re: Can you help profile this? by ww (Archbishop) on Dec 08, 2011 at 22:21 UTC
First, neither your question nor your code makes it clear (to me, anyway; YMMV) precisely what you're trying to do; what the result of all this processing is supposed to provide, other than "values in @powers." But, in hopes your answers may shed some light on the matter:

In your line 7, did you intend for `$i < @in;` to be `$i < $#in;`? ... or, maybe `$i < ( $#in + 1 );`? `@in` is the entire array; not a count of elements. See replies :-(

And does this -- in any way -- resemble what you want?

    #!/usr/bin/perl
    use Modern::Perl;
    use Data::Dumper;

    # 942489

    my @in = qw/1   2   3   4   5   6   7   8   9   10  11  1.155031267e-5 13  14
                101 102 103 104 105 106 107 108 109 110 111 1.13           113 114
                181 182 183 184 185 186 187 188 189 190 191 0.192          193 194/;
    my $MaxAgc0 = "1.15503129132678e-005";   # MaxAgc0 is a constant
    my @Attn;
    my @powers;

    for ( my $i = 0; $i < $#in; $i += 14 ) {
        for ( my $j = 1; $j <= $in[$i]; $j++ ) {   # $j is WHAT ?
            my $num = ( $in[$i+12] * $MaxAgc0 );
            push @Attn, $num / $in[$i + 10];
        }
    }

    for ( my $i = 0; $i < $#in; $i += 14 ) {
        push @powers, ( $in[$i + $_] - $Attn[$i] ) / 100 for (2..9);
    }

    say "\@Attn next" . "-" x 20;
    print Dumper @Attn;
    say "\@powers next" . "~" x 20;
    print Dumper @powers;

Output

    @Attn next--------------------
    $VAR1 = '1.36503698065892e-005';
    $VAR2 = '1.17584266594528e-005';
    $VAR3 = '1.17584266594528e-005';
    $VAR4 = '1.17584266594528e-005';
    $VAR5 = '1.17584266594528e-005';
    $VAR6 = '1.17584266594528e-005';
    $VAR7 = '1.17584266594528e-005';
    ...
    $VAR101 = '1.17584266594528e-005';
    $VAR102 = '1.17584266594528e-005';
    $VAR103 = '1.16712585982235e-005';   # value changes
    $VAR104 = '1.16712585982235e-005';
    $VAR105 = '1.16712585982235e-005';
    ...
    $VAR282 = '1.16712585982235e-005';
    $VAR283 = '1.16712585982235e-005';
    @powers next~~~~~~~~~~~~~~~~~~~~
    $VAR1 = '0.0299998634963019';
    $VAR2 = '0.0399998634963019';
    $VAR3 = '0.0499998634963019';
    $VAR4 = '0.0599998634963019';
    $VAR5 = '0.0699998634963019';
    $VAR6 = '0.0799998634963019';
    $VAR7 = '0.0899998634963019';
    $VAR8 = '0.0999998634963019';
    $VAR9 = '1.02999988241573';
    $VAR10 = '1.03999988241573';
    $VAR11 = '1.04999988241573';
    $VAR12 = '1.05999988241573';
    $VAR13 = '1.06999988241573';
    $VAR14 = '1.07999988241573';
    $VAR15 = '1.08999988241573';
    $VAR16 = '1.09999988241573';
    $VAR17 = '1.82999988241573';
    $VAR18 = '1.83999988241573';
    $VAR19 = '1.84999988241573';
    $VAR20 = '1.85999988241573';
    $VAR21 = '1.86999988241573';
    $VAR22 = '1.87999988241573';
    $VAR23 = '1.88999988241573';
    $VAR24 = '1.89999988241573';

WAG: Given your reference to a "set", could it be that `@in` is supposed to be an AoA? And, given the recommendation that SOPW include a minimal sample of code and data which reproduce the problem, why bother with the `else` clause when @in is (pseudo-)instantiated at your line 2?
Re^2: Can you help profile this? by BrowserUk (Patriarch) on Dec 08, 2011 at 22:32 UTC
"In your line 7, did you intend for `$i < @in;` to be `$i < $#in;`? ... or, maybe `$i < ( $#in + 1 );`? `@in` is the entire array; not a count of elements"

How does someone who's been around here as long as you miss the fact that an array such as `@a`, used in a scalar context (as provided by a comparison operator), yields the number of elements in the array? I.e. exactly the same number as `$#a + 1`.
Re^3: Can you help profile this? by ww (Archbishop) on Dec 08, 2011 at 22:39 UTC
Is that a trick question? ... and I don't even know whether to follow that with a smiley or a frown. So I guess the best sequel is to say "My bad; apologies!" And likewise to Riales for Re^2: Can you help profile this? ++, both, and --, /me.
Re^2: Can you help profile this? by Riales (Hermit) on Dec 08, 2011 at 22:34 UTC
Regarding the bit about line 7: An array evaluated in scalar context returns the length of the array. So `@in` would indeed have returned a count of its elements. Also, `$#in` would return the last index of `@in`. This would mean changing the middle expression in your for loops to `$i <= $#in`.
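A short sketch of the two equivalent loop conditions discussed above, using a made-up four-element array:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @in = (10, 20, 30, 40);   # hypothetical data

my $count = @in;    # scalar context: number of elements
my $last  = $#in;   # index of the last element

print "count=$count last=$last\n";   # prints "count=4 last=3"

# Both conditions visit exactly the same indexes.
my (@by_count, @by_last);
for (my $i = 0; $i <  @in;  $i++) { push @by_count, $i }
for (my $i = 0; $i <= $#in; $i++) { push @by_last,  $i }

print "@by_count | @by_last\n";      # prints "0 1 2 3 | 0 1 2 3"
```

By contrast, `$i < $#in` stops one index early, so the last element would never be visited.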
Re: Can you help profile this? by Riales (Hermit) on Dec 08, 2011 at 21:48 UTC
The immediate thing I notice is that @Attn is far larger than it needs to be. It looks like you determine a value for each element in every 14-element slice of @in. Instead of storing that value 14 times, just store it once, and have each element of the 14-element slice of @in look to that one element. This is untested, but...

    for (my $i = 0; $i < @in; $i += 14) {
        my $num = $in[$i+12] * $MaxAgc0;
        push @Attn, $num / $in[$i+10];
    }

    for (my $i = 0; $i < @in; $i += 14) {
        foreach (2 .. 9) {
            push @powers, ($in[$i + $_] - $Attn[$i/14]) / 100;
        }
    }

Actually, now that I wrote that out, I'm thinking you don't even need to build @Attn (again, untested):

    for (my $i = 0; $i < @in; $i += 14) {
        my $num = $in[$i+12] * $MaxAgc0;
        $num = $num / $in[$i+10];

        foreach (2 .. 9) {
            push @powers, ($in[$i + $_] - $num) / 100;
        }
    }

EDIT: All the above code may be wrong. I misread what the j-loop was doing in the original code; I had thought you were comparing to a slice of @in for some reason, but that's apparently not the case. I'm confused though... does the input data provide some sort of meta-information that you're reading in at every 14th element? My initial thought was that @Attn and @in were meant to be parallel arrays, but with the limit of the j-loop being defined that way, this would probably not be true...
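The 14-element slices assumed above can be made explicit by chunking the flat list into an array of array references, which is the AoA layout raised elsewhere in the thread. This is only a sketch: the record width of 14 is taken from the `+= 14` stride, and the sample data and the names `$RECORD_LEN`, `@flat`, and `@records` are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $RECORD_LEN = 14;   # assumed record width, from the thread's "+= 14" stride

# Hypothetical flat input: two records of 14 values each.
my @in = (1 .. 14, 21 .. 34);

# splice consumes the array it is given, so work on a copy.
my @flat = @in;
my @records;
push @records, [ splice @flat, 0, $RECORD_LEN ] while @flat;

# Record $r, field $f corresponds to flat index $r * $RECORD_LEN + $f.
my $flat_index = 1 * $RECORD_LEN + 12;
print scalar(@records), " records\n";                      # prints "2 records"
print "record 1, field 12: $records[1][12]\n";             # prints "record 1, field 12: 33"
print "flat index $flat_index: $in[$flat_index]\n";        # prints "flat index 26: 33"
```

With this layout the `$i/14` arithmetic used to index @Attn disappears, because each record carries its own position in `@records`.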