in reply to Refactoring nested if block

It's hazardous to your health to refactor code without test data, but I've asked a similar question in the past, so here's my attempt.

I'm not sure about the math in the condition if( @cur_spin == ( $row_length + 15 ) ) {, which I why I left it there rather than folding it into the constant at the top. It seems to me that every time you use the $row_length constant, you then add 15 more items to @cur_spin anyway. So factoring that out, by adding 15 to the constant and always add the 15 items reduces the clutter--but I'd need to test it to convince myself that I did it correctly.

Copying all the data to a temp array and then having to reference the slice [ 3.. 17 ] in 3 different places seems clumsy. If you need to use the temp array, better to only copy the slice you are going to need, and then use it all each time. Of course that would translate the magic [3] into the magic [ 0 ].

Two locally scoped variables called $major_frame_index & $minor_frame_index represent multiple instances of 12 redundant characters, long, ugly lines and no gain in clarity.

You seem to be uncomfortable with using @array in a scalar context as the size of the array, which is quite common, but how could the size of the array ever be negative in my $cur_spin_cnt = (scalar @cur_spin > 0) ? scalar @cur_spin : 0?

Update: Added the missing first line that went walkabout when C&Ping.

sub WriteSpinData { my( $OUT_FH, $ref ) = @_; my( $last_major, $line, @cur_spin ); my( $row_length ) = 15*160+1; for my $filename ( sort keys %{ $ref } ) { for my $major ( sort keys %{ $ref->{$filename} } ) { for my $minor ( grep{/^\d+$/} keys %{ $ref->{$filename}{$m +ajor} } ){ @cur_spin = "$filename,$major,$minor" if $ref->{$filename}{$major}{$minor}[ 3 ] == 165 and ( $minor % 10 ) == 0; push @cur_spin, @{ $ref->{$filename}{$major}{$minor} } +[ 3..17 ]; if( @cur_spin == ( $row_length + 15 ) ) { print $OUT_FH join( ',', @cur_spin ), "\n"; @cur_spin = (); } } $last_major = $major; } printf "%s: %d major frames\n", basename( $filename ), $last_m +ajor; } }

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Replies are listed 'Best First'.
Re^2: Refactoring nested if block
by bowei_99 (Friar) on Mar 25, 2006 at 09:14 UTC
    Hi all,

    I really appreciate all of your input, especially BrowserUK. I'll try those changes, and see how it works. I agree - testing is critical.

    One thing I figure I should describe is my naming convention. Yes, I know $major_frame_index and $minor_frame_index are long, but my reasoning was that $major or even $major_frame didn't seem descriptive enough for me. See, this is part of a larger app, in which I parse a large chunk of binary data (major frame), which is in turn made up of smaller chunks of data (minor frames). This resulted in two types of data for each type of frame, a numerical index and a chunk of binary data. The indices I use to reference their locations I called $major_frame_index and $minor_frame_index, and the chunks of data I have as scalar variables, $minor_frame and $major_frame. But writing this now makes me rethink my need to have store the binary chunks in variables... I don't have access to the rest of the code now, so can't say for sure.

    But, assuming I would still need both the index and the binary chunk, is there a better way to name the variables?

    -- Burvil

      If the major and minor frame indices are integers, are they sequential and could they be stored in array references rather than hash references? This might allow you to shorten/tighten things a bit -- for example, eliminating all the numerical sorting of keys and integer checks.

      I'd suggest pulling all your hard-coded numbers into variables that explain them. (E.g. why just minor frame data for 3 .. 17, why all the special logic around minor frame index 3 being 165.)

      Also, while "golfing" (reducing characters) can often make things more confusing, in this case, I think trying to strip out some of the "line-noise" of Perl so that the logic is clearer might help. That would include changing %{ $ref_data } to just %$ref_data and $ref_data->{$filename}->{$major_frame_index}->{$minor_frame_­index} to $ref_data->{$filename}{$major_frame_index}{$minor_frame_­index} if not just $ref_data->{$filename}{$i}{$j} (simple variables for loop iterators). You could also break those up into temporary variables for each loop so you're not cluttering up the screen with the long names.

      You might consider moving the logic when you're at row length or you hit your special frame data 3 equals 165 actions to separate subroutines with descriptive names. That would strip out lines to make the logic clearer (at the cost of some efficiency, true) as it brings your alternate conditions closer together visually.

      Something that jumps out is that you're pushing an array slice onto @cur_spin but checking to see if the length of @cur_spin ever hits the row length exactly. What's implied is that the row length is a multiple of the slice length but that's not immediately obvious. And what happens if you get to the end of a major frame and came up short? It doesn't look like anything would be written.

      You also don't seem to sort the keys of the minor frames when you loop through them. Is that a bug? Since just about every hash you have you wind up wanting sorted, you might just want to use a sorted hash in the first place. (Though I haven't used it, CPAN search found Tie::Sorted::Hash.)

      Having looked through your code a few times now, it looks like your special logic for @cur_spin equals zero is trying to pick off the name for the row and you need that logic since you reuse @cur_spin as your accumulator and don't want to try to generate a name each time through.

      That makes me think you should consider just assembling all of the minor frames up front and just splice them off a row at at time rather than looping over them. For example, I'd approach it more like this:

      sub WriteSpinData { my ($OUT_FH, $ref_data) = @_; my $frames_per_row = 65; for my $filename (sort keys %$ref_data ) { my $basename = basename $filename; # create a list of major frames for the file my @majors = map { $ref_data->{$filename}{$_} } sort { $a <=> $b } keys %{ ref_data->{$filename} }; for my $major ( @majors ) { # join all minor frames together my @minors = map { $major->{$_} } sort { $a <=> $b } # assuming sorted grep { /\d+/ } keys %$major; # print them a row at a time while ( my @row = splice( @minors, 0, $frames_per_row ) ) { my $row_name = find_row_name( @row ); my @data = map { @{$_}[3..17] } @row; print $OUT_FH join( q{,}, $row_name, @data ), "\n"; } } print "$basename: " . @majors . " major frames.\n"; } }

      This kind of assumes that you can generate a row name in a subroutine from the row itself other than with all those indexes. If you really want/need the indexes, then you'll have to do something more like what you've got with all the indices and looping instead of the way I've done it with map. I don't really have enough context for your application to say.

      -xdg

      Code written by xdg and posted on PerlMonks is public domain. It is provided as is with no warranties, express or implied, of any kind. Posted code may not have been tested. Use of posted code is at your own risk.

      But, assuming I would still need both the index and the binary chunk, is there a better way to name the variables?

      Well, I can't see the rest of the code, but to my mind that's the point. One of the great advantages of partitioning your code into self-contained subroutines and using locally scoped variables is that you do not have to consider the whole application when naming your variables, as you would if using large scopes or global vars.

      In the case of your two loop iterators, at the level of this subroutine, that is all they are: Loop interators. In fact at this level, the use of the word 'index' is a misnomer as you index into arrays, but you key into hashes. So I suppose that $major_key and $minor_key might make sense, but then it is fairly obvious from their use that they are keys. And their scope only covers a dozen or so lines, so it is unlikely that you, (or anyone working on that subroutine), will forget or loose sight of that.

      Outside of that subroutine, the keys those loop interators transiently hold, may have associations with "frames", (whatever they are :), and it might make sense to name variables that serve a similar purpose, in other parts of the code to reflect that, but within this subroutine they just serve to iterate the keys of the hash so referencing "frame" in the names serves little purpose. Even the references to "major" and "minor" are not especially relevant at this level, though the implication that one is subordinate to the other is accurate and fits their use, and is no worse than $key1 & $key2 or even $k1 & $k2. Though I would say that they are no better than those either.

      I think where you are coming from, is the school of thought that says you use a single, consistant name for any given entity throughout your program. I've seen this proposed in several books, and was even taught it a long time ago on a Fortran77 course. I cannot remember the scoping rules for Fortran77, but I recall having to use a lot of global vars, so consistant naming was a requirement. Such naming conventions were also a necessity in (old) COBOL programs with their huge DATA-SECTIONs at the top of the programs.

      If you are going to do this is Perl, especially in larger, more complex programs, you are throwing away one of the main benefits of local scoping. You (almost) might as well stick with global vars and have done with it.

      I guess if your local naming conventions dictate this practice, or if this is a large, existing application that uses similar conventions consistantly throughout the rest of the code, then sticking with it when refactoring this sub might make sense. In the long term, recognising that regardless of what the entities that these loop variables hold, are in the wider context, within the scope of this subroutine, and these variable's lives, they are simply loop / hash iterators, will allow you to write more concise lines of code and so concentrate soley upon what these variables are doing within this subroutine and treat the data coming into the subroutine in a generic fashion.

      In the end, that's a personal view and your decisions will be influenced by the wider context of your codebase and your personal or corporate preferences.


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
        Well, I can't see the rest of the code, but to my mind that's the point. One of the great advantages of partitioning your code into self-contained subroutines and using locally scoped variables is that you do not have to consider the whole application when naming your variables, as you would if using large scopes or global vars.

        Forgive me for disagreeing, but lets say I use $network_device_polling_data in five subroutines and $ndpdata in the sixth all to represent the same thing then that just introduces confusion and increase the liklihood of introducing a bug in the future. Subroutines and the contents of subroutines do not exist within a vacuum.