vr has asked for the wisdom of the Perl Monks concerning the following question:

I'm asking for advice on style, rather than how to solve a problem. How a professional would do it? What's readable to others -- and myself, in a year? So, all answers are maybe just opinion based and are welcome :-).

There's usual complex data structure AoHoAH... , not very deep. We can assume root is

$root = { value => [ ... ]};

Two types of array elements are "interesting":

{ type => 'items', value => [ ... ]} # and other key-value pairs { type => 'foo', value => { ... }} # and other key-value pairs

"Items" hash "points to" another array and is to be processed recursively. "Foo" hash is to be checked and may be replaced (split) with several hashes. Other "uninteresting" elements can be anything (but always hashrefs) and are to be left "as is".

The next sub is to be called as foobar( $root ), and hopefully will do everything right:

sub foobar { my $href = shift; $href-> { value } = [ map { foobar( $_ ) if $_-> { type } eq 'items'; $_-> { type } eq 'foo' ? SPLIT_FOO( $_ ) : $_ } @{ $href-> { value }}]; }

Or even shorter, moving a check out of sight:

sub foobaz { my $href = shift; $href-> { value } = [ map { foobar( $_ ) if $_-> { type } eq 'items'; SPLIT_IF_FOO( $_ ) } @{ $href-> { value }}]; }

The next sub has slightly less "punctuation noise" and skips assignment, and is to be called fooqux( $root-> { value }), i.e. now arrayref can serve as "root" instead of structure above.

sub fooqux { $aref = shift; splice @$aref, 0, @$aref, map { fooqux( $_-> { value }) if $_-> { type } eq 'items'; SPLIT_IF_FOO( $_ ) } @$aref }

What I'm not sure about all these, is they always rebuild any array within original structure, even if there's no "foo" elements at all. With condition checking moved to SPLIT, the SPLIT_IF_FOO is always called for all elements, regardless. Next sub prevents these shortcomings, but uses Perl's "no-no" -- the C-style loop.

sub fooquux { $aref = shift; for ( my $i = 0; $i < @$aref; $i ++ ) { my $item = $aref-> [ $i ]; fooquux( $item-> { value }) if $item-> { type } eq 'items'; next unless $item-> { type } eq 'foo'; my @list = SPLIT_FOO( $item ); splice @$aref, $i, 1, @list; $i += $#list } }

And, not sure what's worse, to splice potentially many times or to always replace with new array. Or maybe there's "best" idiomatic way to solve such problem?

Replies are listed 'Best First'.
Re: To splice or not to splice? What implementation is more readable, efficient, etc.?
by kcott (Archbishop) on Feb 18, 2017 at 22:10 UTC

    G'day vr,

    "I'm asking for advice on style, rather than how to solve a problem. How a professional would do it? What's readable to others -- and myself, in a year? So, all answers are maybe just opinion based and are welcome :-)."

    [That seems to somewhat contradict the title of your post. My response is based on what I've quoted from the body of your post.]

    The first thing I suggest you do is read "perlstyle - Perl style guide". Everyone has their own preferred style, and that's fine; however, regardless of the style you choose, the ultimate goal is code, written in a consistent manner, that's easy to read.

    Next, I'd recommend reading "perldsc - Perl Data Structures Cookbook". While I can generally work around the odd typo, there's so many inconsistencies here that I'm left wondering if you really understand the subject matter.

    You start by saying you have an "AoHoAH": that's not a real thing; what's the "AH" part? Although I can assume a typo, I don't know whether there's an extra "A" (AoHoH); and extra "H" (AoHoA); or a missing "o" (AoHoAoH). None of that really matters though: on the very next line you present a HoA as the root structure!

    You should be able to represent the structure (of whatever you really have) in such a way that it stands by itself without requiring a prosaic description. Take a look at the output of Data::Dumper or Data::Dump to see how they do it.

    Hash elements are key-value pairs. There's three instances where you've named the key "value". That, in itself, is likely to cause confusion. You've exacerbated the situation by using the same key name (i.e. "value") in different places in the structure hierarchy and then using different datatypes for the values (that's the values of the keys, not the names of the keys). I'm getting confused just talking about it.

    You talked about '"Items" hash': I see no code resembling "Items => {...}". There is a key called "type" with the value 'items': there's another associated key (called "value") whose value is an arrayref. The confusion just mounts.

    You haven't used meaningful names. You've used almost identical names (cf. fooqux and fooquux): more potential confusion and errors here. You have some function names, in your code, in uppercase (SPLIT_FOO and SPLIT_IF_FOO) without any obvious reason for doing so; your description has another, SPLIT, which doesn't appear in your code. If you can't provide a meaningful name then you probably haven't described, to a sufficient degree, your problem, your solution, or both.

    You've bunched up much of your code, reducing readability; you've also spread out other parts of your code (for no useful reason, as far as I can tell), in a non-standard fashion, further reducing readability. There is no direct correlation between code size (or perceived cleverness) and efficiency: use Benchmark where necessary.

    You also wrote, "hopefully will do everything right". Don't hope, test. See Test::More.

    On the whole, I'd say that what you've presented displays poor style; is unprofessional; fails in terms of readability; is highly error-prone; and is not easily maintainable. I'd scrap it completely and start again from scratch: I suspect that would be less time-consuming than attempting to fix every problem in situ. I'd also recommend revisiting, and perhaps reworking, your original design.

    — Ken

      Thank you everyone for criticism and comments, exactly kind of feedback (maybe not its contents) I wanted. Didn't expect junk I write (documentation, too) to be so incomprehensible. Except "key named 'value' causing confusion" (too much, no?), looks like I deserved it.

      haukex, thank you for intention to suggest better algorithm or data structure -- I deliberately described it (failed to communicate though) as something abstract, it's really not worth anyone's effort.

      P.S. As to benchmarking, you all are right again: loop with simple push to new array, of either element or its expansion, is faster than all of the above

Re: To splice or not to splice? What implementation is more readable, efficient, etc.?
by LanX (Saint) on Feb 18, 2017 at 22:52 UTC
    I agree with my fellow monks that this is not readable.

    I see two issues of over optimization

    • multiple lines of list generating code wthin [...]
    • multiple lines into map {...} block.

    Both are allowed if done with caution... but now it's almost impossible to indent these levels of nesting properly.

    I'd suggest using speaking variables for intermediate steps and a real function for the map-block.

    The performance penalty can't justify this.

    Edit:
    There are style-guides forbidding more than 2 levels of nesting in one sub, and you have already more in a single statement...

    Cheers Rolf
    (addicted to the Perl Programming Language and ☆☆☆☆ :)
    Je suis Charlie!

Re: To splice or not to splice? What implementation is more readable, efficient, etc.?
by haukex (Archbishop) on Feb 19, 2017 at 10:48 UTC

    Hi vr,

    Perl's "no-no" -- the C-style loop

    I have to strongly disagree that the C-style loop is a "no-no". The reason the foreach loop is so often recommended over the C-style loop is that many people, especially those coming from other languages, use the C-style for when Perl's foreach would have been more appropriate. But that doesn't mean there's anything wrong with the C-style for. In fact, of all of the code samples you showed, that sub is the only one I understood on the first reading!

    As for the other example subs, remember that attempting to be "clever" and golf code is usually a direct hindrance to later maintenance. I know it doesn't feel very perlish to be a little more verbose and, for example, use a for instead of map or to use if/else instead of ? :, but a long map block or ? : structure are less readable and therefore harder to understand and maintain. That doesn't mean they should always be avoided, but require developing a feeling for what will be more readable later.

    I know you said you're asking about style, but perhaps you could give some background on the problem anyway. If this data structure is something you're getting from somewhere else and you can't change it (e.g. JSON), then I understand that jumping through some hoops to walk it may be necessary. In that case, maybe you could show some sample input with the expected output.

    On the other hand, if this data structure is something you're building yourself, then perhaps you could consider an OO approach as one option. Hiding this data in objects, perhaps even ones that can serialize and deserialize themselves to your data structure, may help greatly in making the code cleaner.

    Hope this helps,
    -- Hauke D

Re: To splice or not to splice? What implementation is more readable, efficient, etc.?
by AnomalousMonk (Archbishop) on Feb 18, 2017 at 18:20 UTC

    Forget about being readable in a year, I can't figure out right now what all this is supposed to be doing. Serious TL;DR. If you plan on ever handing something like this over to a maintainer, be very sure that he or she doesn't know where you live, or where you keep your food if you bring your lunch to work.


    Give a man a fish:  <%-{-{-{-<

      Hi,

      Are you serious?

Re: To splice or not to splice? What implementation is more readable, efficient, etc.?
by Marshall (Canon) on Feb 19, 2017 at 22:17 UTC
    Hi vr,

    You say: Next sub prevents these shortcomings, but uses Perl's "no-no" -- the C-style loop.

    Your "sort-of-C-looking-code" here, is close to a no-no in C:
        for ( my $i = 0; $i < @$aref; $i ++ )

    In C, the real "work horse" is the while loop and pointers, incrementing pointers and pointers to arrays of pointers to pointers and dereferencing these pointers to test for NULL or other values to stop the loop.

    The array syntax x= i[3] is used as a "last resort" in good C code. There are definitely reasons use the index syntax, but this is not preferred. One issue is that accessing the memory address of i[3] is often caculated by the complier as 4 * size_of_an_i_element + base address of array i whereas incrementing a memory pointer through an array is just: pointer + size_of_an_i_element. Less math == faster.

    Back to Perl...
    The number one "bug" in sofware is the "off-by-one" error. The Perl code of foreach my $ref (@$aref){} would save you from that.
    An issue in the C-like loop: for ( my $i = 0; $i < @$aref; $i ++ ) comes with the $i < @$aref test and how that relates to the initial value and the use of i as an array index.

    A much bigger issue with your code appears to be the equating of "fewer source code lines" with "better" or "higher performance". That is definitely NOT true. In many cases shorter Perl can run much slower than more verbose implementations. I think you misused the Perl "map" function. "map" is a cute, short hand way to write a foreach loop. Normally map is used for one line "transformation" operations. For more than that, I use a foreach loop. There is no difference in the performance, but a big difference in the clarity of the code.

    At the end of the day, I agree the other Monks who have posted as to the confusing nature of OP code. I disagree with your idea of how to implement this in "C like Perl". I don't understand things well enough to suggest an alternative implementation.

Re: To splice or not to splice? What implementation is more readable, efficient, etc.?
by stevieb (Canon) on Feb 18, 2017 at 19:10 UTC

    I'm in agreement with my fellow Monk AnomalousMonk here. It's a lot to absorb, so I'd express it with more lines of code so things are more clear, instead of trying too hard to make it compact.

Re: To splice or not to splice? What implementation is more readable, efficient, etc.?
by Anonymous Monk on Feb 18, 2017 at 21:21 UTC

    Hi

    The way to know which is faster is to Benchmark

    The map versions are all identical :) from a "style" point of view

    The way to decide between SPLIT_IF_FOO and SPLIT_FOO is not based on chracter count in a map, but which is more useful/generic, which verbs/actions the problem ...

    map/splice/foreach, they're all idiotmatic

    The only problem with your c-style for loop is modifying $i in two different places

    If the only goal of this code is to expand_foo $items, there should be no maps or splices anywhere, just simple recursion with a foreach

    * no not even map in void context thats poor style