Roy Johnson has asked for the wisdom of the Perl Monks concerning the following question:

If I had answers, this might make a good meditation. Since I have mostly questions, it's a SoPW.

It has become fairly popular to chain calls to map, kind of like piping commands together. When I see those constructs, I wonder whether Perl is doing anything clever to avoid unnecessary copying, or if each call to map really does construct a new list, which the next map goes through to construct yet another list. How much memory overhead are we talking about?

After some pondering, it occurred to me that a construct like

map { BLOCK2; (return_list2) } map { BLOCK1; (return_list1) } @input;
is functionally equivalent to
map { BLOCK1; local $_ = $_; map { BLOCK2; (return_list2) } (return_list1) } @input
but the BLOCK2 map is called once per return_list1, rather than once on a list that is @input times as big. The actual BLOCK2 code gets executed the same number of times either way. (A concrete sketch of the two forms follows the questions.) So the questions are:
  1. Does perl do anything clever regarding memory management in chained maps?
  2. Is my equivalence correct? Dangerous? Beneficial?
  3. If beneficial, could perl optimize chained maps into nested maps internally?
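
To make the equivalence concrete, here is a minimal runnable sketch (the doubling and scaling blocks are my own illustration, not part of the question itself):

my @input = (1 .. 5);

# Chained: the inner map builds its whole output list before the
# outer map sees any of it.
my @chained = map { $_ * 10 } map { ($_, $_ + 1) } @input;

# Nested: each two-element list from BLOCK1 is handed to BLOCK2
# immediately, so no full-size intermediate list exists at once.
my @nested = map { my $x = $_; map { $_ * 10 } ($x, $x + 1) } @input;

# Both produce (10, 20, 20, 30, 30, 40, 40, 50, 50, 60).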

We're not really tightening our belts, it just feels that way because we're getting fatter.

Re: map: chaining vs. nesting
by diotalevi (Canon) on Jun 18, 2004 at 14:10 UTC

    1. Nothing specific, beyond a general recognition that some copies can be avoided. A map's result is pushed onto the stack (assuming list context) for the subsequent op to do something with. It is a copy, but it is marked as something that can be recycled. Stuff that recognizes that (like map itself) can avoid making extra copies. I think @array = map( EXPR, map( EXPR, LIST ) ) copies the values from LIST once and then, as far as possible, re-uses those copies for the next map, which hands them to the aassign op; that op also notices that the input is recyclable and steals it instead of making another copy. (A quick way to benchmark the two forms is sketched after these answers.)

    2. Not terribly useful. I use chained maps where the output from one is not the same size as its input; the subsequent maps deal with these entirely different lists. I suppose that work could be moved into a single nested map, but that just makes the code harder to read.

    3. See #1. Something of this fashion already happens, in a more general sense.
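
    To check #1 empirically on your own perl, something like the following would do. This is only a sketch with an illustrative workload (not from the thread), and cmpthese measures CPU time rather than memory, so the space question needs a separate measurement.

    use strict;
    use warnings;
    use Benchmark qw(cmpthese);

    my @input = (1 .. 10_000);

    cmpthese(-2, {
        # Chained: the inner map builds its full output list first.
        chained => sub {
            my @out = map { $_ * 10 } map { ($_, $_ + 1) } @input;
        },
        # Nested: each small inner list is consumed as soon as it is made.
        nested => sub {
            my @out = map { my $x = $_; map { $_ * 10 } ($x, $x + 1) } @input;
        },
    });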

      Regarding #2: The fact that map output size differs from input size doesn't mean there's no benefit to nesting vs. chaining. The benefit I expected was in memory overhead (from not having so many copies resident at once) more than in reduction of operations (from copying multiple times). I agree with your point about nesting being harder to read; that's why I suggested it as an optimization.

      Putting the idea in better terms, I'm suggesting that map, in an ideal world where perl development was effortless, should behave like (1..100000): when its output is assigned, the range generates the whole list; but when its output is used iteratively (at least, in a foreach loop), it simply generates one output value at a time, as needed. map, grep, and ranges could all behave that way when feeding map, grep, or foreach.
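
      perl5 won't do this for map natively, but a closure can fake the behaviour. A minimal hand-rolled sketch (the names lazy_range and lazy_map are my own invention, not an existing API):

      sub lazy_range {
          my ($from, $to) = @_;
          return sub { $from <= $to ? $from++ : undef };
      }

      sub lazy_map {
          my ($code, $iter) = @_;
          my @buffer;    # holds extra values when $code returns a list
          return sub {
              until (@buffer) {
                  defined(my $x = $iter->()) or return;
                  local $_ = $x;
                  push @buffer, $code->();
              }
              return shift @buffer;
          };
      }

      # Only one value is resident at a time; no 100000-element list is built.
      my $it = lazy_map(sub { $_ * $_ }, lazy_range(1, 100_000));
      while (defined(my $v = $it->())) {
          last if $v > 100;    # stop early; the rest is never generated
      }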


      We're not really tightening our belts, it just feels that way because we're getting fatter.
        Lazy generation is a feature you can specify in perl6. Not yet so for perl5.
Re: map: chaining vs. nesting
by hardburn (Abbot) on Jun 18, 2004 at 14:30 UTC

    When I see those constructs, I wonder whether Perl is doing anything clever to avoid unnecessary copying

    No, a list is generated at the end of each map. So the Schwartzian Transform (ST), for example:

    my @sorted_data = map  { $_->[1] }
                      sort { $a->[0] <=> $b->[0] }
                      map  { [ do_something($_), $_ ] }
                      @unsorted_data;

    takes about three times as much memory as @unsorted_data originally used. It also happens to be efficient, because the sort step compares precomputed keys instead of calling do_something() once per comparison. CPU/memory tradeoffs generally favor the CPU--in fact, Perl itself is one gigantic CPU/memory tradeoff.

    Further meditation: LISP and other old interpreted languages were not well received when they were still young. Their detractors claimed they took too much computing power to handle the virtual machines they required. Today, a lot of very popular languages run under VMs (Perl, Java, Python, C#/.NET, etc.), and often run without significantly cutting into the resources available on a modern system. So the old reasoning should be considered moot.

    However, there is a second reason why LISP and other functional languages are called inefficient: the memory required to handle the massively-chained operations that functional programming styles encourage. With significant amounts of data, these can easily overwhelm the RAM of even a modern system.

    ----
    send money to your kernel via the boot loader.. This and more wisdom available from Markov Hardburn.

      Not quite. There's a check in map to see whether the input is a TEMP, and if so it is re-used for the output. The list assignment operator also notices that the input is stealable and steals the scalars away to be the kept-for-real copies in @sorted_data. So only one copy is made.

      I'd have written this one as a GRT, or with tye's fast, flexible, stable sort, instead.
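
      For reference, a GRT along those lines might look like this. It is only a sketch, assuming do_something() returns a non-negative integer key that fits in 32 bits and that @unsorted_data holds plain strings:

      # Pack the key onto the front of each string so that plain string
      # comparison sorts correctly, then strip the key back off.
      my @sorted_data = map  { substr $_, 4 }
                        sort
                        map  { pack('N', do_something($_)) . $_ }
                        @unsorted_data;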

      They are only inefficient to a point. If, in Lisp|Perl|C|whatever, you write a recursive function that runs a few thousand times, then you're going to suck up some RAM and the stack-push/pop CPU time. LISP, Scheme, ML, and their compadres are pretty well designed to minimize this. But, if you write a tail-recursive function, you're fine; overhead is exactly the same as for iteration.

      So it's all about how you implement your algorithms. Any language can be inefficient - m/(a|b+)*/ being a simple example I can remember being horrible in Perl, thanks to catastrophic backtracking. Besides, though most style guides for functional languages don't encourage it, they almost all have iteration constructs, even if they're not popular. I know LISP does; I think Scheme and ML do.

        Indeed. I spent a year working rather intensively on both ML and Haskell (which are structurally similar but have some crucial differences) and one of the first things I learned is that while it's tempting to implement a factorial like this:

        fact 0 = 1
        fact 1 = 1
        fact n | n < 0     = error "Negative factorial!"
               | otherwise = n * fact (n - 1)

        your life will be vastly improved if you implement it like this:

        fact 0 = 1
        fact 1 = 1
        fact n | n < 0     = error "Negative factorial!"
               | otherwise = g 1 n
          where g x 1  = x
                g x n' = g (x * n') (n' - 1)

        because the latter uses tail recursion and the former doesn't.

        If, in Lisp|Perl|C|whatever, you write a recursive function that runs a few thousand times, then you're going to suck up some RAM and the stack-push/pop CPU time. LISP, Scheme, ML, and their compadres are pretty well designed to minimize this. But, if you write a tail-recursive function, you're fine; overhead is exactly the same as for iteration.

        Tail recursion can be converted directly into iteration; this is part of the Scheme standard, I believe, and most implementations of other functional languages support it. If you're unlucky enough to have an implementation that doesn't, though, tail recursion will still suck up stack space for function calls.
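
        perl5 is one such implementation: it does not eliminate tail calls on its own, but goto &sub re-uses the current call frame, so you can get the constant-stack behaviour by hand. A sketch, not from the thread:

        # Accumulator-passing factorial; the goto makes each "call"
        # replace the current frame instead of pushing a new one.
        sub fact {
            my ($n, $acc) = @_;
            $acc = 1 unless defined $acc;
            return $acc if $n <= 1;
            @_ = ($n - 1, $n * $acc);
            goto &fact;
        }

        print fact(10), "\n";    # prints 3628800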

        --
        F o x t r o t U n i f o r m
        Found a typo in this node? /msg me
        % man 3 strfry

Re: map: chaining vs. nesting
by ihb (Deacon) on Jun 19, 2004 at 20:37 UTC

    This node may or may not be of interest in regard to your overhead question.

    ihb