thpfft has asked for the wisdom of the Perl Monks concerning the following question:

I quite often seem to do this:

$text = do_something_complicated_to($text);

and assume that it's economical that way. But a couple of recent posts have made me suspect that i'm actually creating a whole new variable there. Is this true? if so, there must be a better way.

Is this the way to do it? It feels fragile (and scary):

do_something_complicated_to(\$me);

Or maybe there's something one can do by returning a ref that would have a similar effect but where i can see it?

I guess what i really want is this:

$text =~ something_complicated();

but since i can't have that, what's the Right Way, please?

Re: style q: duplication? of variables
by japhy (Canon) on Aug 14, 2001 at 16:51 UTC
    It's not "wrong" to do things to arguments. Many of Perl's functions do this already... chomp, chop, substr, splice, push, vec, ... You could do the same:
    sub add_ten { $_[0] += 10; }
    $x = 20;
    add_ten($x);
    print $x; # 30

    _____________________________________________________
    Jeff[japhy]Pinyan: Perl, regex, and perl hacker.
    s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;

      Ouch!
      This kinda shocks me! I definitely don't like it when in a language I can't see at the function call whether an argument might be changed. And I always felt secure in Perl because I'd only expected an argument to be changed when I pass it by reference. And then ... yes, I never actually thought deeply about this. If I had, it should have occurred to me from the example of chomp and friends.

      When I read a book introducing the Ruby language that was one of the major disadvantages I found. I think, I'll have to drop that point.

      Uh, oh, whadda day...

      Thanks for destroying my illusions, anyway ;)

      Regards... Stefan

        Come on, cheer up, it's not as bad as it seems :-)

        Most functions start with:

        sub mysub {
            my ($first_param, $second_param, $and_so_on) = @_;
            # rest of the function
        }
        and this actually makes copies of received parameters.
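A minimal sketch of the difference, assuming two made-up subs (shout_copy and shout_in_place are hypothetical names, not from the thread): unpacking @_ into lexicals gives the sub its own copies, while writing to $_[0] reaches the caller's variable.

```perl
use strict;
use warnings;

# Hypothetical helper: works on a copy of its argument.
sub shout_copy {
    my ($text) = @_;     # copies the argument into a new lexical
    $text = uc $text;    # changes only the copy
    return $text;
}

# Hypothetical helper: works on the caller's variable.
sub shout_in_place {
    $_[0] = uc $_[0];    # $_[0] is an alias to the caller's variable
    return;
}

my $greeting = "hello";
my $loud     = shout_copy($greeting);
print "$greeting $loud\n";    # hello HELLO

shout_in_place($greeting);
print "$greeting\n";          # HELLO
```

So the usual "my (...) = @_;" opening line is what keeps most subs from touching their callers' data.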

        There's no need to specially highlight this way of passing parameters (as opposed to other languages) because... it's the only way that Perl supports, as perldoc perlsub will confirm.

        I think that, from a Practical Extraction and Report Language perspective, this made a lot of sense when Larry Wall invented the language. From a KISS (Keep It Simple, Stupid) point of view, if you decide to implement only one way of passing parameters, it's better to implement the most useful one...

        This does not mean that you should expect your variables to change all the time, or to contain unreliable data. If you check perldoc perlfunc, you'll see that only functions that need to change their parameters' values do so. As usual, Perl gives you enough flexibility to exercise common sense without draconian restrictions (or to hang yourself with your own rope). You can find similar ideas (it's up to the programmer to be clear and polite, not up to the language to force you to behave properly) in the Object Oriented design of Perl.

        References should be considered as a way to flexibly handle data structures rather than a way to show that you're going to change the value held in the variable. One of the most compelling reasons to use references is that, if you pass two lists to a function, they will be collapsed and flattened into @_. I think this is JAUPF (Just Another Useful Perl Feature) when you deal with a variable number of arguments, even though it was kind of annoying under some circumstances in Perl 4. Well, today, if you want to keep the arrays separate, just pass references to them :-)
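To illustrate the flattening, here is a small sketch (count_flat and count_separate are hypothetical names chosen for this example): the first sub cannot tell where one array ends and the next begins, while the second receives two array references and keeps them apart.

```perl
use strict;
use warnings;

# Hypothetical helper: both lists arrive merged into a single @_.
sub count_flat {
    my @all = @_;
    return scalar @all;
}

# Hypothetical helper: expects two array references.
sub count_separate {
    my ($first, $second) = @_;
    return (scalar @$first, scalar @$second);
}

my @odd  = (1, 3, 5);
my @even = (2, 4);

my $total = count_flat(@odd, @even);                    # 5
my ($n_odd, $n_even) = count_separate(\@odd, \@even);   # 3 and 2
print "$total $n_odd $n_even\n";                        # 5 3 2
```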

        -- TMTOWTDI

        I definitely don't like it when in a language I can't see at the function call whether an argument might be changed.

        Well, you picked a trivial example (scalars, specifically strings). In much of the Perl code you'll see, references are being passed around anyway, since plain old scalars have limited usefulness.

        What you may want to do is to adopt a naming convention to signify subroutines that take scalar arguments and modify them.

        You note that this was a disadvantage of Ruby; but note that Ruby has a convention of using a trailing '!' sign to denote methods that modify value types (especially) in place (i.e. chomp!() vs. chomp()). Maybe your Perl code could do something similar (i.e. truncate() vs. truncated(), perhaps).
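One way such a convention might look, as a sketch only: trimmed() returns a modified copy and trim_in_place() changes its argument. Both names are invented for this example; they are used instead of truncate() because truncate is already a Perl built-in that operates on files.

```perl
use strict;
use warnings;

# Hypothetical: returns a trimmed copy, the argument is left alone.
sub trimmed {
    my ($text) = @_;
    $text =~ s/^\s+|\s+$//g;
    return $text;
}

# Hypothetical: trims the caller's variable through the @_ alias.
sub trim_in_place {
    $_[0] =~ s/^\s+|\s+$//g;
    return;
}

my $padded = "  some text  ";
my $clean  = trimmed($padded);    # $padded still has its spaces
print "[$padded] [$clean]\n";     # [  some text  ] [some text]

trim_in_place($padded);           # now $padded itself is changed
print "[$padded]\n";              # [some text]
```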

Re: style q: duplication? of variables
by htoug (Deacon) on Aug 14, 2001 at 16:51 UTC
    I find that I (mostly) use the $new = do_something($old) way when I feel that $old and $new often will be different variables.

    I tend to use something like

    sub do_something_to(\$) { ... }
    ...
    do_something_to($me);

    Note the absence of \ in the call of the function. That is taken care of by the prototype. (See perlsub for more info on prototypes.)
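A filled-in sketch of the prototype approach, assuming a made-up sub called append_bang: with a (\$) prototype, Perl passes a reference to the scalar even though the call site has no backslash. The sub has to be declared before the call is compiled for the prototype to apply.

```perl
use strict;
use warnings;

# Hypothetical sub. The (\$) prototype turns the scalar argument
# into a reference to that scalar at the call site.
sub append_bang(\$) {
    my ($ref) = @_;     # $ref is a reference to the caller's scalar
    $$ref .= "!";
    return;
}

my $me = "hello";
append_bang($me);       # no backslash needed here
print "$me\n";          # hello!
```

The trade-off is the one raised elsewhere in the thread: nothing at the call site shows that $me may be modified.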

    There is no Right Way, there is just "My Way" and "Your Way"...

Re: style q: duplication? of variables
by AidanLee (Chaplain) on Aug 14, 2001 at 16:55 UTC

    Your suspicions are true that whenever passing a non-ref'd variable into or back out of a function, it makes a copy of what you were working with, be it a scalar, an array or a hash.

    The most thorough way to squelch duplicates is to pass your variables in by reference. You can then modify the referenced data in place, and you won't have to return anything.

    Returning a reference from a subroutine is often handy if you're creating something large and complex from scratch inside the subroutine. For example, a long Array of Hashes.
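As a rough sketch of that last case (build_records and its fields are hypothetical): the sub builds an array of hashes and hands back a single reference to it, so the list itself is not copied on return.

```perl
use strict;
use warnings;

# Hypothetical builder: creates an array of hashes and returns a reference.
sub build_records {
    my ($count) = @_;
    my @records;
    push @records, { id => $_, name => "item$_" } for 1 .. $count;
    return \@records;    # one reference, not a copy of the whole list
}

my $records = build_records(3);
print scalar(@$records), "\n";      # 3
print $records->[1]{name}, "\n";    # item2
```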

Re: style q: duplication? of variables
by nakor (Novice) on Aug 14, 2001 at 17:36 UTC
    Since arguments are passed _by alias_, doing things to $_[0] actually changes the first argument (and so on). E.g.:
    sub frobnicate {
        $_[0] =~ s/foo/bar/;
        $_[0] =~ s/\d/\&/g;
    }
    my $var = "foo 12345";
    frobnicate($var);
    # $var is now "bar &&&&&"
    Of course, there is no "One True Way," but this is clean and works.

      Ouch. chastised in the chatterbox, i've digested all this and read a bit more thoroughly than before. I now understand that the creation of a new variable happens at this stage:

      my ($text) = @_;

      Which is embarrassingly obvious once realised. And that when I

      return $text;

      I am returning the new - local - entity and not the original one, so back in main::, the $text variable is now pointing to the value that was created in and returned from the sub. (Dominus' very lovely explanation of scope and duration was helpful here. That one really ought to be in best nodes.)

      And the reason things are different with references is that a duplicate of a reference points to the same piece of data as the original. There is still duplication, but only of pointers, not data.
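A short sketch of that point, with a made-up sub name (upcase_via_ref): the reference is copied into the sub's own lexical, but both copies point at the same string, so the change is visible to the caller.

```perl
use strict;
use warnings;

# Hypothetical sub: receives a reference to a scalar.
sub upcase_via_ref {
    my ($ref) = @_;      # copies the reference, not the string it points to
    $$ref = uc $$ref;    # changes the original data
    return;
}

my $text     = "some long text";
my $orig_ref = \$text;
upcase_via_ref($orig_ref);
print "$text\n";         # SOME LONG TEXT
```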

      Thought i'd better put things straight before i get told off any more. Thanks for all the answers.

Re: style q: duplication? of variables
by dws (Chancellor) on Aug 14, 2001 at 22:41 UTC
    "Premature binding" is one reason to avoid writing   do_something_complicated_to(\$text); This style of interface prematurely ties the hands of clients, forcing them to write
    my $processed_text = $text;
    do_something_complicated_to(\$processed_text);
    if they want to hang on to both the unprocessed string and the processed string. This clutters up the client, leaving little windows where things aren't as they are named. You can close the window somewhat by writing   do_something_complicated_to(\(my $processed_text = $text)); But that's still burdening the client (and burdening whoever has to maintain the code).

    It's cleaner and more readable to write   my $processed_text = do_something_complicated_to($text);

    In practice, this becomes less of an issue when you're writing a routine to side-effect (process) a data structure, and don't want to (or need to) take on the overhead of returning a modified copy of the entire structure.