ultranerds has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I'm trying to work out what would be quicker... doing:
    my $new_contents = my_test($old_content);

    sub my_test {
        my $long_content = $_[0];
        # do some stuff here
        return $long_content;
    }
...or use a "global" var, i.e.:

    $main_vars{content} = $old_content;
    my $new_contents = my_test();
    $old_content = $main_vars{content};

    sub my_test {
        # do some stuff here with $main_vars{content}
    }
Be aware, these are gonna be pretty large strings, i.e. thousands, if not tens of thousands, of lines (as they are "wiki" articles)

TIA

Andy

Re: Whats quicker - passing along as variable to sub, or global var?
by moritz (Cardinal) on Apr 08, 2011 at 13:03 UTC
    I'm trying to work out what would be quicker

    Then Benchmark it.
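    A minimal Benchmark sketch of what that could look like, assuming a synthetic 8 MB test string (the data and the sub names by_copy, by_alias and by_ref are made up for illustration). It compares the copying style against the @_ alias and reference styles discussed in this thread:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: about 8 MB of text with plenty of "foo" to replace.
my $text = "foo bar baz qux\n" x 500_000;

# Copying style: copy in, edit the copy, return it to be copied back out.
sub by_copy {
    my $s = $_[0];            # copy of the caller's string
    $s =~ s/foo/bar/g;
    return $s;
}

# Alias style: edit the caller's variable directly through @_.
sub by_alias {
    $_[0] =~ s/foo/bar/g;
    return;
}

# Reference style: the caller passes \$string and we edit through it.
sub by_ref {
    my ($ref) = @_;
    $$ref =~ s/foo/bar/g;
    return;
}

# Each candidate starts from a fresh copy of $text, so all three do the
# same substitution work; the extra copies in by_copy are the difference.
# A negative count means "run each for at least that many CPU seconds".
cmpthese(-1, {
    copy      => sub { my $t = $text; $t = by_copy($t); },
    alias     => sub { my $t = $text; by_alias($t); },
    reference => sub { my $t = $text; by_ref(\$t); },
});
```

    Results depend on the machine and the perl version: perl 5.20 and later use copy-on-write for strings, so the copying variant may lose less than it did when this thread was written.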

    Be aware, these are gonna be pretty large strings, i.e. thousands, if not tens of thousands, of lines (as they are "wiki" articles)

    Then you should avoid making copies. Note that passing strings to functions does not make a copy (@_ aliases), but my $long_content = $_[0]; does.
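    A small demonstration of the difference, using two made-up subs: the first edits the caller's variable through the @_ alias, the second copies $_[0] into a lexical and has to return the result:

```perl
use strict;
use warnings;

# Elements of @_ are aliases to the caller's variables, so this edits in place.
sub upcase_in_place {
    $_[0] =~ tr/a-z/A-Z/;
    return;
}

# Assigning $_[0] to a lexical makes an independent copy; the caller's
# variable is untouched and the result has to be returned.
sub upcase_copy {
    my $s = $_[0];            # copy made here
    $s =~ tr/a-z/A-Z/;
    return $s;
}

my $word = "monk";
upcase_in_place($word);
print "$word\n";              # prints MONK

my $other = "abbot";
my $loud  = upcase_copy($other);
print "$other $loud\n";       # prints abbot ABBOT
```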

    A good way to avoid copying and still get the scoping benefits of argument passing is by using references explicitly. I guess you're familiar with perlreftut and perlref...

      Hi,

      Thanks - I was actually doing some benchmarks after I wrote this :) Here are the results:

      C:\Users\Andy\Documents>perl test.pl
      Length of string: 101000000
      Time taken was 0 wallclock secs ( 0.17 usr 0.03 sys + 0.00 cusr + 0.00 csys = 0.20 CPU) seconds
      Time taken was 1 wallclock secs ( 1.13 usr 0.03 sys + 0.00 cusr + 0.00 csys = 1.16 CPU) seconds
      C:\Users\Andy\Documents>
      The 1st one uses:
          my $new_string = testing_string($string);

          sub testing_string {
              my $content = $_[0];
              # do some stuff
              $content =~ s/a/b/sg;
              return $content;
          }
      ...whereas the "slower" one does:

          # do stuff here
          my %test_var;
          $test_var{$string} = $string;
          testing_string_2();
          $string = $test_var{$string};

          sub testing_string_2 {
              # do some stuff here
              $test_var{$string} =~ s/a/b/sg;
          }
      >> Then you should avoid making copies. Note that passing strings to functions does not make a copy (@_ aliases), but my $long_content = $_[0]; does.

      Not sure what you mean?

      TIA

      Andy
        Not sure what you mean?

        I meant exactly what I wrote: Passing a string to a function doesn't make a copy. Assigning it to a separate variable does. So

            sub f {
                print $_[0];
            }

        avoids the copy, whereas

            sub f {
                my $x = $_[0];   # copy created here
                print $x;
            }

        creates one.

        Since it's quite ugly to use $_[NUMBER] all the time, I suggested references instead.

            my $new_string = testing_string($string);

            sub testing_string {
                my $content = $_[0];   # string copy, expensive
                # do some stuff
                $content =~ s/a/b/sg;
                return $content;       # string copy, expensive
            }
            # do stuff here
            my %test_var;
            $test_var{$string} = $string;   # double!! string copy and hash sum
                                            # calculation, expensive. And why this
                                            # senseless copying into a hash? A scalar
                                            # variable (i.e. $string) is global too
            testing_string_2();
            $string = $test_var{$string};   # string copy and hash sum calculation of $string

            sub testing_string_2 {
                # do some stuff here
                $test_var{$string} =~ s/a/b/sg;
            }

        What you could do is this:

            testing_string(\$string);   # send a reference to $string instead of $string

            sub testing_string {
                my $contentref = $_[0];
                # do some stuff
                $$contentref =~ s/a/b/sg;
            }

        But are you sure you have problems with the speed of your script? Did you profile your script to see where it is slow? You might google for "premature optimization"; it is the most common mistake programmers make. And as you can see, they often even make things more expensive because they don't know enough about the underlying mechanism.

        Simply copying strings is a very fast operation on modern CPUs because they can use special instructions (maybe even DMA transfers) for it. Unless this operation is done thousands or millions of times in your script, you might only save micro- or milliseconds.

Re: Whats quicker - passing along as variable to sub, or global var?
by JavaFan (Canon) on Apr 08, 2011 at 13:13 UTC
    Be aware, these are gonna be pretty large strings, i.e. thousands, if not tens of thousands, of lines
    Then you want to avoid copying the content. Since your second snippet copies the content twice, that will be the slowest.
Re: Whats quicker - passing along as variable to sub, or global var?
by ikegami (Patriarch) on Apr 08, 2011 at 18:11 UTC

    In snippet 1:

        $long_content = ...    # Long string copy 1.
        $new_contents = ...    # Long string copy 2.

    In snippet 2:

        $main_vars{content} = ...    # Long string copy 1.
        $old_content = ...           # Long string copy 2.

    They are going to be similar since the parts you said are slow occur in both versions. Moving the slow parts around doesn't make things faster.

Re: Whats quicker - passing along as variable to sub, or global var?
by anonymized user 468275 (Curate) on Apr 08, 2011 at 13:49 UTC
    The normal approach, irrespective of scope, is to have a single copy of the string and pass scalar references around. For example:
        my $bigref = bigget();
        # ... do something ...
        do_something_more( $bigref );

        sub bigget {
            local $/ = undef;   # slurp mode: undefine the input record separator locally
            open my $fh, '<', $ENV{BIG_STUFF} or die "Cannot open $ENV{BIG_STUFF}: $!\n";
            my $slurp = <$fh>;
            close $fh;
            \$slurp;   # return the reference to $slurp, which also keeps it alive outside this scope!!!
        }

        sub do_something_more {
            my $bigref = shift;                           # copying the ref, not the string
            my ($first_word) = $$bigref =~ /^\s*(\S+)/;   # using a dereference
            # ... do more ...
        }
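    A self-contained variant of the same pattern, assuming an in-memory filehandle as a stand-in for the file named in $ENV{BIG_STUFF} (slurp_ref and first_word are made-up names). The string is read once, and afterwards only the reference is passed around:

```perl
use strict;
use warnings;

# Read a whole filehandle into one string and hand back a reference to it.
# The lexical $slurp survives after the sub returns because the returned
# reference keeps it alive.
sub slurp_ref {
    my ($fh) = @_;
    local $/;                 # undef input record separator: read everything at once
    my $slurp = <$fh>;
    return \$slurp;
}

sub first_word {
    my ($ref) = @_;           # copies the reference, not the string
    return $$ref =~ /^\s*(\S+)/ ? $1 : undef;
}

# In-memory "file" so the sketch runs without $ENV{BIG_STUFF}.
my $fake_file = "  wiki article text\nsecond line\n";
open my $fh, '<', \$fake_file or die "Cannot open in-memory file: $!";
my $bigref = slurp_ref($fh);
close $fh;

print first_word($bigref), "\n";   # prints wiki
```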

    One world, one people

Re: Whats quicker - passing along as variable to sub, or global var?
by Eliya (Vicar) on Apr 08, 2011 at 16:45 UTC

    In some cases it makes sense to (mis)use a for loop to create named aliases, which can improve readability, in particular if you need the variable several times in the routine. This lets you use a self-explanatory variable name (as opposed to $_[0]). And in contrast to using normal references, you don't have to mess with \$var and $$var (de)referencing syntax.

    The following three variants are functionally equivalent in that they all avoid copying of the argument.

        # named alias:
        sub func {  # usage: func($string)
            for my $string ($_[0]) {
                $string =~ s/foo/bar/g;
            }
        }

        # direct @_ alias:
        sub func {  # usage: func($string)
            $_[0] =~ s/foo/bar/g;
        }

        # explicit reference:
        sub func {  # usage: func(\$string)
            my $stringref = $_[0];
            $$stringref =~ s/foo/bar/g;
        }

    And with large strings, all three variants are significantly faster than the "default" (copying) approach:

        sub func {  # usage: $string = func($string)
            my $string = $_[0];
            $string =~ s/foo/bar/g;
            return $string;
        }
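    A small driver to confirm that the three non-copying variants really edit the caller's variable. The subs are Eliya's, renamed (hypothetically) so they can live side by side in one file:

```perl
use strict;
use warnings;

sub by_named_alias {            # usage: by_named_alias($string)
    for my $string ($_[0]) {    # $string is an alias for the caller's variable
        $string =~ s/foo/bar/g;
    }
    return;
}

sub by_direct_alias {           # usage: by_direct_alias($string)
    $_[0] =~ s/foo/bar/g;
    return;
}

sub by_reference {              # usage: by_reference(\$string)
    my $stringref = $_[0];
    $$stringref =~ s/foo/bar/g;
    return;
}

my ($x, $y, $z) = ("foo foo") x 3;
by_named_alias($x);
by_direct_alias($y);
by_reference(\$z);
print "$x | $y | $z\n";   # prints: bar bar | bar bar | bar bar
```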
Re: Whats quicker - passing along as variable to sub, or global var?
by jpl (Monk) on Apr 08, 2011 at 13:18 UTC

    Alternatively, refer to my variables declared outside your sub; they act a lot like global variables:

        my $oldcontents;
        # initialize $oldcontents
        my $newcontents;

        sub my_test {
            # read from $oldcontents
            # write to $newcontents
        }

        my_test();
    But benchmark, by all means.
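    A runnable version of that idea, with made-up contents: the sub sees the file-scoped my variables directly, so nothing is passed in or returned. Producing the new content still costs one copy:

```perl
use strict;
use warnings;

# File-scoped lexicals: visible to every sub defined later in this file,
# but not to code in other files.
my $oldcontents = "foo and more foo\n";   # hypothetical initial content
my $newcontents;

sub my_test {
    # read from $oldcontents, write to $newcontents (one copy, then edit)
    ($newcontents = $oldcontents) =~ s/foo/bar/g;
    return;
}

my_test();
print $newcontents;    # prints "bar and more bar"
```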

      OK, well, both of these have the same effect:
          my $string  = "foo bar testing";
          my $string2 = "foo bar testing";

          use Benchmark;
          use Test;

          ####################
          # Test 1
          ####################
          my $start = Benchmark->new;
          Test::testing1($string);
          print qq|String is now: $string \n|;
          my $end  = Benchmark->new;
          my $diff = timediff($end, $start);
          print "Time taken was ", timestr($diff, 'all'), " seconds \n\n";

          ####################
          # Test 2
          ####################
          # start timer (reuse the variables instead of redeclaring them with my)
          $start = Benchmark->new;
          $string2 = Test::testing2($string2);
          print qq|String2 is now: $string2 \n|;
          $end  = Benchmark->new;
          $diff = timediff($end, $start);
          print "Time taken was ", timestr($diff, 'all'), " seconds \n\n";
      However, if I change the $string and $string2 to:

      my $string = "sf osfpsid hfosidhf soifh sofihs foishfosihf soifh soifhsofihsfoihs foishf sofhfo ihsf oishfsoifhsf \n" x 1000000;

      ...then there is quite a considerable difference. The code in Test.pm is as follows:
          package Test;
          use strict;

          sub testing1 {
              print "At testing1 \n";
              $_[0] =~ s/foo/test/g;
          }

          sub testing2 {
              print "At testing2 \n";
              my $string = $_[0];
              $string =~ s/foo/test/g;
              return $string;
          }

          1;
      Judging from the above then - I'm guessing just editing $_[0] directly is gonna be the best option for us, as it seems to be a lot quicker.

      Cheers

      Andy

        Devel::NYTProf, available on CPAN, is also your friend here. It does a wonderful job of showing where the time is going. If your subroutine is only a tiny fraction of the total time used, there's not much point in making it uglier to make it faster. If it is a major player, the profiler will point you to the parts that are worth tuning.

      Thanks. The problem with that is that we are passing the content back and forth between other .pm files (for example, View.pm, which actually grabs the contents, then Markups.pm, which processes the wiki markup and returns to View.pm, etc.)

        You can avoid copying to/from your my_test() routine. If you must copy to initialize the old content, and to pass back the new content, such is life.
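        A sketch of passing one reference through several modules, assuming two toy packages named after the View.pm and Markups.pm mentioned above (their contents, including the '''bold''' rule, are invented; real markup processing will differ). Only the reference travels between packages, so the article text itself is never copied:

```perl
use strict;
use warnings;

# Hypothetical stand-ins for View.pm and Markups.pm, written as packages
# in one file so the sketch runs on its own.
package Markups;

# Takes a reference to the article text and edits it in place.
sub process {
    my ($text_ref) = @_;
    $$text_ref =~ s{'''(.+?)'''}{<b>$1</b>}g;   # toy bold markup, illustration only
    return;
}

package View;

sub render {
    my ($text_ref) = @_;        # only the reference is copied
    Markups::process($text_ref);
    return $text_ref;           # hand the same reference back
}

package main;

my $article = "Some '''bold''' text\n";
my $out_ref = View::render(\$article);
print $$out_ref;    # prints: Some <b>bold</b> text
```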

Re: Whats quicker - passing along as variable to sub, or global var?
by locked_user sundialsvc4 (Abbot) on Apr 08, 2011 at 17:07 UTC

    You are basically “benchmarking” the speed of the virtual memory subsystem.   And, you are using memory like a filesystem.

    Don’t “diddle” code to make it faster -- find a better algorithm.
    – The Elements of Programming Style

Re: Whats quicker - passing along as variable to sub, or global var?
by osbosb (Monk) on Apr 08, 2011 at 18:10 UTC
    You could always run both versions through "time" for millisecond-level timing. Don't know if it helps you, but it's worth a run.
Re: Whats quicker - passing along as variable to sub, or global var?
by locked_user sundialsvc4 (Abbot) on Apr 11, 2011 at 17:59 UTC

    My recommendation would be that you should use some form of a database/external-file structure here.   In situations like these, I happen to find SQLite to be very hard to beat, even when I am creating a file on-the-spot as working storage and demolishing the whole thing at the end.   (P.S.: when using SQLite, transactions are very crucial to performance.)

    You grab each article into memory, work with it, put it back, and then repeat the process, effectively re-using the same memory pages each time.   Disk I/O will still be going on ... since all memory is virtual, you cannot avoid this ... but the nature of that I/O will be in line with what the virtual memory manager and the filesystem manager, respectively, expect and are prepared to deal with.   The pages will undoubtedly sit in a stable working-set, while the I/O that takes place will be file I/O against an efficient indexed file.   The OS, whatever it is, knows very well how to serve the needs of an application that behaves in this way.

    Even though the technical capacity of hardware these days is vastly larger than it once was, it is still the case that, when you exceed those limits or when you go against the software designers’ core assumptions in their design of a subsystem, “it all falls down” rather quickly and badly.

    Given that you’re asking the OQ of “what’s quicker,” I know that the app is in-pain already.   The answer should be, “it doesn’t matter, and if it does seem to matter, that is not the true root cause of the problem.”   The virtual memory subsystem is not happy, so to speak.   Fact is, it is crying in pain (and so are any users, if any, who have to share the same system with you), because this app is “going against the grain.”

    IMHO.™   HTH.™   TMTOWTDI.™

      How will "some form of database" allow the OP to pass very large strings from one place in his code to another efficiently?

      And of course the answer is: It won't. Just more fatuous garbage.


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.