Keef has asked for the wisdom of the Perl Monks concerning the following question:

Is there any evil in using a var/ary/hash over and over again? Example: You slurp several text files in, one at a time. For each file, you use DATA as a file handle, and you put the contents of the file into @lines. It will work, but are there any side effects of using the @lines array for many different data sets in the same script? What about $variables and %hashes? Or, is this the preferred method for memory conservation?
  • Comment on Using a Variable Multiple times (with different data)

Replies are listed 'Best First'.
Re: Using a Variable Multiple times (with different data)
by chromatic (Archbishop) on Dec 30, 2000 at 03:14 UTC
    Yes and no.

    Yes, if you read in many different files, perform the same types of operations on them, but don't create a function to handle those manipulations. Or at least a loop. People who do this tend not to be diligent about error handling and tie themselves up in global variable failure knots.

    # don't do this
    my @lines;
    open(DATA, "<file1");
    while (<DATA>) {
        push @lines, $_;
    }
    process(@lines);

    open(DATA, "<file2");
    while (<DATA>) {
        push @lines, $_;
    }
    process(@lines);
    This isn't a problem for filehandles, but what if the second open fails? What if process() doesn't remove everything from @lines? How long will you spend tracking down an error set by file #2 in a global variable that only shows up when you process file #10?
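    The stale-data failure described above is easy to demonstrate in miniature. This is a hypothetical sketch (the file contents are faked with push); the point is that nothing ever empties @lines between "files", so file #2's data lands on top of file #1's:

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;

    my @lines;                      # one global array, reused for every "file"

    push @lines, "from file1\n";    # pretend this line was read from file #1
    # ... imagine process(@lines) runs here but never clears @lines ...

    push @lines, "from file2\n";    # now "read" file #2 into the same array

    # @lines still contains file #1's data alongside file #2's:
    print scalar(@lines), "\n";     # prints 2, not 1
    ```

    The bug only shows up when a later file is processed, which is exactly why it can take so long to track down.
    
    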

    No, if you have a couple of functions that do different things but use lexical variables within the functions.

    sub read_file {
        my $filename = shift;
        open (DATA, "<$filename") or return;    # could die on error
        my @lines = <DATA>;
        return process(@lines);
    }
    Besides that, if you keep using the same spot in memory over and over, you might cause that particular byte to fail prematurely. Better to spread things over the whole RAM stick.
Re: Using a Variable Multiple times (with different data)
by fundflow (Chaplain) on Dec 30, 2000 at 02:23 UTC
    This is mostly a methodological question.

    Using different variables for different things is preferred: when someone reads or debugs your code, it is clear that the different instances relate to different data.
    All compilers nowadays (including perl, according to the excellent explanation here) track the scope of each variable, so the "dangling" variables get cleared away as soon as they go out of scope.
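    A minimal illustration of the scoping point (the sub name is made up for the example): a lexical declared with my inside a sub exists only there, and under strict, referring to it outside is a compile-time error rather than a silent reuse of stale data.

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;

    sub leak_check {
        my @lines = ("a", "b", "c");   # lexical: exists only inside this sub
        return scalar @lines;          # the array is freed when the sub returns
    }

    print leak_check(), "\n";          # prints 3
    # print scalar @lines;            # would not compile under strict: @lines is gone
    ```
    
    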

    An exception to this is objects that require expensive setup (an ftp connection, for example). In these cases it makes more sense, from an optimization point of view, to reuse these resources.
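    One common way to reuse such a resource is to cache it and hand out the same instance on every call. This is a hedged sketch with a fake "connection" string and a counter standing in for the expensive step (a real script might build a Net::FTP object there instead):

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;

    my $connection;       # cached resource, deliberately reused across calls
    my $setup_count = 0;  # counts how often the expensive setup actually runs

    sub get_connection {
        unless (defined $connection) {
            $setup_count++;                   # expensive setup happens here
            $connection = "fake-connection";  # stand-in for e.g. Net::FTP->new(...)
        }
        return $connection;
    }

    get_connection() for 1 .. 5;   # five uses...
    print "$setup_count\n";        # ...but the setup ran only once: prints 1
    ```
    
    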

    Sorry for not giving actual timing data, maybe someone will benchmark it.

      Here is a benchmark:
      #!/usr/bin/perl -w
      use strict;
      use Benchmark;

      timethese(1000, { 'global' => '&global', 'local' => '&local' });

      sub global {
          my %hash;
          for (1..100) {
              for (1..10) {
                  $hash{rand(100)} = 123;
              }
          }
          %hash = ();
      }

      sub local {
          for (1..100) {
              my %hash;
              for (1..10) {
                  $hash{rand(100)} = 123;
              }
          }
      }
      And it gives:
      Benchmark: timing 1000 iterations of global, local...
          global: 14 wallclock secs (13.47 usr + 0.01 sys = 13.48 CPU)
           local: 14 wallclock secs (13.33 usr + 0.02 sys = 13.35 CPU)
      so it seems there is not much difference.