samuelalfred has asked for the wisdom of the Perl Monks concerning the following question:

Hello,
I'm writing a script that reads files containing parameter names and their corresponding values. It can read multiple files, and if the same parameter appears in more than one file, the last value read should be used.
To achieve this, for every parameter in each file I have to check whether it already exists in the list (array) of parameters loaded so far. To do this I have written a small subroutine that loops through that array looking for the current parameter. This is very time-consuming: loading a file containing many (>500) parameters takes quite some time. Is there a better (faster) way of doing this check? Please note that if the parameter exists in the array, I also need to know its index.

Re: Speeding up perl script
by jettero (Monsignor) on Jan 28, 2009 at 08:36 UTC
    Sounds like a great place to use a hash instead of an array. It would be better if you had some code or some pseudo-code so I could be sure I understand the task.

    -Paul

      This is basically the code I'm talking about. It loops through a number of files, reads them (this routine is quick), and loops through all parameter names in each file, checking whether they already exist in the global @names array and taking action depending on the result. It is this check (find_element) that I want to speed up. Any ideas? If a hash is a good alternative, could you please explain how it differs from an array? I'm new to Perl, so I'm not familiar with all the expressions... Thank you!
      foreach $filename (@input_names)    # Go through input files
      {
          ($names_ref, $data_ref) = &read_file($filename);
          @tmp_names = @$names_ref;
          @tmp_data  = @$data_ref;
          $tmp_index = 0;    # Start again at the first entry of each file
          foreach $name (@tmp_names)    # Go through lines of current input file
          {
              # Check if variable is already present in name array (time consuming!)
              $index = &find_element($name, @names);
              if ($index == -1)    # Name not present, put both name and data in array
              {
                  push(@names, $name);
                  push(@data, $tmp_data[$tmp_index]);
              }
              else    # Name present, only replace data for the name
              {
                  $data[$index] = $tmp_data[$tmp_index];
              }
              $tmp_index++;
          }
      }

        The main difference between a hash and an array is that an array is indexed by number while a hash is indexed by name. So where a lookup in an array is fast if you know the position of the value, a lookup in a hash is fast if you know the "name" of the value, that is, the (string) key the value is associated with.
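        Since the original question also asks for the index of an existing parameter, one way to combine the two ideas is to keep the arrays and add a hash that maps each name to its position in them. The following is only a sketch under that assumption: store_param and %index_of are made-up names, and the sample parameters are invented.

```perl
use strict;
use warnings;

my (@names, @data, %index_of);

# Hypothetical helper: store a value, reusing the existing slot
# when the name has been seen before. Returns the array index.
sub store_param {
    my ($name, $value) = @_;
    if (exists $index_of{$name}) {
        # Name already present: only replace the data for that name
        $data[ $index_of{$name} ] = $value;
    }
    else {
        # Name not present: append name and data, remember the position
        push @names, $name;
        push @data,  $value;
        $index_of{$name} = $#names;
    }
    return $index_of{$name};
}

store_param('gain',   1.0);
store_param('offset', 0.5);
store_param('gain',   2.5);    # overwrites the first 'gain'

print "$names[$_] = $data[$_]\n" for 0 .. $#names;
```

        The hash lookup replaces the loop over @names, so the cost of each check no longer grows with the number of parameters already loaded.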

        Yeah, you probably want something more like this, or something similar to it.
        my %hash;
        while (my $line = <>) {
            my ($name, $value) = split m/something goes here/, $line;
            die "error: value already defined!" if exists $hash{$name};
            $hash{$name} = $value;
        }

        -Paul

Re: Speeding up perl script
by Bloodnok (Vicar) on Jan 28, 2009 at 08:53 UTC
    You don't give any indication of the format of the individual files. IMO, other than jettero's earlier suggestion (of using a hash), the easiest solution, provided you have a say in the file format and it is suitable, would be to do each file, thereby satisfying both of the principal requirements, since:
    • All variables in the last file would be updated as a result of doing the last file
    • Assuming the requirement for an index of the record is to access its value, the value [of the variable] would be accessed by accessing the variable itself

    A user level that continues to overstate my experience :-))

      Thank you for your answer. And sorry for the lack of details in my description. The files I am reading are txt files with each row containing a parameter name and its value (separated by spaces). However, I don't have any difficulties reading the files, creating an array of names and an array of the corresponding values.

      What does "do" mean in this context? As I'm sure you understand, I'm quite a beginner at Perl :)
        For a description of do, see the "do EXPR" entry in perlfunc.
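        As a rough illustration of what "doing" a file would involve: do FILE compiles and runs the file as Perl code and returns the value of its last expression. The sketch below assumes, purely for illustration, that each parameter file contains a single hash reference; the temporary files and their contents are invented and are not the space-separated format described in this thread.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create two hypothetical parameter files, each holding one hash reference.
# The leading '+' makes sure Perl parses the braces as a hash constructor.
my @paths;
for my $content ("+{ alpha => 1, beta => 2 }\n", "+{ beta => 20, gamma => 3 }\n") {
    my ($fh, $path) = tempfile(UNLINK => 1);
    print {$fh} $content;
    close $fh or die "Cannot close $path: $!";
    push @paths, $path;
}

my %vars;
for my $path (@paths) {
    # do FILE runs the file and returns its last evaluated expression
    my $loaded = do $path;
    die "Could not load $path: " . ($@ || $!) unless ref $loaded eq 'HASH';
    %vars = (%vars, %$loaded);    # later files override earlier ones
}

print "$_ = $vars{$_}\n" for sort keys %vars;
```

        Because do executes whatever code the file contains, this only makes sense for files you control.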

        In the light of your reply, I'd probably do something along the lines of the following (in a script)...

        use warnings;
        use strict;

        my %vars;
        while (<>) {
            local @_ = split;
            $vars{$_[0]} = $_[1];
        }
        This takes each line of each file given on the command line and updates the %vars hash, using the first field as the key and the second field as the value, as per postings here and elsewhere in this thread.
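        To see the "last value wins" behaviour without creating files, the same loop can be run over in-memory filehandles. This is a sketch with invented sample data; it also skips blank or incomplete lines, which is an added assumption about the input.

```perl
use strict;
use warnings;

# Two in-memory "files" standing in for real parameter files (made-up data)
my @files = (
    "alpha 1\nbeta 2\n",
    "beta 20\n\ngamma 3\n",
);

my %vars;
for my $content (@files) {
    open my $fh, '<', \$content or die "Cannot open in-memory file: $!";
    while (my $line = <$fh>) {
        # split ' ' splits on whitespace and ignores leading whitespace
        my ($name, $value) = split ' ', $line;
        next unless defined $value;    # skip blank or incomplete lines
        $vars{$name} = $value;         # a later file overwrites an earlier one
    }
    close $fh;
}

print "$_ = $vars{$_}\n" for sort keys %vars;
```

        Here beta ends up with the value from the second "file", because assigning to an existing hash key simply replaces the old value.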

        A user level that continues to overstate my experience :-))
        After creating the hash, using the parameter names as the keys (see above), you can check whether a parameter is present with:
        if (exists $vars{'parameter'}) {
            # It is there!
        }
        else {
            # It is not there
        }
        This avoids an iterative search through the array.