Speeding up perl script

samuelalfred has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Speeding up perl script by jettero (Monsignor) on Jan 28, 2009 at 08:36 UTC
Sounds like a great place to use a hash instead of an array. It would be better if you had some code or some pseudo-code so I could be sure I understand the task. -Paul
Re^2: Speeding up perl script by samuelalfred (Sexton) on Jan 28, 2009 at 09:06 UTC
This is basically the code I'm talking about. It loops through a number of files, reads them (this routine is quick), and loops through all parameter names in each file, checking whether they already exist in the global @names array and taking action depending on the result. It is this check (find_element) that I want to speed up. Any ideas? If a hash is a good alternative, could you please explain the difference compared to an array? I'm new to Perl, so I'm not so familiar with all the expressions... Thank you!

```perl
foreach $filename (@input_names)    # Go through input files
{
    ($names_ref, $data_ref) = &read_file($filename);
    @tmp_names = @$names_ref;
    @tmp_data  = @$data_ref;
    $tmp_index = 0;    # Reset per file so names and data stay aligned
    foreach $name (@tmp_names)    # Go through lines of current input file
    {
        # Check if variable is already present in name array (time consuming!)
        $index = &find_element($name, @names);
        if ($index == -1)    # Name not present, put both name and data in array
        {
            push(@names, $name);
            push(@data, $tmp_data[$tmp_index]);
        }
        else    # Name present, only replace data for the name
        {
            $data[$index] = $tmp_data[$tmp_index];
        }
        $tmp_index++;
    }
}
```
Re^3: Speeding up perl script by Corion (Patriarch) on Jan 28, 2009 at 09:13 UTC
The main difference between a hash and an array is that an array is indexed by number while a hash is indexed by name. So where a lookup in an array is fast if you know the position of the value, a lookup in a hash is fast if you know the "name" of the value, that is, the (string) key the value is associated with.
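A minimal sketch of the difference described above. The parameter names and values (`alpha`, `beta`, `gamma`) and the variable names are made up for illustration; they are not taken from the original files.

```perl
use strict;
use warnings;

# Array: values are reached by numeric position.
my @names = ('alpha', 'beta', 'gamma');
print "names[1] = $names[1]\n";

# Finding 'gamma' by value means scanning element by element.
my $found = -1;
for my $i (0 .. $#names) {
    if ($names[$i] eq 'gamma') {
        $found = $i;
        last;
    }
}
print "found at index $found\n";

# Hash: values are reached directly by their string key, no scan needed.
my %value_of = (alpha => 1.5, beta => 2.25, gamma => 3.5);
print "gamma = $value_of{gamma}\n";
```

The scan costs more as the array grows, while the hash lookup takes roughly the same time regardless of how many keys are stored.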
Re^3: Speeding up perl script by jettero (Monsignor) on Jan 28, 2009 at 10:10 UTC
Yeah, you probably want something like this:

```perl
my %hash;
while (my $line = <>) {
    my ($name, $value) = split m/something goes here/, $line;
    die "error: value already defined!" if exists $hash{$name};
    $hash{$name} = $value;
}
```

-Paul
Re: Speeding up perl script by Bloodnok (Vicar) on Jan 28, 2009 at 08:53 UTC
You don't give any indication of the format of the individual files. IMO, other than jettero's earlier suggestion (of using a hash), the easiest solution, subject to you having any say and a suitable format, would be to `do` each file, thereby satisfying both of the principal requirements, since:
- All variables in the last file would be updated as a result of `do`ing the last file.
- Assuming the requirement for an index of the record is to access its value, the value [of the variable] would be accessed by accessing the variable itself.

A user level that continues to overstate my experience :-))
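A rough sketch of the `do` idea, under a big assumption: the parameter file is itself valid Perl. The file contents and the variable names `$speed` and `$gain` are hypothetical, and the temporary file exists only to keep the example self-contained.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical parameter file written as Perl assignments (an assumption
# made for this sketch, not a format taken from the thread).
my ($fh, $path) = tempfile(SUFFIX => '.pl', UNLINK => 1);
print {$fh} "\$main::speed = 10;\n\$main::gain = 2.5;\n";
close $fh or die "close failed: $!";

# Package variables that the file is expected to set.
our ($speed, $gain);

# do FILE compiles and runs the file and returns the value of its last
# statement; undef signals a read or compile failure.
my $result = do $path;
die "could not run $path: ", ($@ || $!) unless defined $result;

print "speed=$speed gain=$gain\n";
```

Note that `do` executes whatever code the file contains, so this only suits files you control.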
Re^2: Speeding up perl script by samuelalfred (Sexton) on Jan 28, 2009 at 09:13 UTC
Thank you for your answer. And sorry for the lack of details in my description. The files I am reading are txt files with each row containing a parameter name and its value (separated by spaces). However, I don't have any difficulties reading the files, creating an array of names and an array of the corresponding values. What does "do" mean in this context? As I'm sure you understand, I'm quite a beginner at Perl :)
Re^3: Speeding up perl script by Bloodnok (Vicar) on Jan 28, 2009 at 09:24 UTC
For a description of the `do` operator, see perlfunc. In the light of your reply, I'd probably do something along the lines of the following (in a script)...

```perl
use warnings;
use strict;

my %vars;
while (<>) {
    # Split the line on whitespace: first field is the name, second the value
    my ($name, $value) = split;
    $vars{$name} = $value;
}
```

which takes each line of each file given on the command line and updates the `%vars` hash, using the first field as the key and the second field as the value, as per postings here and elsewhere on this thread.

A user level that continues to overstate my experience :-))
Re^3: Speeding up perl script by cdarke (Prior) on Jan 28, 2009 at 10:39 UTC
After creating the hash, using the parameter names as the keys (see above), you can search for a parameter with:

```perl
if (exists $vars{'parameter'}) {
    # It is there!
}
else {
    # It is not there
}
```

This avoids an iterative search through the array.
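Putting the pieces of this thread together, here is one possible way to keep the original `@names` and `@data` arrays while replacing the slow search with an `exists` check. It is only a sketch: `%index_of` and the `@inputs` sample data are hypothetical stand-ins, with `@inputs` playing the role of whatever `read_file` returns for each file.

```perl
use strict;
use warnings;

# Hypothetical stand-in for read_file(): each entry holds a pair of
# array references (names, data), as if read from one input file.
my @inputs = (
    [ [qw(speed gain)],  [10, 2.5] ],
    [ [qw(gain offset)], [3,  7]   ],
);

my (@names, @data);
my %index_of;    # name => position of that name in @names and @data

for my $input (@inputs) {
    my ($names_ref, $data_ref) = @$input;
    for my $i (0 .. $#{$names_ref}) {
        my $name = $names_ref->[$i];
        if (exists $index_of{$name}) {
            # Name present: only replace the data for the name.
            $data[ $index_of{$name} ] = $data_ref->[$i];
        }
        else {
            # Name not present: append it and remember where it went.
            push @names, $name;
            push @data,  $data_ref->[$i];
            $index_of{$name} = $#names;
        }
    }
}

print "$names[$_] = $data[$_]\n" for 0 .. $#names;
```

If the order of first appearance does not matter, a single hash of name to value (as in the replies above) is simpler still.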