perlcat has asked for the wisdom of the Perl Monks concerning the following question:
I am writing to you to seek some advice on how to optimize a script I wrote. I will first describe the situation, and then present the possible solutions.
The input file is about 50 MB; each line contains a string built like this:
abcd\tabcd
where \t is a tab separating the two blocks.
At the moment, I read the file line by line and search $_ for a given string. If it is found, I push the second block onto an array. To get the second block, I match every line against the regular expression ^(.*)\t(.*)$.
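For reference, the approach described above might look roughly like the following. This is only a sketch of what the post describes, not the actual script: the sample data, the search string, and the sub name collect_with_regex are made up for illustration, and an in-memory filehandle stands in for the real 50 MB file.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the real input file.
my $sample = "apple\tred\nbanana\tyellow\ngrape\tred\n";

# Read line by line, split every line with the regex from the post,
# and keep the second block of the lines that contain the search string.
sub collect_with_regex {
    my ($fh, $needle) = @_;
    my @found;
    while (my $line = <$fh>) {
        chomp $line;
        # Every line goes through the regex, as described in the post.
        next unless $line =~ /^(.*)\t(.*)$/;
        my $second = $2;
        # Assumption: the search string is a literal, so it is quoted with \Q...\E.
        push @found, $second if $line =~ /\Q$needle\E/;
    }
    return @found;
}

# Open the sample string as if it were a file.
open my $in, '<', \$sample or die "Cannot open in-memory file: $!";
my @result = collect_with_regex($in, 'an');
close $in;

print join(',', @result), "\n";
```

Note that both (.*) groups are greedy, so on a line with more than one tab the split happens at the last tab.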
I am experiencing serious performance issues: it takes between 5 and 10 seconds to build the final array.
I have read a bit and found out that there are plenty of ways to program this script differently. Which would be the best solution?
a) read the whole file into an array and then parse the array?
b) read the file line by line, as I do now?
c) a friend also told me the regexp was inefficient. He said I would gain some speed by replacing the regexp with the index and substr functions. True?
d) create an index of the file? I'm not really sure how to tackle this.
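To make option c) concrete, here is a minimal sketch of the same loop using index and substr instead of the regexp. The sample data, the search string, and the sub name collect_with_index are assumptions for illustration, and it presumes each line contains exactly one tab. Whether this is actually faster on the real file is something only a measurement can tell, for example with the core Benchmark module.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the real input file.
my $sample = "apple\tred\nbanana\tyellow\ngrape\tred\n";

# Same job as the regex version, but with plain string functions.
sub collect_with_index {
    my ($fh, $needle) = @_;
    my @found;
    while (my $line = <$fh>) {
        chomp $line;
        # index returns -1 when the search string is not in the line.
        next if index($line, $needle) < 0;
        # Find the tab; skip lines that do not have one.
        # (rindex would mimic the greedy regex on lines with several tabs.)
        my $tab = index($line, "\t");
        next if $tab < 0;
        # Everything after the tab is the second block.
        push @found, substr($line, $tab + 1);
    }
    return @found;
}

# Open the sample string as if it were a file.
open my $in, '<', \$sample or die "Cannot open in-memory file: $!";
my @result = collect_with_index($in, 'an');
close $in;

print join(',', @result), "\n";
```

This version also skips the tab lookup entirely for lines that do not contain the search string, which the regexp version in the question does not do.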
So dear Perlmonks, thanks in advance for your advice.
Larry
|
Re: performance issues
by zentara (Cardinal) on Jan 27, 2009 at 18:27 UTC | |
by gwadej (Chaplain) on Jan 27, 2009 at 19:00 UTC | |
by perlcat (Novice) on Jan 27, 2009 at 19:05 UTC | |
|
Re: performance issues
by kyle (Abbot) on Jan 27, 2009 at 18:29 UTC | |
by perlcat (Novice) on Jan 27, 2009 at 19:01 UTC | |
|
Re: performance issues
by gone2015 (Deacon) on Jan 27, 2009 at 18:19 UTC | |
by perlcat (Novice) on Jan 27, 2009 at 19:09 UTC | |
by gone2015 (Deacon) on Jan 27, 2009 at 23:23 UTC | |
by perlcat (Novice) on Jan 28, 2009 at 07:54 UTC | |
by JadeNB (Chaplain) on Jan 27, 2009 at 22:19 UTC | |
|
Re: performance issues
by matija (Priest) on Jan 27, 2009 at 18:35 UTC | |
by perlcat (Novice) on Jan 27, 2009 at 19:02 UTC | |
|
Re: performance issues
by moritz (Cardinal) on Jan 27, 2009 at 18:52 UTC | |
by perlcat (Novice) on Jan 27, 2009 at 19:04 UTC | |
by MidLifeXis (Monsignor) on Jan 27, 2009 at 19:50 UTC | |
|
Re: performance issues
by jh- (Scribe) on Jan 27, 2009 at 18:21 UTC | |
|
Re: performance issues
by MidLifeXis (Monsignor) on Jan 27, 2009 at 18:22 UTC |