in reply to Re: character-by-character in a huge file
in thread character-by-character in a huge file
a) see what I'm doing wrong,
The first thing you are doing wrong is that you are comparing apples and oranges. Take your 2nd benchmark.
cmpthese( 10, {
    slurp_substr => sub {
        open (FH, "<$filename");
        my $i = 0;
        while ( <FH> ) {
            while ($ch = substr($_,$i++,1)){
            }
        }
        close FH;
    },
    slurp_simpleregex => sub {
        my $len=0;
        open (FH, "<$filename");
        while ( <FH> ){
            $_ =~ /(.)$/;
        }
        close FH;
    },
    slurp_length => sub {
        my $len=0;
        open (FH, "<$filename");
        while ( <FH> ){
            $len += length($_);
        }
        close FH;
    },
});
slurp_substr reads the whole file, record by record, and appears to set the (global) variable $ch to each character in each record.
But you're setting the variable $i outside the main loop, incrementing it for each character in the record, and never resetting it.
Hence, for the second and subsequent records, $i will have whatever value it had at the end of the previous record. If the first record is longer than the rest, it will do nothing for any record other than the first.
Both slurp_substr() and raw_slurp_substr() routines in the 1st benchmark are similarly afflicted.
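To make the fix concrete, here is a minimal sketch of a substr loop that restarts its index for every record. The sub name visit_chars_substr and the in-memory sample data are made up for illustration; they are not from your benchmark. It also bounds the loop by length() rather than by the truth of the substr result, on the assumption that you want to see every character: a "0" character is false in Perl, so a truthiness test would stop early on it.

```perl
use strict;
use warnings;

# Visit every character of every record, restarting the index for each record.
sub visit_chars_substr {
    my ($fh) = @_;
    my $count = 0;
    while ( my $line = <$fh> ) {
        my $len = length $line;
        # $i is declared inside the record loop, so it starts at 0 every time.
        for ( my $i = 0; $i < $len; $i++ ) {
            my $ch = substr( $line, $i, 1 );
            $count++;    # stand-in for whatever real work is done per character
        }
    }
    return $count;
}

# Hypothetical sample data, read through an in-memory filehandle.
my $data = "abc0\nde\nfghij\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $visited = visit_chars_substr($fh);
close $fh;
print "$visited\n";    # 14: every character, newlines included
```

With the index reset per record, the count matches the total length of the data regardless of which record is longest.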
In slurp_simpleregex, your regex says: put the last character of each record into $1. You're simply ignoring every character except the last in each record.
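If the intent was for the regex version to touch every character, something along these lines would do it. This is only a sketch: the sub name visit_chars_regex and the sample data are invented for the example, and matching with /g in a while loop is one of several ways to walk a string.

```perl
use strict;
use warnings;

# Visit every character of every record using a global match.
sub visit_chars_regex {
    my ($fh) = @_;
    my $count = 0;
    while ( my $line = <$fh> ) {
        # /g resumes after the previous match on each pass;
        # /s lets . match the trailing newline as well.
        while ( $line =~ /(.)/gs ) {
            my $ch = $1;
            $count++;    # stand-in for whatever real work is done per character
        }
    }
    return $count;
}

# Hypothetical sample data, read through an in-memory filehandle.
my $data = "abc0\nde\nfghij\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $visited = visit_chars_regex($fh);
close $fh;
print "$visited\n";    # 14: every character, newlines included
```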
slurp_length is the most mysterious of all. You read each record in the file and accumulate the lengths of those records into the variable $len.
You never access any of the characters?
The first rule of benchmarking is that you need to ensure that you are comparing like with like.
The second rule is that you need to make sure that what you are benchmarking is useful and relevant to your final program.
In these tests, you are doing neither.
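As a rough sketch of what comparing like with like could look like, the benchmark below has both candidates visit every character of the same data, and checks that they agree before timing anything. The in-memory $data, the %visit hash and the iteration count of 20 are all assumptions chosen to keep the example self-contained and quick; substitute your real file and a larger count for meaningful numbers.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical stand-in for the real file: 2000 identical records in memory.
my $data = "some sample record text 0123456789\n" x 2000;

my %visit = (
    substr => sub {
        open my $fh, '<', \$data or die "open failed: $!";
        my $count = 0;
        while ( my $line = <$fh> ) {
            my $len = length $line;
            for ( my $i = 0; $i < $len; $i++ ) {    # index restarts per record
                my $ch = substr( $line, $i, 1 );
                $count++;
            }
        }
        close $fh;
        return $count;
    },
    regex => sub {
        open my $fh, '<', \$data or die "open failed: $!";
        my $count = 0;
        while ( my $line = <$fh> ) {
            while ( $line =~ /(.)/gs ) {            # one character per match
                my $ch = $1;
                $count++;
            }
        }
        close $fh;
        return $count;
    },
);

# Sanity check first: both candidates must do the same amount of work.
my %seen = map { $_ => $visit{$_}->() } keys %visit;
die "candidates disagree: substr=$seen{substr} regex=$seen{regex}"
    unless $seen{substr} == $seen{regex};

# Only now is a timing comparison meaningful.
cmpthese( 20, \%visit );
```

The check before cmpthese is the important part: if the candidates do not visit the same number of characters, the timings say nothing about which technique is faster.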
That's much worse than what I've seen (but haven't tested here) in C.
If your point is that Perl is slower than C, you're right.