in reply to Re^3: Question about speeding a regexp count
in thread Question about speeding a regexp count

Reading chunks of data is important if you are low on memory. And I'm 100% sure the OP has to read his data from a file. So reading the data is essential to the algorithm.

If you are so low on memory that you can't read 6 megs of data into RAM, you might want to invest a couple dollars in an upgrade. ;-)

And, though you are right that he probably has to read the data in from the file, your benchmark assumes that everyone else would pick a poor way of reading that data while you pick a relatively good one. For instance, instead of the lame my $allbuffered = <$testfile>; you chose for us, I would likely choose sysread with 4k chunks, which, on my machine, beats read with 1k chunks by close to 100% on average.
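
For concreteness, here's a minimal sketch of what I mean (not the code from your benchmark; the file name and pattern are invented for illustration). Note that a naive chunked read like this would miss a match that straddles a chunk boundary; handling that takes extra bookkeeping.

    use strict;
    use warnings;

    # Count occurrences of a pattern, reading the file in 4k chunks
    # with sysread instead of buffered I/O.
    open my $fh, '<', 'testfile.dat' or die "open: $!";
    my ( $buf, $count ) = ( '', 0 );
    while ( sysread( $fh, $buf, 4096 ) ) {
        $count++ while $buf =~ /foo/g;    # matches within this chunk only
    }
    close $fh;
    print "$count\n";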

You can't make assumptions like that about somebody else's code and then claim yours is better. All you show is that your assumptions are poor.

Here I do agree. Of course it can be done. But you didn't. ;-)

I remain unconvinced that it is even necessary. As I already pointed out, the data is only 6 megs. This isn't the '80s, and that's nothing. Even if he had 60 megs of data, reading it all in would likely be a fine strategy. Now, if he had 600 megs, I'd look at reading chunks... at least on my desktop.
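
Slurping and counting in one go is about as simple as it gets. A sketch, again with an invented file name and pattern:

    use strict;
    use warnings;

    # Slurp the whole file, then count every match in one pass.
    open my $fh, '<', 'testfile.dat' or die "open: $!";
    my $data = do { local $/; <$fh> };    # undef $/ == read it all
    close $fh;
    my $count = () = $data =~ /foo/g;     # count all matches at once
    print "$count\n";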

If you like, though, do a comparison after replacing the buffered I/O you assumed for us with sysread, as I suggested. Leave yours as is and report the difference.
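
Something along these lines would do. This is only a sketch using the standard Benchmark module; the file name is assumed, and the subs exercise just the reads, not the counting:

    use strict;
    use warnings;
    use Benchmark qw(cmpthese);

    my $file = 'testfile.dat';    # assumed test data

    cmpthese( -5, {               # run each for at least ~5 CPU seconds
        read_1k => sub {
            open my $fh, '<', $file or die "open: $!";
            my $buf;
            1 while read( $fh, $buf, 1024 );      # buffered, 1k chunks
            close $fh;
        },
        sysread_4k => sub {
            open my $fh, '<', $file or die "open: $!";
            my $buf;
            1 while sysread( $fh, $buf, 4096 );   # unbuffered, 4k chunks
            close $fh;
        },
    } );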

Of course, all of you could rewrite your code to use my correction algorithm. This would make some of your algorithms significantly faster, I think.

I'm not so sure of that. I don't see anything intrinsic to it that would suggest a speedup. It is an interesting twist, though. I'd rather see corrected benchmarks first (i.e., equivalent reads or no reads at all) before making that comparison.

-sauoq
"My two cents aren't worth a dime.";