in reply to Re^6: Optimise file line by line parsing, substitute SPLIT
in thread Optimise file line by line parsing, substitute SPLIT
But it seems that you mean that OP benchmarks incorrect, because he benchmarks nothing vs split.
No. As a measure of the time taken to do the splits, his benchmark is fine.
What is wrong is his apparent expectation that locating 26 million tab characters, copying 28 million strings and making 28 million assignments would (or should) take less than the 8 seconds it does. That is roughly 80 million fairly complex operations in 8 seconds, or 1 every 10th of a microsecond, which is pretty damn good.
The only ways to reduce that amount of time are:
8 - 1.3 = 6.7 seconds, assuming perfect overlap, which is pretty much impossible.
200*9.3 = 1860 -v- 200 * 6.7 = 1340
28% as a target; but achieving it would be very hard.
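For anyone wanting to check the arithmetic behind that 28%, here is a minimal sketch. It takes the per-file figures above (9.3 and 6.7 seconds) and the count of 200 at face value; the sub name saving_pct and the variable names are mine, purely for illustration.

```perl
use strict;
use warnings;

# Percentage saved when going from $before seconds to $after seconds.
sub saving_pct {
    my ( $before, $after ) = @_;
    return 100 * ( $before - $after ) / $before;
}

my $files  = 200;    # count used in the figures above
my $before = 9.3;    # seconds per file as things stand
my $after  = 6.7;    # seconds per file with perfect overlap

printf "%.0f -v- %.0f seconds in total\n", $files * $before, $files * $after;
printf "Saving: %.0f%%\n", saving_pct( $before, $after );
```

The percentage does not depend on the number of files, only on the ratio of the two per-file times.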
Doing 2 at a time would be a 50% gain; 4 at a time, 75%.
Much better targets, and actually pretty close to achievable; but it would require careful programming to avoid disk thrash.
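One way to do several files at a time might look like the sketch below. It is only an illustration using fork and wait, not code from this thread: process_file is a hypothetical stand-in for the real read/split/filter loop, and run_concurrently with its $max_workers cap is my own naming. Keeping that cap small is one crude way of limiting how many files compete for the disk at once.

```perl
use strict;
use warnings;

# Hypothetical per-file worker: stands in for the real read/split/filter loop.
sub process_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my $count = 0;
    while ( my $line = <$fh> ) {
        chomp $line;
        my @fields = split /\t/, $line;
        $count++ if @fields;
    }
    close $fh;
    return $count;
}

# Process @files with at most $max_workers child processes alive at once.
sub run_concurrently {
    my ( $max_workers, @files ) = @_;
    my $running = 0;
    for my $file (@files) {
        # At the cap: block until one child finishes before starting another.
        if ( $running >= $max_workers ) {
            wait();
            $running--;
        }
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ( $pid == 0 ) {
            # Child: handle one file, report, and exit.
            printf "%s: %d lines\n", $file, process_file($file);
            exit 0;
        }
        $running++;
    }
    # Parent: reap the remaining children.
    while ( wait() != -1 ) { }
    return scalar @files;
}

run_concurrently( 2, @ARGV );
```

Invoked as, say, perl script.pl a.tsv b.tsv c.tsv d.tsv (the script and file names are placeholders), it works through the files two at a time. Whether the gain comes anywhere near the ideal depends on how well the disk copes with concurrent readers.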
Adding a single line to my code above:
next unless /$V/;
can save around 90% in some cases:
C:\test>1036737 -V=500 < numbers.tsv
Took 19.138550 seconds ## without pre-filter
Kept 2005 records
C:\test>1036737 -V=500 < numbers.tsv
Took 1.755853 seconds ## with pre-filter
Kept 2005 records
But that saving disappears, and the run actually gets slower, for less specific searches:
C:\test>1036737 -V=5 < numbers.tsv
Took 18.765492 seconds ## Without pre-filter
Kept 1944 records
C:\test>1036737 -V=5 < numbers.tsv
Took 20.232294 seconds ## With pre-filter
Kept 1944 records
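To make the trade-off concrete, here is a minimal, self-contained sketch of the pre-filter idea. The selection rule (keep a record when any tab-separated field equals the value exactly) is an assumption for illustration only, as are the sub name keep_records and the sample lines; it is not the script that produced the timings above.

```perl
use strict;
use warnings;

# Hypothetical filter: keep a record when any tab-separated field equals $v.
# With $prefilter true, lines that cannot match are rejected before the split.
sub keep_records {
    my ( $v, $prefilter, @lines ) = @_;
    my @kept;
    for (@lines) {
        # Cheap rejection: if $v appears nowhere in the line, skip the split.
        next if $prefilter and not /\Q$v\E/;
        chomp( my $line = $_ );
        my @fields = split /\t/, $line;
        push @kept, \@fields if grep { $_ eq $v } @fields;
    }
    return @kept;
}

my @lines = ( "1\t500\t3\n", "15\t25\t35\n", "7\t8\t9\n" );
printf "Kept %d records\n", scalar keep_records( 500, 1, @lines );
```

The regex is an extra pass over every line. When the value is rare, most lines are rejected and never split; when it is something like 5, which turns up somewhere in most lines, few are rejected, so the regex cost is paid on top of nearly all the splits. The kept records are the same either way.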