An excellent suggestion!!
Adding this method to the tests and benchmark ...
    unpackM => sub {    # Multi-line unpack suggested by LanX
        seek $inFH, 0, 0;
        my $buffer = <$inFH>;
        my $lineLen = length $buffer;
        my $nLines = 500;
        my $chunkSize = $lineLen * $nLines;
        seek $inFH, 0, 0;
        my $retStr;
        my $fmt = qq{(x${offset}ax@{ [ $lineLen - $offset - 1 ] })*};
        while ( my $bytesRead = read $inFH, $buffer, $chunkSize )
        {
            $retStr .= join q{}, unpack $fmt, $buffer;
        }
        return \ $retStr;
    },
... produced a new clear winner.
    ok 1 - brutish
    ok 2 - pushAoA
    ok 3 - regex
    ok 4 - rsubstr
    ok 5 - seek
    ok 6 - split
    ok 7 - substr
    ok 8 - unpack
    ok 9 - unpackM
              Rate pushAoA brutish split  seek regex unpack substr rsubstr unpackM
    pushAoA 1.14/s      --    -32%  -60%  -61%  -90%   -97%   -98%    -98%    -98%
    brutish 1.68/s     47%      --  -41%  -43%  -86%   -95%   -96%    -97%    -97%
    split   2.83/s    148%     69%    --   -3%  -76%   -92%   -94%    -94%    -95%
    seek    2.93/s    157%     75%    4%    --  -76%   -92%   -94%    -94%    -95%
    regex   12.0/s    952%    618%  325%  310%    --   -66%   -75%    -75%    -80%
    unpack  35.1/s   2970%   1993% 1140% 1097%  192%     --   -27%    -28%    -43%
    substr  47.8/s   4081%   2751% 1588% 1530%  297%    36%     --     -3%    -22%
    rsubstr 49.1/s   4193%   2827% 1634% 1574%  308%    40%     3%      --    -20%
    unpackM 61.1/s   5247%   3546% 2059% 1985%  408%    74%    28%     25%      --
    1..9
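For anyone unfamiliar with grouped unpack templates, here is a minimal sketch of what the format string in unpackM does, run against a small in-memory buffer rather than the real file. The sample data, the 6-byte line length and the column offset are all assumptions made up for illustration, and the template is built with sprintf (with whitespace added for legibility) instead of the interpolation used in the routine above.

```perl
use strict;
use warnings;

# Hypothetical sample data: four fixed-length lines of 6 bytes each
# (5 characters plus the newline), held in memory instead of a file.
my $buffer  = "abcde\nfghij\nklmno\npqrst\n";
my $lineLen = 6;    # includes the trailing newline
my $offset  = 2;    # zero-based column to extract

# Per line: skip $offset bytes, take one character, skip the rest.
# The trailing * repeats the group until the buffer is used up.
# Whitespace inside a pack/unpack template is ignored.
my $fmt = sprintf '(x%da x%d)*', $offset, $lineLen - $offset - 1;

my $column = join q{}, unpack $fmt, $buffer;
print "$column\n";    # prints "chmr"
```

The sketch relies on the same assumption as the benchmark routine: every line has exactly the same length, so that each repetition of the group lines up with the start of a line.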
However, your idea of reading and processing larger chunks of the file led me to consider whether ANDing a mask with a larger buffer would produce any improvement. Initial attempts using a regex to pull out the non-NUL characters after ANDing were not encouraging, but using tr instead was much better. This routine ...
    ANDmask => sub {    # Multi-line AND mask by johngg
        seek $inFH, 0, 0;
        my $buffer = <$inFH>;
        my $lineLen = length $buffer;
        my $nLines = 500;
        my $chunkSize = $lineLen * $nLines;
        seek $inFH, 0, 0;
        my $retStr;
        my $mask =
            qq{\x00} x ${offset}
          . qq{\xff}
          . qq{\x00} x ( $lineLen - $offset - 1 );
        $mask x= $nLines;
        while ( my $bytesRead = read $inFH, $buffer, $chunkSize )
        {
            ( my $anded = $buffer & $mask ) =~ tr{\x00}{}d;
            $retStr .= $anded;
        }
        return \ $retStr;
    },
... seems to produce the best result so far.
    ok 1 - ANDmask
    ok 2 - brutish
    ok 3 - pushAoA
    ok 4 - regex
    ok 5 - rsubstr
    ok 6 - seek
    ok 7 - split
    ok 8 - substr
    ok 9 - unpack
    ok 10 - unpackM
              Rate pushAoA brutish split  seek regex unpack substr rsubstr unpackM ANDmask
    pushAoA 1.11/s      --    -35%  -61%  -62%  -91%   -97%   -98%    -98%    -98%    -99%
    brutish 1.71/s     55%      --  -39%  -41%  -86%   -95%   -96%    -96%    -97%    -98%
    split   2.82/s    155%     65%    --   -3%  -77%   -92%   -94%    -94%    -95%    -97%
    seek    2.91/s    163%     70%    3%    --  -76%   -92%   -94%    -94%    -95%    -97%
    regex   12.3/s   1010%    617%  336%  322%    --   -65%   -74%    -75%    -79%    -88%
    unpack  35.0/s   3060%   1943% 1141% 1102%  185%     --   -25%    -27%    -40%    -67%
    substr  46.9/s   4137%   2638% 1564% 1512%  282%    34%     --     -3%    -20%    -55%
    rsubstr 48.2/s   4254%   2714% 1610% 1556%  292%    38%     3%      --    -18%    -54%
    unpackM 58.7/s   5194%   3321% 1979% 1914%  377%    68%    25%     22%      --    -44%
    ANDmask  105/s   9407%   6045% 3634% 3517%  757%   201%   124%    118%     80%      --
    1..10
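The masking step is perhaps easier to follow on a tiny example. The sketch below applies the same AND-then-tr idea to a small in-memory buffer; the sample data, line length, offset and line count are assumptions chosen for illustration, not values from the benchmark.

```perl
use strict;
use warnings;

# Hypothetical sample data: four fixed-length lines of 6 bytes each.
my $buffer  = "abcde\nfghij\nklmno\npqrst\n";
my $lineLen = 6;    # includes the trailing newline
my $offset  = 2;    # zero-based column to extract
my $nLines  = 4;    # lines covered by one mask

# One line's worth of mask: \xff at the wanted column, \x00 elsewhere,
# then repeated so that it covers the whole buffer.
my $mask = ( "\x00" x $offset )
         . "\xff"
         . ( "\x00" x ( $lineLen - $offset - 1 ) );
$mask x= $nLines;

# With two string operands, & works byte by byte: every unwanted
# byte becomes NUL, and tr///d then deletes the NULs.
( my $anded = $buffer & $mask ) =~ tr/\x00//d;
print "$anded\n";    # prints "chmr"
```

Two things to bear in mind with this approach: a genuine NUL byte in the selected column would be deleted along with the masked ones, and under the `bitwise` feature (enabled by `use v5.28` or later) `&` treats its operands as numbers, so the string form `&.` would be needed there instead.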
I would be interested to know whether any Monk can spot flaws in the benchmark.
Cheers,
JohnGG