in reply to Re: Faster and more efficient way to read a file vertically
in thread Faster and more efficient way to read a file vertically

> unpack  => sub { # Suggested but not implemented by pryrt

Actually unpack was suggested (and not implemented) by me first. ;)

FWIW: My idea was to unpack multiple lines simultaneously instead of going line by line.

If you are interested, and if all lines really do have the same length (the OP never clarified this), please check whether substr on single lines is still faster than something like this:

$line_length += $newline_length;   # OS dependent
$line_count   = int( 8 * 1024 / $line_length ) + 1;
$chunk_size   = $line_count * $line_length;

And yes I'm still reluctant to implement it, smells too much like an XY Problem :)

Cheers Rolf
(addicted to the Perl Programming Language and ☆☆☆☆ :)
Je suis Charlie!

update

In hindsight... having a slightly smaller chunk is probably more efficient:

$line_count   = int(8 * 1024 / $line_length)
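For concreteness, here is a minimal sketch of that arithmetic (the 52-character payload length and the variable names are my own illustration; $newline_length would be 2 on CRLF systems):

```perl
use strict;
use warnings;

# Hypothetical fixed-width file: 52 payload characters per line.
my $line_length    = 52;
my $newline_length = 1;            # "\n" on Unix; 2 for "\r\n"

$line_length += $newline_length;   # full record length: 53 bytes

# Largest whole number of lines fitting into an ~8 KiB read,
# so every chunk stays aligned on line boundaries.
my $line_count = int( 8 * 1024 / $line_length );
my $chunk_size = $line_count * $line_length;

print "$line_count lines per chunk, $chunk_size bytes\n";   # 154 lines per chunk, 8162 bytes
```

Keeping the chunk a multiple of the record length means no line is ever split across two reads.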

Replies are listed 'Best First'.
Re^3: Faster and more efficient way to read a file vertically
by johngg (Canon) on Nov 06, 2017 at 00:36 UTC
    Actually unpack was suggested (and not implemented) by me first. ;)

    Ah! Sorry, I missed that :-/

    Cheers,

    JohnGG

      > Ah! Sorry, I missed that :-/

      No problem at all... ;-)

      I just wanted to point you to the possibility that unpack can process many lines at the same time (which essentially means "reading vertically").

      use strict;
      use warnings;

      my $line_count = 10;
      my $line       = join( "", 'a' .. 'z', "A" .. "Z" ) . "\n";
      my $file       = "$line" x $line_count;

      my $offset = 2;
      my $rest   = length( $line ) - $offset - 1;
      my $fmt    = qq{(x${offset}ax${rest})${line_count}};

      my @results = unpack $fmt, $file;
      print @results;

      This will print cccccccccc because the offset is 2.

      Cheers Rolf
      (addicted to the Perl Programming Language and ☆☆☆☆ :)
      Je suis Charlie!

        An excellent suggestion!!

        Adding this method to the tests and benchmark ...

        unpackM => sub { # Multi-line unpack suggested by LanX
            seek $inFH, 0, 0;
            my $buffer    = <$inFH>;
            my $lineLen   = length $buffer;
            my $nLines    = 500;
            my $chunkSize = $lineLen * $nLines;
            seek $inFH, 0, 0;
            my $retStr;
            my $fmt = qq{(x${offset}ax@{ [ $lineLen - $offset - 1 ] })*};
            while ( my $bytesRead = read $inFH, $buffer, $chunkSize )
            {
                $retStr .= join q{}, unpack $fmt, $buffer;
            }
            return \ $retStr;
        },

        ... produced a new clear winner.

        ok 1 - brutish
        ok 2 - pushAoA
        ok 3 - regex
        ok 4 - rsubstr
        ok 5 - seek
        ok 6 - split
        ok 7 - substr
        ok 8 - unpack
        ok 9 - unpackM
                   Rate pushAoA brutish split  seek regex unpack substr rsubstr unpackM
        pushAoA  1.14/s      --    -32%  -60%  -61%  -90%   -97%   -98%    -98%    -98%
        brutish  1.68/s     47%      --  -41%  -43%  -86%   -95%   -96%    -97%    -97%
        split    2.83/s    148%     69%    --   -3%  -76%   -92%   -94%    -94%    -95%
        seek     2.93/s    157%     75%    4%    --  -76%   -92%   -94%    -94%    -95%
        regex    12.0/s    952%    618%  325%  310%    --   -66%   -75%    -75%    -80%
        unpack   35.1/s   2970%   1993% 1140% 1097%  192%     --   -27%    -28%    -43%
        substr   47.8/s   4081%   2751% 1588% 1530%  297%    36%     --     -3%    -22%
        rsubstr  49.1/s   4193%   2827% 1634% 1574%  308%    40%     3%      --    -20%
        unpackM  61.1/s   5247%   3546% 2059% 1985%  408%    74%    28%     25%      --
        1..9

        However, your idea of reading and processing larger chunks of the file led me to consider whether using a mask ANDed with a larger buffer would produce any improvement. Initial attempts using a regex to pull out non-NULL characters after ANDing were not encouraging but using tr instead was much better. This routine ...
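        The masking trick can be demonstrated in isolation; here is a minimal self-contained sketch (the two-line sample data and the variable names are mine, not part of the benchmark):

```perl
        use strict;
        use warnings;

        # Two fixed-width "lines" of 7 bytes each (6 chars + "\n");
        # we want column 2, i.e. the third character of every line.
        my $buffer  = "abcdef\nghijkl\n";
        my $offset  = 2;
        my $lineLen = 7;

        # One \xFF at the wanted column, \x00 everywhere else, repeated per line.
        my $mask = "\x00" x $offset
                 . "\xff"
                 . "\x00" x ( $lineLen - $offset - 1 );
        $mask x= 2;

        # The bitwise string AND zeroes out every byte except the masked
        # column; tr/// then deletes the NUL bytes in a single pass.
        ( my $anded = $buffer & $mask ) =~ tr{\x00}{}d;
        print "$anded\n";   # ci
```

        Only one AND and one tr/// run per chunk, rather than one unpack record per line, which presumably is where the speed-up comes from.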

        ANDmask => sub { # Multi-line AND mask by johngg
            seek $inFH, 0, 0;
            my $buffer    = <$inFH>;
            my $lineLen   = length $buffer;
            my $nLines    = 500;
            my $chunkSize = $lineLen * $nLines;
            seek $inFH, 0, 0;
            my $retStr;
            my $mask =
                qq{\x00} x ${offset}
              . qq{\xff}
              . qq{\x00} x ( $lineLen - $offset - 1 );
            $mask x= $nLines;
            while ( my $bytesRead = read $inFH, $buffer, $chunkSize )
            {
                ( my $anded = $buffer & $mask ) =~ tr{\x00}{}d;
                $retStr .= $anded;
            }
            return \ $retStr;
        },

        ... seems to produce the best result so far.

        ok 1 - ANDmask
        ok 2 - brutish
        ok 3 - pushAoA
        ok 4 - regex
        ok 5 - rsubstr
        ok 6 - seek
        ok 7 - split
        ok 8 - substr
        ok 9 - unpack
        ok 10 - unpackM
                   Rate pushAoA brutish split  seek regex unpack substr rsubstr unpackM ANDmask
        pushAoA  1.11/s      --    -35%  -61%  -62%  -91%   -97%   -98%    -98%    -98%    -99%
        brutish  1.71/s     55%      --  -39%  -41%  -86%   -95%   -96%    -96%    -97%    -98%
        split    2.82/s    155%     65%    --   -3%  -77%   -92%   -94%    -94%    -95%    -97%
        seek     2.91/s    163%     70%    3%    --  -76%   -92%   -94%    -94%    -95%    -97%
        regex    12.3/s   1010%    617%  336%  322%    --   -65%   -74%    -75%    -79%    -88%
        unpack   35.0/s   3060%   1943% 1141% 1102%  185%     --   -25%    -27%    -40%    -67%
        substr   46.9/s   4137%   2638% 1564% 1512%  282%    34%     --     -3%    -20%    -55%
        rsubstr  48.2/s   4254%   2714% 1610% 1556%  292%    38%     3%      --    -18%    -54%
        unpackM  58.7/s   5194%   3321% 1979% 1914%  377%    68%    25%     22%      --    -44%
        ANDmask   105/s   9407%   6045% 3634% 3517%  757%   201%   124%    118%     80%      --
        1..10

        I would be interested to know whether any Monk can spot flaws in the benchmark.

        Cheers,

        JohnGG