in reply to Re^3: Working with fixed length files
in thread Working with fixed length files

Unless one of you can prove my benchmark is wrong, I do see exactly what I expected:

$ perl test.pl
       Rate  buk  ike
buk  71.1/s   --  -36%
ike   111/s  56%   --
$

The DATA section in the script has trailing \r's:

#!/pro/bin/perl

use strict;
use warnings;
use autodie;

use Benchmark qw(cmpthese);

my $data = do { local $/; <DATA> } x 400;
my @data;

{   # 1. BrowserUk
    my $rec = chr (0) x 123;

    my @type3l = split m/:/, "02:10:33:15:19:10:3:18:6:4";
    my $n = 0;
    my @type3o = map { $n += $_; $n - $_; } @type3l;
    my @type3  = map \substr ($rec, $type3o[$_], $type3l[$_]), 0 .. $#type3o;

    my @typeOl = split m/:/, "02:98:11:9";
    $n = 0;
    my @typeOo = map { $n += $_; $n - $_; } @typeOl;
    my @typeO  = map \substr ($rec, $typeOo[$_], $typeOl[$_]), 0 .. $#typeOo;

    sub buk {
        open my $fh, "<", \$data;
        while (<$fh>) {
            substr ($rec, 0) = $_;
            if (m/^03/) {
                @data = map $$_, @type3;
            }
            else {
                @data = map $$_, @typeO;
            }
        }
    } # buk
}

sub ike {
    local $/ = \(2 * 122);
    open my $fh, "<", \$data;
    while (<$fh>) {
        @data = unpack "A2 A10 A33 A15 A19 A10 A3 A18 A6 A4 x2"
                     . "A2 A98 A11 A9 x2", $_;
    }
} # ike

cmpthese (-2, {
    ike => \&ike,
    buk => \&buk,
    });

__END__
03002068454210482 000000004204.572011-04-14 19:53:41INTERNET C 750467375 0214833 G02042954
03002068703214833 000000002558.662011-04-15 08:17:19INTERNET C 761212737 0211561 05601207284
03002068802911561 000000001463.702011-04-15 08:40:52INTERNET C 719807216 029911 00100275296

Enjoy, Have FUN! H.Merijn

Re^5: Working with fixed length files
by BrowserUk (Patriarch) on Apr 28, 2011 at 10:58 UTC

    You are benchmarking the code from the original nodes, which as I mentioned, operate on different assumptions.

    Ike's assumption means the while loop only iterates half as many times as it does for mine. The differences you are measuring are down to that.

    If you modify Ike's to read one record at a time and operate upon it conditionally (per my benchmark), or modify mine to read and map the pairs of records into a single pre-partitioned buffer, thereby removing the need for the if statement in the loop, then you would be comparing like with like.

    I also tweaked my benchmark code to a) use a fixed-size read, thereby avoiding the newline search; and b) change the condition of the loop so that I could assign the return from readline directly to the mapped buffer, avoiding another copy.

    This was to ensure that the differences being tested were down to unpack versus substr refs, not the ancillary details of code written to demonstrate the technique rather than for performance.
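As a rough illustration of the like-for-like version (a toy sketch with made-up 12-byte records and field widths, not the thread's 122-byte layout), reading one fixed-size record per iteration and choosing the unpack template conditionally might look like:

```perl
use strict;
use warnings;

# Two record types sharing a 2-byte type prefix. Widths here are
# illustrative only: type-03 records unpack as 2+4+4+2 bytes,
# everything else as 2+8+2 bytes.
my $data = "03AAAABBBBCC" . "01DDDDDDDDEE";

open my $fh, "<", \$data or die $!;
local $/ = \12;                 # fixed-size reads: one record per <>

my @fields;
while (<$fh>) {
    if (m/^03/) {
        @fields = unpack "A2 A4 A4 A2", $_;
    }
    else {
        @fields = unpack "A2 A8 A2", $_;
    }
}
# After the loop, @fields holds the last record: ("01", "DDDDDDDD", "EE")
```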

    For more performance, do away with the substr and read directly into the partitioned scalar:

    #! perl -slw
    use strict;
    use Time::HiRes qw[ time ];

    my $start = time;

    my $rec = chr(0) x 123;

    my @type3l = split ':', '02:10:33:15:19:10:3:18:6:4';
    my $n = 0;
    my @type3o = map{ $n += $_; $n - $_; } @type3l;
    my @type3  = map \substr( $rec, $type3o[ $_ ], $type3l[ $_ ] ), 0 .. $#type3o;

    my @typeOl = split ':', '02:98:11:9';
    $n = 0;
    my @typeOo = map{ $n += $_; $n - $_; } @typeOl;
    my @typeO  = map \substr( $rec, $typeOo[ $_ ], $typeOl[ $_ ] ), 0 .. $#typeOo;

    until( eof() ) {
        read( ARGV, $rec, 123, 0 );
        if( $rec =~ /^03/ ) {
            print join '/', map $$_, @type3;
        }
        else {
            print join '|', map $$_, @typeO;
        }
    }

    printf STDERR "Took %.3f for $. lines\n", time() - $start;

    And for ultimate performance, switch to binmode & sysread to avoid Windows crlf layer overhead. But it requires other tweaks also and I'm 21 hours into this day already.
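A rough sketch of that direction (sysread bypasses the PerlIO buffering layers, so a real on-disk file is assumed; the 12-byte records and the temp file are illustrative stand-ins, not the thread's data):

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Stand-in data file holding two fixed-length 12-byte records.
my ($out, $fname) = tempfile (UNLINK => 1);
binmode $out;                   # no crlf translation layer
print $out "03AAAABBBBCC", "01DDDDDDDDEE";
close $out;

open my $in, "<", $fname or die $!;
binmode $in;

my (@types, $rec);
while (sysread ($in, $rec, 12) == 12) {   # raw fixed-size reads
    push @types, substr ($rec, 0, 2);     # record type prefix
}
close $in;
```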

    But whatever, you do need to be comparing like with like.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      I think all you mention has already been said in the other answers. If I switch to

      sysread

      and compare buk with ike1 (reading one line at a time) and ike2 (reading two lines at a time), your method wins a bit but leaves all three methods within the noise level. Also note how *all* numbers go up!

      With these rates, I wonder if I would still use your substr ref method or the unpack approach, as I find the latter way easier to read and maintain. If however performance is vital, maybe XS code would squeeze out even more (with pre-bound variables).

               Rate ike1  buk ike2
      ike1 384977/s   --  -2%  -3%
      buk  392677/s   2%   --  -1%
      ike2 394907/s   3%   1%   --

      Note that if you keep local $/ = \122; in ike1, it has a *huge* influence on performance, even if you'd say that it should not be used:

               Rate ike1  buk ike2
      ike1 275687/s   -- -27% -30%
      buk  378299/s  37%   --  -4%
      ike2 392677/s  42%   4%   --

      Enjoy, Have FUN! H.Merijn

        The reason all the code runs the same speed is because sysread doesn't work on ramfiles, so the loops are never being entered.

        That also explains the dramatic slowdown effect of local $/ = \nnn;. It adds an operation to a call that does almost nothing, and twice almost nothing is longer than one times almost nothing.

        Which brings up another mystery entitled: "The Strange Case of the Disappearing AutoDie".


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
        Note that if you keep local $/ = \122; in ike1, it has a *huge* influence on performance,

        That may be because, as you haven't used binmode, IO layers are still in force and are checking for the default input record separator (newlines) even though it is not being used. Setting $/ = \nnn stops the input buffer being scanned as it is loaded. (Or something like that. :)

        I'd expect to see similar changes with $/ = \nnn in the other routines too.
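For reference, what record mode does (a toy 4-byte record size): with $/ set to a reference to an integer, readline returns fixed-size chunks and never scans for newlines:

```perl
use strict;
use warnings;

my $data = "abcd\nfgh";         # the embedded newline is not a separator

open my $fh, "<", \$data or die $!;
local $/ = \4;                  # record mode: 4 bytes per <>

my @chunks;
push @chunks, $_ while <$fh>;
# @chunks is ("abcd", "\nfgh"): the newline travels inside a record
```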

        I really like your idea of binding unpack templates to an array of aliases to partitions of a buffer. Effectively 'compiling' the template much as /o (used) to compile regexes.
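Stripped to its essentials (a toy 10-byte buffer with two fields), the substr-ref aliasing works like this: the refs are taken once, and keep pointing into the buffer as new records are copied over it:

```perl
use strict;
use warnings;

my $rec = "\0" x 10;                            # reusable record buffer

# Each element is a reference to an lvalue substr: a live window
# into $rec, not a copy of its current contents.
my @fields = (\substr ($rec, 0, 4), \substr ($rec, 4, 6));

substr ($rec, 0) = "ABCDEFGHIJ";                # copy a record in place
my @values = map $$_, @fields;                  # ("ABCD", "EFGHIJ")

substr ($rec, 0) = "0123456789";                # next record, same refs
my @again  = map $$_, @fields;                  # ("0123", "456789")
```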


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.

        There is something wrong with this benchmark. I don't know what it is yet, but there is definitely something wrong.

        Your numbers show, and I get the same results here, that Ike1 & Ike2 run in almost identical time.

        This, despite the fact that Ike1 loops twice as many times and makes twice as many calls per loop to unpack and twice as many calls per loop to sysread. That is 4 times as many calls to each overall, i.e. 8 times as many calls in total!

        That flies in the face of everything we know about performant Perl code. Sorry, but that simply cannot be true.


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re^5: Working with fixed length files
by Anonymous Monk on Apr 28, 2011 at 09:27 UTC
    perl version?

    With your program I get 'x' outside of string in unpack because of the x2; after removing those, I get

          Rate  buk  ike
    buk 19.6/s   --  -43%
    ike 34.1/s  74%    --

    $ perl -e "die $^V"
    v5.12.2
    On 5.008009 I get
          Rate  buk  ike
    buk 22.7/s   --  -35%
    ike 35.1/s  54%    --
          Rate  buk  ike
    buk 24.9/s   --  -47%
    ike 47.1/s  89%    --

    $ ..\perl.exe -e " die $^V"
    v5.14.0
    This is a typical win32 mingw/ActiveState build

    update: Well you didn't copy buk's code exactly, you omitted

    local $/ = \(2 * 122);

    which appears critical
    5.008009
          Rate  ike  buk
    ike 35.5/s   --  -57%
    buk 83.1/s 134%    --

    v5.12.2
          Rate  ike  buk
    ike 33.6/s   --  -55%
    buk 74.4/s 121%    --

    v5.14.0
          Rate  ike  buk
    ike 46.3/s   --  -48%
    buk 88.2/s  91%    --

      I re-read BrowserUk's post, and I still don't see that line. And yes, I copied it exactly.

      The x2 error you see is because you didn't add the \r's to the DATA section as I wrote in the introduction line. They get lost when posting code on PM.

      Adding that line to his code is unfair, as that will skip half of the data. Fair would be to use \122, but that doesn't change much:

      === base/perl5.8.9    5.008009 i686-linux-64int
             Rate  buk  ike
      buk  66.7/s   --  -39%
      ike   109/s  63%   --
      === base/tperl5.8.9   5.008009 i686-linux-thread-multi-64int-ld
             Rate  buk  ike
      buk  61.1/s   --  -37%
      ike  96.7/s  58%   --
      === base/perl5.10.1   5.010001 i686-linux-64int
             Rate  buk  ike
      buk  63.3/s   --  -39%
      ike   104/s  65%   --
      === base/tperl5.10.1  5.010001 i686-linux-thread-multi-64int-ld
             Rate  buk  ike
      buk  56.1/s   --  -37%
      ike  88.8/s  58%   --
      === base/perl5.12.2   5.012002 i686-linux-64int
             Rate  buk  ike
      buk  62.5/s   --  -41%
      ike   105/s  69%   --
      === base/tperl5.12.2  5.012002 i686-linux-thread-multi-64int-ld
             Rate  buk  ike
      buk  54.5/s   --  -38%
      ike  88.4/s  62%   --
      === base/perl5.14.0   5.014000 i686-linux-64int
             Rate  buk  ike
      buk  60.6/s   --  -48%
      ike   116/s  92%   --
      === base/tperl5.14.0  5.014000 i686-linux-thread-multi-64int-ld
             Rate  buk  ike
      buk  53.8/s   --  -49%
      ike   105/s  96%   --

      Enjoy, Have FUN! H.Merijn
        I re-read BrowserUk's post, and I still don't see that line. And yes, I copied it exactly.

        Look at his benchmark code, you know, the node with the numbers :)

        Adding that line to his code is unfair, as that will skip half of the data. Fair would be to use \122, but that doesn't change much:

        Ok, right, he had \123, the difference is smaller

        5.008009
              Rate  ike  buk
        ike 34.7/s   --  -17%
        buk 41.6/s  20%    --

        v5.12.2
              Rate  ike  buk
        ike 32.9/s   --  -16%
        buk 39.4/s  20%    --

        v5.14.0
              Rate  buk  ike
        buk 45.5/s   --   -1%
        ike 45.9/s   1%    --
        Maybe I'll run buk's benchmark now
        The x2 error you see is because you didn't add the \r's to the DATA section as I wrote in the introduction line.

        Right, something about my keyboard lacking an \r key :)

        My rule of thumb: if data contains special characters, Dumper it.
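For instance, with Data::Dumper's Useqq flag set, a trailing \r shows up as a visible escape instead of disappearing into the terminal (the record string below is just an example):

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Useqq = 1;       # emit escapes like \r, \t, \0 literally

my $record = "0214833 G02042954\r";
my $dumped = Dumper ($record);
# $dumped is:  $VAR1 = "0214833 G02042954\r";
```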