Re^6: Working with fixed length files

I think everything you mention has already been said in the other answers. If I switch to `sysread` and compare buk with ike1 (reading one line at a time) and ike2 (reading two lines at a time), your method wins a bit, but all three methods stay within the noise level. Also note how *all* numbers go up!

With these rates, I wonder if I would still use your substr ref method or the unpack approach, as I find the latter way easier to read and maintain. If however performance is vital, maybe XS code would squeeze out even more (with pre-bound variables).
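For readers comparing the two styles, here is a minimal sketch of both on a made-up record. The 9-byte layout (fields of 2, 4 and 3 bytes) and the variable names are assumptions for illustration only, not the 122-byte layout from the thread.

```perl
use strict;
use warnings;

# Hypothetical 9-byte record layout: three fields of 2, 4 and 3 bytes.
my @len = (2, 4, 3);
my $pos = 0;
my @off = map { my $o = $pos; $pos += $_; $o } @len;

my $rec = " " x 9;    # buffer that every record gets read into

# substr-ref approach: bind one lvalue reference per field, once.
my @field = map \substr ($rec, $off[$_], $len[$_]), 0 .. $#off;

$rec = "03ab  xyz";                       # stand-in for a read into $rec
my @by_ref    = map $$_, @field;          # fields keep their padding
my @by_unpack = unpack "A2 A4 A3", $rec;  # "A" strips trailing spaces

print join ("|", @by_ref),    "\n";
print join ("|", @by_unpack), "\n";
```

One difference worth keeping in mind when judging maintainability: the `A` template strips trailing spaces from each field, while the substr references return the raw bytes, padding included.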

         Rate ike1  buk ike2
ike1 384977/s   --  -2%  -3%
buk  392677/s   2%   --  -1%
ike2 394907/s   3%   1%   --

Note that if you keep `local $/ = \122;` in ike1, it has a *huge* influence on performance, even though you would say it should not come into play at all:

         Rate ike1  buk ike2
ike1 275687/s   -- -27% -30%
buk  378299/s  37%   --  -4%
ike2 392677/s  42%   4%   --
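For context, setting `$/` to a reference to an integer switches `readline` to fixed-size record reads; it is not consulted by `sysread` or `read`. A minimal sketch, with a made-up 12-byte buffer and a 4-byte record size as assumptions:

```perl
use strict;
use warnings;

# Hypothetical data: three 4-byte records with no line terminators.
my $buf = "aaaabbbbcccc";

open my $fh, "<", \$buf or die "open: $!";
my @recs;
{
    local $/ = \4;    # readline now returns 4-byte records
    @recs = <$fh>;
}
close $fh;

print scalar (@recs), " records: @recs\n";
```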

use strict;
use warnings;

use Benchmark qw(cmpthese);

my $data = do { local $/; <DATA> } x 400;
my @data;
my $rec  = chr (0) x 512;

{   # 1. BrowserUk
    my @type3l = split m/:/, "02:10:33:15:19:10:3:18:6:4";
    my $n      = 0;
    my @type3o = map { $n += $_; $n - $_; } @type3l;
    my @type3  = map \substr ($rec, $type3o[$_], $type3l[$_]), 0 .. $#type3o;
    my @typeOl = split m/:/, "02:98:11:9";
    $n = 0;
    my @typeOo = map { $n += $_; $n - $_; } @typeOl;
    my @typeO  = map \substr ($rec, $typeOo[$_], $typeOl[$_]), 0 .. $#typeOo;


    sub buk
    {
        open my $fh, "<", \$data;
        while (sysread $fh, $rec, 122, 0) {
            @data = map $$_, @type3;
            sysread $fh, $rec, 122, 0;
            @data = map $$_, @typeO;
            }
        } # buk
    }

sub ike2
{
    open my $fh, "<", \$data;
    while (sysread $fh, $rec, 244, 0) {
        @data = unpack
            "A2 A10 A33 A15 A19 A10 A3 A18 A6 A4 x2" .
            "A2 A98 A11 A9 x2",
                $rec;
        }
    } # ike2

sub ike1
{
    # local $/ = \122;
    open my $fh, "<", \$data;
    while (sysread $fh, $rec, 122, 0) {
        @data = unpack "A2 A10 A33 A15 A19 A10 A3 A18 A6 A4 x2", $rec;
        sysread $fh, $rec, 122, 0;
        @data = unpack "A2 A98 A11 A9 x2", $rec;
        }
    } # ike1

cmpthese (-2, {
    ike1 => \&ike1,
    ike2 => \&ike2,
    buk  => \&buk,
    });

__END__
03002068454210482                            000000004204.572011-04-14 19:53:41INTERNET  C  750467375
0214833                                                                                             G02042954
03002068703214833                            000000002558.662011-04-15 08:17:19INTERNET  C  761212737
0211561                                                                                             05601207284
03002068802911561                            000000001463.702011-04-15 08:40:52INTERNET  C  719807216
029911                                                                                              00100275296

Enjoy, Have FUN! H.Merijn


Replies are listed 'Best First'.
Re^7: Working with fixed length files (Crap benchmark!) by BrowserUk (Patriarch) on Apr 28, 2011 at 19:33 UTC
The reason all the code runs at the same speed is that sysread doesn't work on ramfiles, so the loops are never entered. That also explains the dramatic slowdown effect of `local $/ = \nnn;`: it adds an operation to a call that does almost nothing, and twice almost nothing takes longer than one times almost nothing. Which brings up another mystery entitled "The Strange Case of the Disappearing AutoDie".
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice.
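A minimal sketch of how the failure can be made visible by checking return values, which is what `autodie` would have done. The 244-byte buffer is a made-up stand-in for two records, and the variable names are assumptions:

```perl
use strict;
use warnings;

my $buf = "x" x 244;    # hypothetical stand-in for two 122-byte records
my $rec;

open my $fh, "<", \$buf or die "open: $!";

# sysread goes straight to the file descriptor; an in-memory handle has none.
my $n_sys = sysread $fh, $rec, 122;
printf "sysread: %s\n", defined $n_sys ? "$n_sys bytes" : "failed ($!)";

# read goes through PerlIO, so the :scalar layer serves the data.
my $n_read = read $fh, $rec, 122;
printf "read:    %s\n", defined $n_read ? "$n_read bytes" : "failed ($!)";

my $fd = fileno $fh;    # in-memory handles have no real descriptor
printf "fileno:  %d\n", $fd;
close $fh;
```

On the perls discussed in this thread the `sysread` call is expected to report a failure, which is why a `while (sysread ...)` loop over an in-memory handle never runs its body.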
Re^8: Working with fixed length files (Crap benchmark!) by Tux (Canon) on Apr 29, 2011 at 09:46 UTC
`autodie` was removed so I could bench on older perls. Good analysis! That should teach me :/ `sysread` basically means "bypass PerlIO, do a `read ()`", so PerlIO::scalar doesn't get a say in it. Whether that is how things should be is a different matter.
Re^8: Working with fixed length files (Crap benchmark!) by Tux (Canon) on Apr 29, 2011 at 11:37 UTC
With all `sysread` calls replaced by `read`, my bench shows much more reliable figures:

       Rate  buk ike1 ike2
buk  81.1/s   -- -43% -46%
ike1  143/s  76%   --  -5%
ike2  150/s  85%   5%   --

I think it would be hard to reduce the overhead even more.
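A minimal sketch of the `read`-based loop in the ike2 style, with a counter as a sanity check that the loop body really runs. The input is made up (three record pairs built with `sprintf`), and treating the two skipped bytes as CRLF is an assumption:

```perl
use strict;
use warnings;

# Hypothetical input: 3 pairs of 122-byte records built in memory.
# The trailing "\r\n" stands in for the two bytes skipped by "x2".
my $type3 = sprintf "%-120s\r\n", "03" . "0020684542";
my $typeO = sprintf "%-120s\r\n", "02" . "14833";
my $data  = ($type3 . $typeO) x 3;

my ($rec, @data);
my $pairs = 0;

open my $fh, "<", \$data or die "open: $!";
while (read $fh, $rec, 244) {
    @data = unpack
        "A2 A10 A33 A15 A19 A10 A3 A18 A6 A4 x2" .
        "A2 A98 A11 A9 x2",
            $rec;
    $pairs++;    # sanity check: proves the loop body is entered
}
close $fh;

print "processed $pairs record pairs, last pair has ",
    scalar (@data), " fields\n";
```

Printing or asserting the iteration count before trusting the rates is a cheap way to catch a benchmark that times an empty loop.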
Re^7: Working with fixed length files by BrowserUk (Patriarch) on Apr 28, 2011 at 11:42 UTC
"Note that if you keep `local $/ = \122;` in ike1, it has a *huge* influence on performance"

That may be because, as you haven't used binmode, IO layers are still in force and are checking for the default input delimiter (newlines) even though they are not being used. Setting `$/ = \nnn` stops the input buffer being scanned as it is loaded. (Or something like that. :) I'd expect to see similar changes with `$/ = \nnn` in the other routines too.

I really like your idea of binding unpack templates to an array of aliases to partitions of a buffer, effectively 'compiling' the template much as /o (used to) compile regexes.
Re^8: Working with fixed length files by Tux (Canon) on Apr 28, 2011 at 11:52 UTC
In a dedicated benchmark to test this, binmode doesn't change anything at all:

with a binmode call:
            Rate with_rs without
with_rs 254857/s      --    -34%
without 384458/s     51%      --

without a binmode call:
            Rate with_rs without
with_rs 260564/s      --    -31%
without 375127/s     44%      --

I've just posted the question to the perl5 porters.
Re^7: Working with fixed length files by BrowserUk (Patriarch) on Apr 28, 2011 at 18:56 UTC
There is something wrong with this benchmark. I don't know what it is yet, but there is definitely something wrong. Your numbers show, and I get the same results here, that ike1 and ike2 run in almost identical time. This is despite ike1 looping twice as many times and making twice as many calls per loop to unpack and twice as many calls per loop to sysread. So 4 times as many calls to each overall, i.e. 8 times as many calls in total! That flies in the face of everything we know about performant Perl code. Sorry, but that simply cannot be true.