in reply to Unpacking small chunks of data quickly

read does a buffered read (analogous to, and possibly implemented in terms of, the C fread(3) routine), so even though you're asking for only 8 bytes at a time, the underlying read(2) system call isn't made on every request; Perl fills a larger buffer and hands out your 8 bytes from it.
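For concreteness, here's a minimal sketch of that pattern, assuming a file of fixed 8-byte records holding two 32-bit big-endian integers each; the filename and variable names are illustrative:

    use strict;
    use warnings;

    # Hypothetical record file: each record is two 32-bit big-endian ints.
    open my $fh, '<:raw', 'data.bin' or die "Cannot open data.bin: $!";

    # Each read() asks for only 8 bytes, but the PerlIO layer reads the
    # file in much larger buffered blocks behind the scenes.
    while (read($fh, my $rec, 8) == 8) {
        my ($ID, $Val) = unpack('NN', $rec);
        print "$ID => $Val\n";
    }
    close $fh;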

And probably unrelated speed-wise, but explicitly checking for and allocating an array ref isn't really necessary. Search for "autovivification"; the short answer is that if you just use the element in the right place, Perl will create the arrayref there for you, as in the sketch below.
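A minimal sketch (the hash name matches the example further down, purely for illustration):

    use strict;
    use warnings;

    my %ValArrayByID;

    # No need for the explicit check-and-allocate:
    #   $ValArrayByID{$ID} = [] unless exists $ValArrayByID{$ID};
    #   push @{ $ValArrayByID{$ID} }, $Val;
    # Dereferencing a nonexistent element in lvalue context autovivifies it:
    push @{ $ValArrayByID{273} }, 1234;   # creates 273 => [1234]
    push @{ $ValArrayByID{273} }, 5678;   # appends: 273 => [1234, 5678]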

Not that either of those solves your speed problems . . . :)

Update: and on a second reading, perhaps it's the fact that you're building a structure out of your 100M file in RAM that's the bottleneck. Consider using a hash-on-disk instead of keeping everything in memory.
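For instance, the core DB_File module can tie a hash to a file on disk. A minimal sketch, with an illustrative filename; note that a DB_File hash stores flat strings rather than references, so multiple values per key need some encoding, here a comma-delimited string:

    use strict;
    use warnings;
    use Fcntl;
    use DB_File;

    # Disk-backed hash: the data lives in vals.db, not in RAM.
    tie my %ValArrayByID, 'DB_File', 'vals.db', O_RDWR|O_CREAT, 0644, $DB_HASH
        or die "Cannot tie vals.db: $!";

    sub add_val {
        my ($id, $val) = @_;
        # Append to a comma-delimited string, since we can't store an arrayref.
        $ValArrayByID{$id} = defined $ValArrayByID{$id}
            ? "$ValArrayByID{$id},$val"
            : $val;
    }

    add_val(273, 1234);
    add_val(273, 5678);
    my @vals = split /,/, $ValArrayByID{273};   # (1234, 5678)

If you'd rather keep the arrayref structure, MLDBM layers reference serialization on top of a DBM file, at the cost of rewriting the whole entry on each update.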


Re^2: Unpacking small chunks of data quickly
by ikegami (Patriarch) on Nov 19, 2007 at 23:10 UTC

    perhaps it's the fact that you're building a structure out of your 100M file in RAM that's the bottleneck

    Quite possible. Data takes much more space as Perl variables than it does in a file, especially since hash keys are strings.

    use strict;
    use warnings;

    use Devel::Size qw( total_size );

    my $file = join '', map pack('NN', @$_), (
        [ 273, 1234 ],
        [ 273, 5678 ],
        [ 274, 1234 ],
        [ 275, 5678 ],
        [ 276, 1234 ],
        [ 277, 5678 ],
        [ 278, 1234 ],
        [ 278, 5678 ],
    );

    my %ValArrayByID;
    while ($file =~ /(.{8})/g) {
        my ($ID, $Val) = unpack('NN', $1);
        push @{ $ValArrayByID{$ID} }, $Val;
    }

    print("File size: ", length($file), " bytes\n");
    print("Memory usage: ", total_size(\%ValArrayByID), " bytes\n");
    File size: 64 bytes
    Memory usage: 922 bytes