- Fairly self-documenting - e.g. using explicit variable names, which can then match the documentation for the file structure
For explicit naming, you'll have to use either blocks of my variables, hash keys, or use constant. For efficiency and compactness, constant names that map to indexes in an array are a good option for largish numbers of discrete values:
use constant {
    THIS => 0, THAT => 1, THEOTHER => 2, ...
    TEMPL1 => 'N A10 S',
};
read( $file, my $buf, $size );    # read() fills $buf and returns a byte count, not the data
my @discrete = unpack TEMPL1, $buf;
print "THIS:", $discrete[ THIS ];
- Compact - no great gobs of identical repeated read() and unpack() statements
For multi-dim arrays, use subroutines:
sub get2DArray {
    my( $x, $y, $templ, $templSize, $fh ) = @_;
    my @array;
    for my $row ( 1 .. $y ) {
        # read() fills $buf and returns a byte count, so unpack the buffer
        read( $fh, my $buf, $templSize * $x );
        push @array, [ unpack $templ . $x, $buf ];
    }
    return \@array;
}
my $array2D = get2DArray( 100, 100, 'N', 4, $fh );
You could use nFor or Loops to write a generic multi-dim array reader, but unless you're going above 3 or 4 dims, separate subs are probably easier. Watch the iteration order; it can vary.
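For what it's worth, a generic reader can also be sketched with plain recursion instead of nFor or Loops. This is only an illustration: getNDArray is a hypothetical name, the dimensions are assumed to be stored outermost first, and the demo data is an in-memory file made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: read an N-dimensional array of fixed-size elements.
# @dims is ordered outermost first (an assumption; check your format).
sub getNDArray {
    my( $fh, $templ, $templSize, @dims ) = @_;
    my $n = shift @dims;
    if( @dims ) {
        # Not the innermost dimension yet: recurse once per slice
        return [ map { getNDArray( $fh, $templ, $templSize, @dims ) } 1 .. $n ];
    }
    # Innermost dimension: one read() and one unpack per row
    my $want = $templSize * $n;
    my $got  = read( $fh, my $buf, $want );
    die "Short read: wanted $want bytes\n" unless defined $got and $got == $want;
    return [ unpack $templ . $n, $buf ];
}

# Demo against an in-memory file holding 2 x 3 x 4 big-endian 32-bit integers
my $data = pack 'N*', 0 .. 23;
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
binmode $fh;
my $array3D = getNDArray( $fh, 'N', 4, 2, 3, 4 );
print $array3D->[1][2][3], "\n";
```

If the file stores the fastest-varying dimension first, the dims would have to be passed in the opposite order, which is the iteration-order caveat above.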
- Fast - need to handle 20GB+ of data here.
On my system, using a combination of :perlio on the open & binmode gives me the best reading speed. See Re^2: Perl's poor disk IO performance for details. YMMV.
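A minimal sketch of that combination, assuming it means the :perlio layer in the open mode followed by a binmode call. The scratch file, the record size and the N100 template are made up so that the snippet runs on its own.

```perl
use strict;
use warnings;
use File::Temp qw( tempfile );

# Create a small scratch file so the sketch is runnable as-is
my( $out, $path ) = tempfile( UNLINK => 1 );
binmode $out;
print {$out} pack 'N*', 1 .. 1000;
close $out or die "close failed: $!";

# :perlio layer on the open, then binmode
open my $fh, '<:perlio', $path or die "Cannot open $path: $!";
binmode $fh;

my $recSize = 4 * 100;    # hypothetical record: 100 big-endian 32-bit integers
my $total   = 0;
while( my $got = read( $fh, my $buf, $recSize ) ) {
    die "Short record\n" unless $got == $recSize;
    $total += $_ for unpack 'N100', $buf;
}
close $fh;
print "sum: $total\n";
```

Whether this is actually the fastest combination depends on platform and perl build, so benchmark it against a plain open on your own data.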
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
An alternative to your first snippet is to use a hash.
use constant TEMPL1 => 'N A10 S';
my %discrete;
read( $file, my $buf, $size );    # read() fills $buf and returns a byte count, not the data
@discrete{qw( this that theother )} = unpack TEMPL1, $buf;
print "this:", $discrete{ this };
The constants are more typo resistant, though.
I did mention hashes along with my blocks of discrete named vars. The main thing I like about the constant method (besides the typo resistance, which is good) is the simplicity of in-order iteration. Of course you can get that by putting the hash keys into an array, but once you've done that, you're better off using the package stash rather than a lexical hash.
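To illustrate the in-order iteration point, here is a small sketch. The packed record stands in for data read from the file, and @names is a hypothetical list used only for labelling the output.

```perl
use strict;
use warnings;

use constant { THIS => 0, THAT => 1, THEOTHER => 2 };
use constant TEMPL1 => 'N A10 S';

# Labels only, listed in index order (hypothetical, for printing)
my @names = qw( THIS THAT THEOTHER );

# Stand-in for a record read from the file
my $record   = pack TEMPL1, 42, 'hello', 7;
my @discrete = unpack TEMPL1, $record;

# Named access through a constant
print "THAT: ", $discrete[ THAT ], "\n";

# In-order iteration is just a walk over the array
print $names[ $_ ], ": ", $discrete[ $_ ], "\n" for 0 .. $#discrete;
```

With a hash you would need a separate key list to get the same ordering, since hash keys come back in no particular order.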
Although there are various parser generator modules for Perl, they are probably not the best option if speed is of paramount importance.
Maybe you could use a somewhat simpler state machine instead, i.e. a set of variables that you toggle on and off depending on specific tokens you encounter in the file. They would then indicate what section of the file you're currently in, so you could write something like
if ($in_section_foo) {
    if ($in_subsection_bar) {
        handle_foo_bar();
    }
    ...
}
...
where handle_foo_bar() would read the appropriate number of bytes (the multi-dimensional array) and unpack them according to the pattern that applies for foo/bar.
It's hard to be more specific without knowing what exact format you're talking about.
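Purely as an illustration, here is how the flag approach might look against a made-up format. Everything about the format is an assumption invented for this sketch: one-byte tokens, 'F' entering section foo, 'B' entering subsection bar, and 'D' introducing two 16-bit dimensions followed by that many 32-bit integers. readBytes is a hypothetical helper.

```perl
use strict;
use warnings;

# Made-up input: tokens F, B, then D with a 2 x 3 array of 32-bit ints
my $data = join '', 'F', 'B', 'D', pack( 'n n N*', 2, 3, 1 .. 6 );
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
binmode $fh;

# Hypothetical helper: read exactly $n bytes or die
sub readBytes {
    my( $fh, $n ) = @_;
    my $got = read( $fh, my $buf, $n );
    die "Short read\n" unless defined $got and $got == $n;
    return $buf;
}

# Reads the dimensions, then one row per read()/unpack
sub handle_foo_bar {
    my( $fh ) = @_;
    my( $rows, $cols ) = unpack 'n n', readBytes( $fh, 4 );
    return [ map { [ unpack "N$cols", readBytes( $fh, 4 * $cols ) ] } 1 .. $rows ];
}

my( $in_section_foo, $in_subsection_bar ) = ( 0, 0 );
my @arrays;
while( read( $fh, my $token, 1 ) ) {
    if(    $token eq 'F' ) { ( $in_section_foo, $in_subsection_bar ) = ( 1, 0 ) }
    elsif( $token eq 'B' ) { $in_subsection_bar = 1 }
    elsif( $token eq 'D' ) {
        die "Data token outside foo/bar\n"
            unless $in_section_foo and $in_subsection_bar;
        push @arrays, handle_foo_bar( $fh );
    }
    else { die "Unknown token '$token'\n" }
}
print scalar( @arrays ), " array(s) read\n";
```

A real format would need its own tokens and handlers, but the shape stays the same: tokens flip the flags, and the flags decide which handler consumes the next bytes.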
A parser probably already exists for that format, so I would go looking for it.