bagyi has asked for the wisdom of the Perl Monks concerning the following question:

I am reading a binary file in tag-length-value format. After determining the endianness of the file, the different record types are parsed by dispatching to the corresponding subroutines. The problem is that each of them has to test the endianness:

    sub record_foo # parse foo type record
    sub record_bar # parse bar type record
    sub record_boo # parse boo type record
    sub record_xxx {
        if ($endian eq 'little') {
            unpack("v3A*" ...);
        }
        elsif ($endian eq 'big') {
            unpack("n3A*" ...);
        }
    }

I would like to avoid these endianness tests in each of the subroutines while keeping the code clear and easy to maintain.

So far, the only idea I have come up with is to create two tables of unpack templates, one for big-endian and one for little-endian, and select one of them once after determining the endianness.

Re: Avoid run-time checking
by Athanasius (Archbishop) on Sep 12, 2015 at 07:48 UTC

    Hello bagyi,

    This sounds like a textbook example of a problem waiting for an OO solution. Make a Record class containing a factory-pattern constructor along with the various methods applicable to a record. Then create a subclass for each record type — e.g. Record_Big_Endian and Record_Little_Endian — which inherit from the parent Record class. Override Record methods in these subclasses if and when necessary. This will give you an API which can be used without explicit reference to the endianness of the records being processed:

    use Record;
    ...
    # or better: put this into the Record::new class method
    my $endianness = get_first_record($file);
    # returns either a Record_Little_Endian or a Record_Big_Endian object
    my $record = Record::new($endianness);
    # unpacks the record in a way that is endian-transparent
    $record->unpack();
    ...
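A minimal sketch of how such a factory might look, assuming each subclass overrides only a small template method while the parsing logic stays in the parent. The method names unpack_foo and template_foo, the field names, and the sample bytes are hypothetical, and the constructor is called as Record->new rather than Record::new so that $class is passed in.

```perl
use strict;
use warnings;

# Base class: holds all shared parsing logic. Subclasses supply only the
# byte-order-specific unpack templates.
package Record;

# Factory constructor: picks the subclass from the endianness string.
sub new {
    my ($class, $endianness) = @_;
    my %subclass_for = (
        little => 'Record_Little_Endian',
        big    => 'Record_Big_Endian',
    );
    my $subclass = $subclass_for{$endianness}
        or die "Unknown endianness '$endianness'";
    return bless {}, $subclass;
}

# Shared logic: written once, never tests the endianness itself.
sub unpack_foo {
    my ($self, $bytes) = @_;
    my ($x, $y, $z, $text) = unpack($self->template_foo, $bytes);
    return { x => $x, y => $y, z => $z, text => $text };
}

package Record_Little_Endian;
our @ISA = ('Record');
sub template_foo { 'v3A*' }    # v = unsigned 16-bit, little-endian

package Record_Big_Endian;
our @ISA = ('Record');
sub template_foo { 'n3A*' }    # n = unsigned 16-bit, big-endian

package main;

my $record = Record->new('big');
my $fields = $record->unpack_foo("\x00\x01\x00\x02\x00\x03abc");
print "$fields->{x} $fields->{y} $fields->{z} $fields->{text}\n";    # 1 2 3 abc
```

With this layout the handling of missing fields and similar logic lives only in Record, so the two subclasses add one line per record type.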

    Hope that helps,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

      Hi Athanasius,

      The OO approach looks like overkill to me, since the unpack template is only a small part of parsing a record.

      Refactoring this into two different classes would create more duplicated code, since the rest of the logic is the same (handling missing fields and whatnot).

        Refactoring this into two different classes would create more duplicated code, since the rest of the logic is the same (handling missing fields and whatnot).

        So make 3 classes, they don't have to be OO :) MyRecs; MyRecs::Big; MyRecs::Lil;

        sub MyRecs::foo {
            my( $ro, $sham, $bo ) = @_;
            die "sham must sham " if not $sham =~ /sham/i;
            ...
        }
        # "v" is little-endian, "n" is big-endian (network order)
        sub MyRecs::Lil::foo { MyRecs::foo( unpack "v3A*", ... ); }
        sub MyRecs::Big::foo { MyRecs::foo( unpack "n3A*", ... ); }
Re: Avoid run-time checking
by Anonymous Monk on Sep 12, 2015 at 07:10 UTC

    Are the files ever mixed endianness? Where does $endian come from, and how is endianness determined?

    So far, the only idea I have come up with is to create two tables of unpack templates, one for big-endian and one for little-endian, and select one of them once after determining the endianness.

    That can make sense: you've identified a repetitive copy/paste pattern and you're refactoring/abstracting away the repetitiveness. The next step is to actually make two modules with the same API, so you can use MyRecs -bigendian; and you get in your program a record_xxx() which is an alias for MyRecs::BigEndian::record_xxx()
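A rough single-file sketch of that aliasing idea, assuming an import() that copies the chosen implementation into the caller's namespace. MyRecs::LittleEndian, the -littleendian flag, and the sample bytes are hypothetical names added for illustration. Since the endianness is only known after reading the first record, import() is called at run time here instead of through a compile-time use.

```perl
use strict;
use warnings;

# Two packages with the same API; only the unpack templates differ.
package MyRecs::BigEndian;
sub record_xxx { my ($bytes) = @_; return unpack 'n3A*', $bytes }

package MyRecs::LittleEndian;
sub record_xxx { my ($bytes) = @_; return unpack 'v3A*', $bytes }

package MyRecs;

my %impl_for = (
    '-bigendian'    => 'MyRecs::BigEndian',
    '-littleendian' => 'MyRecs::LittleEndian',
);

# import() aliases the chosen implementation into the caller's namespace.
sub import {
    my ($class, $flag) = @_;
    return unless defined $flag;
    my $impl = $impl_for{$flag} or die "Unknown option '$flag'";
    my $caller = caller;
    no strict 'refs';
    no warnings 'redefine';
    *{"${caller}::record_xxx"} = \&{"${impl}::record_xxx"};
    return;
}

package main;

# Equivalent to "use MyRecs -bigendian;", but done at run time.
MyRecs->import('-bigendian');
my @fields = record_xxx("\x00\x01\x00\x02\x00\x03abc");
print "@fields\n";    # 1 2 3 abc
```

The calling code only ever sees record_xxx(), so nothing outside import() needs to know which byte order is in effect.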

      >>Are the files ever mixed endianness? Where does $endian come from , how is endianness determined?

      No. The endianness is fixed; it is specified in the first record of the file. Using the table approach seems viable, but I would rather not keep a subroutine and its template string in different places if possible.

        Using the table approach seems viable, but I would rather not keep a subroutine and its template string in different places if possible.

        :) All approaches are viable, it will always come down to your prejudice/bias/choice :D

        So type up a few (1-3) subs of each and then compare/contrast what you like about each:

        # Variant 1: template table filled in one key at a time
        # ("n" is big-endian, "v" is little-endian)
        my %templates;
        $templates{big}{foo} = 'n3A*';
        $templates{lil}{foo} = 'v3A*';
        $templates{big}{bar} = 'n2A*';
        $templates{lil}{bar} = 'v2A*';

        sub record_foo { my( $end, ... ) = @_; my @res = unpack $templates{$end}{foo}, ...; ...; }
        sub record_bar { my( $end, ... ) = @_; my @res = unpack $templates{$end}{bar}, ...; ...; }

        # Variant 2: the same table written as one nested hash
        %templates = (
            big => { bar => "n2A*", foo => "n3A*", },
            lil => { bar => "v2A*", foo => "v3A*", },
        );

        # Variant 3: template chosen inline, next to the code that uses it
        sub record_foo { my( $big, ... ) = @_; my @res = unpack $big ? "n3A*" : "v3A*", ...; ...; }
        sub record_bar { my( $big, ... ) = @_; my @res = unpack $big ? "n2A*" : "v2A*", ...; ...; }
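For comparison, a small runnable sketch of the table variant in which the table is selected once, so the record subs take no endianness argument at all. The helper set_endianness, the $tmpl variable, and the sample bytes are hypothetical.

```perl
use strict;
use warnings;

my %templates = (
    big => { foo => 'n3A*', bar => 'n2A*' },
    lil => { foo => 'v3A*', bar => 'v2A*' },
);

# Points at one of the two inner hashes; set once, after the first
# record of the file has revealed the byte order.
my $tmpl;

sub set_endianness {
    my ($end) = @_;
    $tmpl = $templates{$end} or die "Unknown endianness '$end'";
    return;
}

# The record subs look up their template by name only.
sub record_foo { my ($bytes) = @_; return unpack $tmpl->{foo}, $bytes }
sub record_bar { my ($bytes) = @_; return unpack $tmpl->{bar}, $bytes }

set_endianness('lil');
my @foo = record_foo("\x01\x00\x02\x00\x03\x00abc");
my @bar = record_bar("\x0a\x00\x0b\x00xyz");
print "@foo | @bar\n";    # 1 2 3 abc | 10 11 xyz
```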
Re: Avoid run-time checking
by Laurent_R (Canon) on Sep 12, 2015 at 09:01 UTC
    Perhaps you could simply use a dispatch table that tells you which subroutine reference to call repeatedly, depending on what you found in the first record. (You could also tamper with the symbol table, but that seems a bit of overkill.)
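A minimal sketch of such a dispatch table, assuming one pair of subs per record type. The sub names record_foo_le and record_foo_be, the hard-coded $endian value, and the sample bytes are hypothetical.

```perl
use strict;
use warnings;

sub record_foo_le { my ($bytes) = @_; return unpack 'v3A*', $bytes }
sub record_foo_be { my ($bytes) = @_; return unpack 'n3A*', $bytes }

# Dispatch table: endianness => { record type => code reference }.
my %dispatch = (
    little => { foo => \&record_foo_le },
    big    => { foo => \&record_foo_be },
);

# Hypothetical value; in the real program it comes from the first record.
my $endian  = 'little';
my $handler = $dispatch{$endian} or die "Unknown endianness '$endian'";

# From here on, nothing tests $endian again.
my @fields = $handler->{foo}->("\x01\x00\x02\x00\x03\x00abc");
print "@fields\n";    # 1 2 3 abc
```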

    Or you might simply store your unpack template string in a variable. I've never tried that, but I think this should work without any problem. Then you just assign that variable according to what you found in the first record.
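One possible shape for the variable idea, sketched under the assumption that Perl 5.10 or later is available: pack and unpack accept a '<' or '>' modifier after an integer format to force the byte order, so a single variable can carry the byte order while each template stays next to its sub. The $order variable, the hard-coded $endian value, and the sample bytes are hypothetical.

```perl
use strict;
use warnings;
use 5.010;    # the '<' and '>' template modifiers need Perl 5.10+

# Hypothetical value; in the real program it comes from the first record.
my $endian = 'big';

# Assigned once: '<' forces little-endian, '>' forces big-endian.
my $order = $endian eq 'little' ? '<' : '>';

sub record_foo {
    my ($bytes) = @_;
    # S = unsigned 16-bit; the modifier goes between the letter and the count.
    return unpack "S${order}3 A*", $bytes;
}

my @fields = record_foo("\x00\x01\x00\x02\x00\x03abc");
print "@fields\n";    # 1 2 3 abc
```

Each sub keeps a single template string, and only the one assignment to $order depends on the file's byte order.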