For the past couple of days I've been quite busy with the woes of processing binary files in Perl. You can read all about it here and here. In short, though, I need to treat a binary file (or any file, for that matter; the notion of a "binary file" confuses many people) as a stream of bits. My smallest quantum of information is the bit, not the byte, i.e. interesting data may cross byte boundaries.

It is clear to me that at the top level I want to work with strings of "1"s and "0"s, and to use Perl's strengths on them: regexes, string processing and such. What wasn't clear to me was how I was going to get this nice representation from a raw file.
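To make the "string of 1s and 0s" idea concrete, here is a minimal sketch of what working at that level can look like. The two bytes are arbitrary example data (not from the real files), and the variable names are only for illustration; `unpack "B*"`, `substr` and `oct` are core Perl.

```perl
use strict;
use warnings;

# Two arbitrary example bytes; a real program would read() them from a
# filehandle that has been put into binmode.
my $bytes = "\x3C\xF0";

# "B*" turns every byte into 8 characters, most significant bit first.
my $bits = unpack "B*", $bytes;          # "0011110011110000"

# A field may straddle a byte boundary: take 6 bits starting at bit 5.
my $field = substr $bits, 5, 6;          # "100111"

# oct() with a "0b" prefix converts a bit string back into a number.
my $value = oct "0b$field";              # 39

# Ordinary regexes work too, e.g. find the first run of four 1 bits.
my $pos = $bits =~ /1{4}/ ? $-[0] : -1;  # 2

print "$bits $field $value $pos\n";
```

Once the data is a plain string, byte boundaries simply stop mattering, which is the whole attraction of this representation.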

A short intermission: I'm not a great fan of OOP, in the sense that I see it not as a panacea but as a YADM (Yet Another Design Methodology). That's because I think it's too hyped. I don't believe in thinking of everything as objects. I believe in "the right tool for the job". Where OO fits, it can be a great and clean solution. Where it doesn't fit, forcing it on a design will only make it messy and unnatural.

So I started coding. A function would open the file, unpack() it to "00111100...."s and happily use it. But I quickly understood that this reading and unpacking should be abstracted away in another function. I also understood that what I really need is a "stream" abstraction - just ask for N bits, get them and work on them, ask for the next M bits, et cetera. But this clearly starts to separate into two tasks: opening the file, giving away bits. So I tried two subs with shared static data. Frankly, Perl sucks static-variable-wise. They're not directly supported, and must be hacked-over with my() in a BEGIN block. It didn't feel right.

This brought me to consider a package. Vars static to packages are supported nicely, and Perl is very good with packages, so why not... So, I happily tucked the code into a package, and just called BitStream::create($filename) and BitStream::get_bits(128).

Hey, but I might need more than one stream simultaneously... ahh... objects. Why not treat BitStream as an object, which stores all the info it needs in a %self hash, and does what I need ? In fact, this was very simple. I must admit that Perl implements OO nicely, at least when simple things are done. So...

    use BitStream;
    ...
    ...
    my $stream = BitStream::new("bigbinfile");
    ...
    my $bitstr = BitStream::get_bits(128);

---------------- UPDATE ----------------
The code posted above is incorrect (copy paste from an old file). It should be -

    use BitStream;
    ...
    ...
    my $stream = BitStream::new("bigbinfile");
    ...
    my $bitstr = $stream->get_bits(128);

---------------- UPDATE ----------------

get_bits() now nicely handles incomplete reads, returning just as much as there was left in the stream.

So, it now looks like this:

                           -------------
             get_bits()    |           |
script   <================ | BitStream |   <==    file 
                           |           |
                           -------------

Simple, works, clean, b-e-a-u-t-i-f-u-l (forgive my geeky self).

But the best part is ahead:

If you've read about my binary woes, you know there's also a memory problem. Files can get very big (hundreds of megabytes), so a clever implementation must be thought of. But the caller of get_bits() doesn't care! He gets his bits, whatever BitStream does under the hood.

As a matter of fact, there are 3 implementations I consider now:

At the moment, the first solution is employed, as it's the simplest. But whatever solution is chosen, the caller just calls get_bits() !

I will add features to BitStream on a need basis (being your own user rocks): features like seek()ing and tell()ing a stream, rewind(), reading backwards, etc.

But the moral of the story, to relate to the article title, is: use the best tool for the job, and employ techniques on a need basis. Then you'll have the most elegant solution for each problem, possibly with different design methodologies, but what does it matter, as long as it "feels" clean and robust.

Re: evolving an OO solution for a bitstream
by demerphq (Chancellor) on Oct 21, 2003 at 15:57 UTC

    Frankly, Perl sucks static-variable-wise. They're not directly supported, and must be hacked-over with my() in a BEGIN block. It didn't feel right.

    Bah. Heresy. Statics in modules don't need a BEGIN block; they don't even need an enclosing scope unless you are being pedantic. And they can be shared amongst any arbitrary set of procedures. So I'd say they are pretty powerful, actually.
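A minimal sketch of the kind of shared statics meant here, assuming nothing beyond core Perl. The sub names (`feed_bits`, `take_bits`, `bits_served`) are made up for illustration and are not part of any real BitStream code.

```perl
use strict;
use warnings;

# A bare block gives the lexicals a private scope. The named subs declared
# inside it are closures over them, so all three share the same variables.
{
    my $buffer = '';   # persists between calls, invisible outside the block
    my $served = 0;

    sub feed_bits { $buffer .= shift; return length $buffer }

    sub take_bits {
        my ($n) = @_;
        my $out = substr $buffer, 0, $n, '';   # remove and return up to $n bits
        $served += length $out;
        return $out;
    }

    sub bits_served { return $served }
}

feed_bits("0011110011");
my $first  = take_bits(4);
my $second = take_bits(4);
print "$first $second ", bits_served(), "\n";   # 0011 1100 8
```

The one catch is that the initializers run when execution reaches the block. In a script that means the block has to appear before the first call (or be wrapped in BEGIN); in a module loaded with use, the whole file has already run by the time the caller gets control, so no BEGIN is needed.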

    Incidentally you wrote this:

    my $stream = BitStream::new("bigbinfile");

    That's not a method call. That's a procedure call.

    Anyway, I whipped this together before I realized that you had already gone down this road. (Really must read nodes a bit more thoroughly before replying :-)

    package File::Bitstream;
    use strict;
    use warnings;

    sub new {
        my ($class,$file)=@_;
        open my $fh,"<",$file or die "Cant read '$file':$!";
        binmode $fh;
        return bless { file=>$file, fh=>$fh, buffer=>'', chunk=>1024 },$class;
    }

    sub get_bits {
        my ($self,$bits)=@_;
        while (!eof($self->{fh}) and length($self->{buffer})<$bits) {
            my $chars='';
            read($self->{fh},$chars,$self->{chunk});
            $self->{buffer}.=unpack "B*",$chars;
        }
        return length($self->{buffer}) ? substr($self->{buffer},0,$bits,'') : undef;
    }
    1;

    my $o=File::Bitstream->new($0);
    my $bits='';
    print $bits,$/ while defined($bits=$o->get_bits(13));

    ---
    demerphq

      First they ignore you, then they laugh at you, then they fight you, then you win.
      -- Gandhi


      Well, for me shoving procedures into special blocks just for the sake of static variables seems a bit cumbersome.

      You said:

      my $stream = BitStream::new("bigbinfile");
      That's not a method call. That's a procedure call.

      I must be missing something. Why isn't BitStream::new a good way of constructing an object? Later, I use $stream->get_bits, which is a method call.

        Let's say that BitStream inherits from Stream::Generic. If you call BitStream's new() your way, you cannot do my $self = $class->SUPER::new(@_); to have some initialization deferred to the parent, because $class would hold the file name rather than the class name. Better is my $stream = BitStream->new("bigbinfile"); which is much easier to extend. Plus, you could do something like:
        # This way of dealing with meg and gig is a poor way, used only for demonstration.
        # Supersearch for a better way.
        my $KB = 1024;
        my $MB = 1024 * $KB;
        my $GB = 1024 * $MB;

        my %Classes = (
            $MB => 'BitStream::Vec',
            $GB => 'BitStream::Buffered',
        );

        my $filesize = -s $filename;
        my $classname = 'BitStream::InMemory';
        foreach my $min_size (sort { $a <=> $b } keys %Classes) {
            last unless $filesize >= $min_size;
            $classname = $Classes{$min_size};
        }

        my $stream = $classname->new($filename);

        That way, you can choose your BitStream::* class based on the size of your file. If it's under a meg, use the instream. Between a meg and a gig, use the hybrid vec option. Over a gig, you need to use the slow buffered method.
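For the SUPER::new part, a minimal sketch might look like the following. Stream::Generic is the hypothetical parent class from the paragraph above, and the hash keys (`file`, `pos`, `buffer`) are invented for illustration only.

```perl
use strict;
use warnings;

# Hypothetical parent class: owns the setup common to all streams.
package Stream::Generic;

sub new {
    my ($class, $file) = @_;
    # Bless into $class, not __PACKAGE__, so a subclass gets its own type.
    return bless { file => $file, pos => 0 }, $class;
}

# Subclass: adds bit-level state on top of whatever the parent set up.
package BitStream;
our @ISA = ('Stream::Generic');

sub new {
    my $class = shift;
    my $self  = $class->SUPER::new(@_);   # parent does the generic part
    $self->{buffer} = '';
    return $self;
}

package main;

my $stream = BitStream->new("bigbinfile");
print ref($stream), " $stream->{file}\n";   # BitStream bigbinfile
```

Called as BitStream::new("bigbinfile") instead, the shift would put the file name into $class, and the object would no longer be blessed into BitStream.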

        ------
        We are the carpenters and bricklayers of the Information Age.

        The idea is a little like C++ templates, except not quite so brain-meltingly complicated. -- TheDamian, Exegesis 6

        ... strings and arrays will suffice. As they are easily available as native data types in any sane language, ... - blokhead, speaking on evolutionary algorithms

        Please remember that I'm crufty and crochety. All opinions are purely mine and all code is untested, unless otherwise specified.

        Well, for me shoving procedures into special blocks just for the sake of static variables seems a bit cumbersome.

        As compared to what? And does whatever you would compare against allow for statics to be shared amongst a variety of subs? Using the nesting feature of blocks you can set up all kinds of data relationships between subs. (However I admit that I have a Pascal background and nesting subroutines comes naturally.)

        Why isn't BitStream::new a good way of constructing an object?

        Because the implementor of BitStream might just go and reorganize everything so that BitStream doesn't have its own new, but rather inherits it from some other class, perhaps a File::Stream or, more realistically, IO::File. And then all of a sudden your code breaks. The point is that

        Package::subroutine("is a procedure call which doesn't search \@ISA");
        Package->method("is a method call which does search \@ISA");

        I mean, you never know what your other personality is going to do when you aren't looking.
        :-)
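A small sketch of exactly that breakage, under the assumption that BitStream has been reorganized to inherit its constructor. Stream::Generic is a stand-in name for whatever parent class ends up providing new().

```perl
use strict;
use warnings;

# Hypothetical parent that supplies the constructor.
package Stream::Generic;
sub new { my ($class, $file) = @_; return bless { file => $file }, $class }

# BitStream no longer has a new() of its own; it only inherits one.
package BitStream;
our @ISA = ('Stream::Generic');

package main;

# Method call: new() is looked up in BitStream, then through @ISA.
my $ok = BitStream->new("bigbinfile");
print "method call gave a ", ref($ok), "\n";

# Procedure call: only the BitStream package itself is consulted,
# so this dies at run time and eval returns undef.
my $broken = eval { BitStream::new("bigbinfile") };
my $err    = $@;
print "procedure call failed: $err" unless defined $broken;
```

The method call keeps working however the class hierarchy is shuffled; the procedure call works only as long as a sub literally named BitStream::new exists.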


        ---
        demerphq



        That should be BitStream->new( 'bigbinfile' ). Now your new() method can be subclassed and is a class method and not just a procedural function.

Re: evolving an OO solution for a bitstream
by Abigail-II (Bishop) on Oct 21, 2003 at 10:31 UTC
    use BitStream;
    ...
    ...
    my $stream = BitStream::new("bigbinfile");
    ...
    my $bitstr = BitStream::get_bits(128);

    I don't see any OO here. You're not even calling class methods. It's just plain sub calling.

    Abigail

      Sorry, I pasted incorrect code. It should be -
      use BitStream;
      ...
      ...
      my $stream = BitStream::new("bigbinfile");
      ...
      my $bitstr = $stream->get_bits(128);
      What is OO to you? Is OO all of the features together, or can we refer to some OO features as OO? Is OO only what uses polymorphism? I don't think so. True, polymorphism is one of the most interesting features of OO, but there can be OO without it as well.

      Why do I think my code is OO?

      I have the notion of a BitStream object, which has some internal state and provides some services. This internal state is conveniently encapsulated from the user. The user has no idea how it is implemented, and BitStream's implementation can change at any moment preserving the interface. I can have several BitStream objects, they're separate from each other. etc.

        I can have several BitStream objects.
        Oh, sure, and I guess you use a global variable to decide which object the 128 bits should be taken from if you do:
        my $bitstr = BitStream::get_bits(128);

        You might have objects, but your code snippet doesn't suggest you have them.

        This internal state is conveniently encapsulated from the user. The user has no idea how it is implemented, and BitStream's implementation can change at any moment preserving the interface.
        The internal state of a hash is also conveniently encapsulated from the user. That doesn't make hashes objects. Users have no idea how hashes are implemented, and the implementation of hashes can change at any moment, preserving the interface (and in fact, it did between 5.8.0 and 5.8.1). That still doesn't make hashes objects.

        Abigail