Some (at least 40) of you may remember How much was unpack()ed?. Well, I came up with a solution and wrote a module that creates a streamable unpack(). Furthermore, it makes virtually no assumptions about how unpack() works, which means it should be compatible with future syntaxes! :)

Here's a module that implements it. Please suggest names appropriate for CPAN!

package StreamString; use strict; use warnings; =pod use StreamString; $str = new StreamString('foo bar baz'); $word = $str->unpack('A4'); print "First word is: $word\n"; # First word is: foo $word = $str->unpack('A4'); print "Second word is: $word\n"; # Second word is: bar # Skip forward a byte. $str++; # also $str->seek(1, StreamString::SEEK_CUR) $word = $str->unpack('A4'); print "Third word is: $word\n"; # Third word is: az $str -= 3; $word = $str->unpack('A4'); print "No, wait, it was $word\n"; # No, wait it was $strpos = 0+$str; # god bless overload() $strpos = $str->tell; # if you don't like the previous syntax $bar = $str->unpack_at(4, 'A4'); # seek to 4 before unpacking $bar2 = $str->unpack('@4 A4'); # this works too =cut # not gonna pull in all of Fcntl for three constants use constant { SEEK_SET => 0, SEEK_CUR => 1, SEEK_END => 2 }; use overload fallback => 1, '0+' => 'tell', # calls 'tell' method '+' => sub { $_[0]->tell() + $_[1] }, '=' => 'copy', #sub{$_[0]->new($_[0], pos(${$_[0]} +))}, '+=' => sub { $_[0]->seek($_[1], SEEK_CUR); $_[0] }, + # += and -= are just relative seeks '-=' => sub { $_[0]->seek(-$_[1], SEEK_CUR); $_[0] }, + # += and -= are just relative seeks '""' => sub { ${+shift} }, # stringify 'bool' => sub { ! $_[0]->eof } , # false == end-of- +string; truth == not ; use Carp qw(croak); sub copy : method { # print "copy constructor called\n"; my $this = shift; $this->new($$this, pos($$this)); } sub new : method { my $class = shift; $class = ref($class) if ref($class); my $text = shift || ''; pos($text) = shift || 0; bless \$text, $class; } sub eof : method { my $this = shift; pos($$this) >= length($this) } sub tell : method { pos(${+shift}) } sub seek : method { croak "Not enough arguments to StreamString::seek()" unless @_ >= +3; my ($this, $where, $how) = @_; my $newpos = $how == SEEK_SET ? $where : $how == SEEK_CUR ? $where + pos($$this) : $how == SEEK_END ? length($$this) - $where : croak("Invalid WHENCE specified for StreamString::Se +ek ($how)!"); return pos($$this) = $newpos if $newpos>=0 && $newpos < length($$t +his); # range check return pos($$this); } sub unpack : method { my ($this, $str) = @_; my $p = pos($$this); # this is grossly inefficient. It requires making a (potentially +very large) # copy of the not-yet-unpacked version of the string. If anyone c +an tell me # how to make unpack return the number of bytes it's run through, +OR how to # work the internal APIs behind unpack (unpack_str/unpackstring) t +o make them # do it, please tell me! my @list = CORE::unpack("\@$p $str a*", $$this); # it's very simple. The last thing I return is (a*), which sucks +up the # whole remainder of the string. I figure out how long that is, a +nd from # there I can find out where unpack() stopped reading the data we +actually # wanted. my $remainder = pop @list; pos($$this) = length($$this) - length $remainder; return wantarray ? @list : $list[0]; } sub unpack_at : method { my ($this, $start, $str) = @_; $this->seek($start, SEEK_SET); return $this->unpack($str); } sub main::StreamStringTest { my ($str, $word, $strpos, $bar, $bar2); $str = new StreamString('foo bar baz'); $word = $str->unpack('A4'); print "First word is: '$word'\n"; # First word is: foo $word = $str->unpack('A4'); print "Second word is: '$word'\n"; # Second word is: bar # Skip forward a byte. $str++; $word = $str->unpack('A4'); print "Third word is: '$word'\n"; # Third word is: az $str -= 3; $word = $str->unpack('A4'); print "No, wait, it was '$word'\n"; # No, wait it was baz $strpos = 0+$str; # god bless overload() print "strpos is $strpos\n"; $strpos = $str->tell; # if you don't like the previous syntax print "strpos is $strpos [2]\n"; $bar = $str->unpack_at(4, 'A4'); # seek to 4 before unpacking print "bar is '$bar'\n"; $bar2 = $str->unpack('@4 A4'); # this works too print "bar2 is '$bar2'\n"; } 1;
An interesting thing I discovered when writing this is that the copy constructor is sometimes called unnecessarily (i.e. when only one reference to an object exists and so no copy needs to be made).
--Stevie-O
$"=$,,$_=q>|\p4<6 8p<M/_|<('=> .q>.<4-KI<l|2$<6%s!<qn#F<>;$, .=pack'N*',"@{[unpack'C*',$_] }"for split/</;$_=$,,y[A-Z a-z] {}cd;print lc

In reply to Unpack()ing a stream by Stevie-O

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.