Reading (and parsing) a byte stream

by qbxk (Friar) on Mar 05, 2006 at 02:35 UTC

qbxk has asked for the wisdom of the Perl Monks concerning the following question:

*Update - I found a solution and have posted the basic code, which implements SmackFu's algorithm in pure Perl; see the replies.

I'm trying to read a stream of bytes, sent over HTTP (yes), and read specific bytes out of it. I'm finding, to my disbelief, that I don't know how to determine the integer value of a byte. Some context:

    my $sock = Net::HTTP->new(Host => "") || die $@;
    $sock->write_request(GET => "bytestream") or die $@;
    $sock->read_response_headers( ); # i don't need these
    while(1){
        $sock->recv($buf, $size);
        my $n = length($buf);
        # oh.. i feel like reading byte #4 today.
        next if $n < 4; # i don't want to talk about this case, ok?
        my $byte = bytes::substr($buf, 4, 1, undef); # remove it from the stream too
        print $byte;  # not looking like an integer.
        $byte *= 16;  # yields a warning:
                      # Argument "x" isn't numeric ...etc
                      # where "x" is some crazy character
    }
I have a hunch ord($byte) is what I'm looking for, but the byte I'm reading isn't yielding the value I expect, which could be a different problem...
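A minimal sketch of the distinction, with a made-up byte value: arithmetic on a one-byte string numifies the string (hence the warning), while ord or unpack 'C' return the byte's integer value.

```perl
use strict;
use warnings;

# A one-byte string holding the value 2 (made-up example data).
my $byte = "\x02";

# ord() returns the numeric code of the first character...
my $via_ord = ord($byte);

# ...and unpack 'C' reads an unsigned 8-bit integer from the string.
my ($via_unpack) = unpack('C', $byte);

# Multiply the integer, not the string: "\x02" * 16 would warn that the
# argument isn't numeric and evaluate to 0.
my $scaled = $via_ord * 16;

printf "ord=%d unpack=%d scaled=%d\n", $via_ord, $via_unpack, $scaled;
```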

For the more curious: I'm attempting to implement an Icecast stream recorder; the stream format is explained here:
I'm storing the metaint from the headers (just not in my example), and yes, the "byte" I'm trying to read is that metadata length byte. My next question will be how to convert the metadata (a string of bytes) into a character string, and then how to write to disk (prefiltering) only the MPEG frames, none of the metadata.
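For reference, the ICY metadata layout as commonly described: after every metaint bytes of audio comes one length byte; that byte times 16 is the size of the metadata block that follows, which holds text such as StreamTitle='...'; padded with NUL bytes. A sketch of decoding one block, assuming the bytes are already in hand; parse_icy_metadata and the sample title are hypothetical, and a title containing the sequence "';" would confuse this simple regex.

```perl
use strict;
use warnings;

# Hypothetical helper: decode one metadata block given its length byte.
# Returns a hash ref of Key => value pairs (e.g. StreamTitle).
sub parse_icy_metadata {
    my ($length_byte, $block) = @_;
    my $len = unpack('C', $length_byte) * 16;    # the byte counts 16-byte units
    return {} if $len == 0;                      # 0 means "no new metadata"
    my $text = substr($block, 0, $len);
    $text =~ s/\0+\z//;                          # strip the NUL padding
    my %fields;
    while ($text =~ /(\w+)='(.*?)';/g) {         # Key='value'; pairs
        $fields{$1} = $2;
    }
    return \%fields;
}

# Made-up sample: 28 bytes of text, NUL-padded to 32 bytes (2 units of 16).
my $text  = "StreamTitle='Artist - Song';";
my $units = int((length($text) + 15) / 16);
my $block = $text . ("\0" x ($units * 16 - length($text)));
my $meta  = parse_icy_metadata(chr($units), $block);
print "$meta->{StreamTitle}\n";                  # Artist - Song
```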

Maybe somebody will recommend alternatives to reinventing the wheel, which I'm open to, but my requirements are highly specific and I haven't found anything that meets them all, so here we are.

It's not what you look like, when you're doin' what you’re doin'.
It's what you’re doin' when you’re doin' what you look like you’re doin'!
     - Charles Wright & the Watts 103rd Street Rhythm Band

Replies are listed 'Best First'.
Re: Reading (and parsing) a byte stream
by BrowserUk (Patriarch) on Mar 05, 2006 at 04:06 UTC

    Take a look at the use of '/' in the documentation for pack; you can also use this format character in unpack.

    Basically, an unpack format like 'C/a' will read the byte value represented by the C, and then use that as the length of the item following the '/' (in the template below, 'A' for ASCII text, which also strips trailing spaces and NULs on unpack). As you also need the length of the metadata in order to remove it from the stream, you'll need a template of "a$DATASIZE C X C/A", which will capture the data to the first variable and the length to the second, back up over the length byte, and then use it to capture the metadata to the third variable.

    I've used a datasize of 10 and an array to simulate the read in this example. The critical part is exiting the while loop when there is not enough data left to fulfill the data size, then reading the next lump and appending it to the residual:

        #! perl -slw
        use strict;
        use bytes;

        my $DATASIZE = 10;

        my @stream = (
            "abcdefghij\x04fredabcdefghij\x06barneyabcdefghij\x00abcde",
            "fghij\x07bam bamabcdefghij\x00abcdefghij"
        );

        my $stream = '';
        for ( @stream ) {
            $stream .= $_;
            while( length( $stream ) > $DATASIZE) {
                my( $data, $len, $meta ) = unpack "a$DATASIZE C X C/A", $stream;
                print "\ndata:$data";
                print "meta:$meta" if $len;
                my $trim = $len ? $len+1 : 1;
                $stream = bytes::substr( $stream, $DATASIZE + $trim );
            }
        }

        __END__
        c:\test>junk

        data:abcdefghij
        meta:fred

        data:abcdefghij
        meta:barney

        data:abcdefghij

        data:abcdefghij
        meta:bam bam

        data:abcdefghij
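One wrinkle when applying this to an Icecast stream: the length byte there counts 16-byte blocks, so C/A on its own would read only a sixteenth of the metadata, and a chunk may also end partway through a metadata block. A sketch of a two-step variant under those assumptions (the $METAINT of 10, the drain name, and the data are all made up), using 'a' plus an explicit NUL strip instead of 'A':

```perl
use strict;
use warnings;

my $METAINT = 10;   # hypothetical; a real server announces icy-metaint in its headers

# Move every complete (audio, metadata) record from the front of $$bufref
# into @$out, leaving any incomplete tail in the buffer for the next read.
sub drain {
    my ($bufref, $out) = @_;
    while (length($$bufref) > $METAINT) {
        # The byte after the audio counts 16-byte blocks of metadata.
        my $metalen = unpack('C', substr($$bufref, $METAINT, 1)) * 16;
        last if length($$bufref) < $METAINT + 1 + $metalen;   # wait for more
        my ($audio, $meta) = unpack("a$METAINT x a$metalen", $$bufref);
        $meta =~ s/\0+\z//;                                    # drop NUL padding
        push @$out, [ $audio, $meta ];
        substr($$bufref, 0, $METAINT + 1 + $metalen, '');     # trim consumed bytes
    }
}

# Made-up data: the first chunk ends partway through the metadata.
my @chunks = (
    "0123456789\x01StreamTi",
    "tle='x';abcdefghij\x00KLMNOPQRST",
);
my $buf = '';
my @records;
for my $chunk (@chunks) {
    $buf .= $chunk;
    drain(\$buf, \@records);
}
print "audio=$_->[0] meta=$_->[1]\n" for @records;
```

The leftover "KLMNOPQRST" stays in $buf until the next chunk supplies its length byte.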

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Reading (and parsing) a byte stream
by ChemBoy (Priest) on Mar 05, 2006 at 02:55 UTC

    I don't see anything on CPAN that would solve this problem, though HTTP::Handle might make it a bit easier. I do see a couple of things that might be your problem, though. First off, do you want byte #4, or the byte at index 4? You've got the latter at the moment (from bytes::substr($buf,4,1,undef)), which is of course what most people would call byte #5.

    I think that's probably the issue you're seeing, because ord usually does what you expect here. Still, if you're parsing binary data you really should be looking at unpack, which is designed specifically for this task. Assuming you weren't doing anything with the rest of the string, the invocation for byte #4 would be

    my $byteval = unpack "x3C", $buf;
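The index-versus-ordinal difference is easy to see on a made-up buffer whose bytes equal their own offsets:

```perl
use strict;
use warnings;

# Six bytes whose values equal their offsets: 0, 1, 2, 3, 4, 5.
my $buf = join '', map { chr } 0 .. 5;

my $fourth  = unpack 'x3C', $buf;       # skip 3 bytes, read byte #4 (index 3)
my $index4  = unpack 'x4C', $buf;       # skip 4 bytes, read index 4 (byte #5)
my $via_ord = ord substr($buf, 4, 1);   # what substr($buf, 4, 1) picks out
print "$fourth $index4 $via_ord\n";     # 3 4 4
```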

    Good luck!

    If God had meant us to fly, he would *never* have given us the railroads.
        --Michael Flanders

Re: Reading (and parsing) a byte stream
by acid06 (Friar) on Mar 05, 2006 at 03:17 UTC
    I'm shooting in the dark here, but...

    I have a hunch ord($byte) is what i'm looking for, but the byte i'm reading isn't yielding the value I expect it to, which could be a different problem...

    ...sounds to me like a problem related to the "endianness" of the data.

    Instead of using ord(), try using some form of unpack(). The protocol specification should specify the endianness (i.e. if it's big- or little-endian).

    perl -e "print pack('h*', 16369646), scalar reverse $="

      I'm afraid your shot misses—endianness has to do with byte order, not the significance of bits within bytes. So if you have a four-byte number, the most-significant byte may be at the beginning (Big-endian, or $Config{byteorder} eq '4321') or at the end (Little-endian, '1234'), but the values of the bytes themselves don't change.
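A small sketch of that point with pack/unpack's explicit byte-order formats ('n' is big-endian 16-bit, 'v' little-endian 16-bit); the two bytes are made up:

```perl
use strict;
use warnings;

my $two = "\x01\x02";               # two bytes as they arrive on the wire

my $big    = unpack 'n', $two;      # big-endian 16-bit:    0x0102 = 258
my $little = unpack 'v', $two;      # little-endian 16-bit: 0x0201 = 513

# Single bytes have no byte order: each one reads back unchanged.
my @bytes = unpack 'C2', $two;

print "$big $little @bytes\n";      # 258 513 1 2
```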


        Not really.
        From the article about Endianness at the Wikipedia:

        Endianness also applies in the numbering of bits within a byte or word. In a consistently big-endian architecture the bits in the word are numbered from the left, bit zero being the most significant bit and bit 7 being the least significant bit in a byte.

        So endianness should have to do with bit order.
        Just because the usual way of packing/unpacking little- or big-endian data in Perl (the Network and VAX types) does not follow this pattern doesn't mean it's not correct.
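A sketch of what that means in Perl terms: bit numbering is a naming convention, and unpack's 'B' (most significant bit first) and 'b' (least significant bit first) formats change only how the bits are listed, not the value ord reports:

```perl
use strict;
use warnings;

my $byte = "\x01";                   # value 1: only the least significant bit set

my $msb_first = unpack 'B8', $byte;  # bits listed high to low
my $lsb_first = unpack 'b8', $byte;  # bits listed low to high
my $value     = ord $byte;           # the numeric value is 1 either way

print "$msb_first $lsb_first $value\n";   # 00000001 10000000 1
```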

Re: Reading (and parsing) a byte stream
by ambrus (Abbot) on Mar 05, 2006 at 10:27 UTC

    I have recently posted an example of reading a byte stream from a socket and interpreting it with the unpack function at GPM mouse handling. That one is simple because the records are of a fixed size (28 bytes).
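For fixed-size records the loop is: read one record, check its length, unpack it. A sketch against an in-memory filehandle rather than a socket; the 4-byte layout (a big-endian 16-bit id and two signed bytes) is hypothetical, not the GPM format, and a short read is simply treated as end of input here:

```perl
use strict;
use warnings;

use constant RECORD_SIZE => 4;      # hypothetical record length in bytes
my $TEMPLATE = 'n c c';             # hypothetical layout: id, dx, dy

# Made-up input: two whole records followed by one stray byte.
my $data = pack('n c c', 1, -3, 5) . pack('n c c', 2, 7, -1) . "\x00";
open my $fh, '<', \$data or die "Can't open in-memory handle: $!";
binmode $fh;

my @records;
while (1) {
    my $got = read($fh, my $rec, RECORD_SIZE);
    die "read failed: $!" unless defined $got;
    last if $got < RECORD_SIZE;     # end of input or a partial record
    push @records, [ unpack $TEMPLATE, $rec ];
}
print join(',', @$_), "\n" for @records;
```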

Re: Reading (and parsing) a byte stream
by spiritway (Vicar) on Mar 05, 2006 at 03:43 UTC

    Possibly check out what number you expect, vs. what number you actually get. Take a look at them in binary (not hex), and see whether you're getting the bits reversed. I'm thinking some sort of '-endian' problem, but I'm not sure that fits all the facts... From the link you provided, I'm wondering whether you're simply getting a zero from the metadata.
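Comparing in binary is a one-liner with sprintf '%08b'. A sketch that also checks whether two values are bit-reversals of each other; reverse_bits and both values are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: reverse the order of the 8 bits in a byte value (0..255).
sub reverse_bits {
    my ($n) = @_;
    return oct('0b' . scalar reverse sprintf('%08b', $n));
}

my ($expected, $got) = (0x06, 0x60);     # made-up values
printf "expected %08b\n", $expected;     # 00000110
printf "got      %08b\n", $got;          # 01100000
print "bits reversed\n" if reverse_bits($got) == $expected;
```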

Re: Reading (and parsing) a byte stream
by qbxk (Friar) on Mar 06, 2006 at 07:23 UTC
    Thanks for all this help!

    I found a solution: using the read function instead of recv(). The code is much simpler this way, though it's not well documented where this function comes from; Net::HTTP is the child of several large classes...
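On where read comes from: Net::HTTP is built on IO::Socket, which inherits from IO::Handle, and IO::Handle's read method wraps Perl's built-in read. That read takes a length, but it can still return fewer bytes than requested (at end of stream, for instance), so looping until the full count arrives is a reasonable precaution. A sketch with a hypothetical read_exact helper, exercised on an in-memory filehandle:

```perl
use strict;
use warnings;

# Hypothetical helper: read exactly $len bytes from $fh, looping over short
# reads. Returns the bytes, or undef if the stream ends first.
sub read_exact {
    my ($fh, $len) = @_;
    my $buf = '';
    while (length($buf) < $len) {
        # Append at the current end of $buf (the OFFSET argument).
        my $got = read($fh, $buf, $len - length($buf), length($buf));
        die "read failed: $!" unless defined $got;
        return undef if $got == 0;  # EOF before $len bytes arrived
    }
    return $buf;
}

my $data = "abcdefghij";
open my $fh, '<', \$data or die "Can't open in-memory handle: $!";
my $first  = read_exact($fh, 4);    # "abcd"
my $second = read_exact($fh, 4);    # "efgh"
my $third  = read_exact($fh, 4);    # undef: only "ij" was left
```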

    Another confounding issue: I suspect Net::HTTP::read_response_headers() just isn't reading the headers "right", since it was somehow messing with the byte counts. So I read my own headers now. ;P
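Reading the headers by hand comes down to: read lines until the first blank one, take the status code from the first line (Icecast/SHOUTcast servers may answer ICY 200 OK rather than HTTP/1.0 200 OK), and split each header on its first colon only so values such as URLs survive. A sketch against an in-memory handle; read_icy_headers and the sample response are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: parse the status line and headers from $fh, stopping
# at the blank line that separates headers from the body.
sub read_icy_headers {
    my ($fh) = @_;
    my ($code, $mess, %headers);
    while (my $line = <$fh>) {
        $line =~ s/\s+\z//;                      # strip trailing CR/LF
        last if $line eq '';                     # end of headers
        if ($line =~ m{^(?:HTTP/1\.[01]|ICY) (\d+) (.*)$}) {
            ($code, $mess) = ($1, $2);
        } elsif (my ($name, $value) = $line =~ /^([^:]+):\s*(.*)$/) {
            $headers{lc $name} = $value;         # names are case-insensitive
        }
    }
    return ($code, $mess, \%headers);
}

my $response = "ICY 200 OK\r\nicy-metaint: 16000\r\n"
             . "icy-url: http://example.com/\r\n\r\n<audio>";
open my $fh, '<', \$response or die "Can't open in-memory handle: $!";
my ($code, $mess, $h) = read_icy_headers($fh);
my $rest = do { local $/; <$fh> };               # body left untouched
```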

    Here's a very basic solution. I hope to turn this into a subclass of Net::HTTP, or perhaps build it on IO::Socket::INET instead, and call it Net::Icecast; I'm open to suggestions. As you can see, this nugget is also rather in want of features:

        #!/usr/bin/perl -w
        ### Written by qbxk for perlmonks
        ### It is provided as is with no warranties, express or implied, of any kind. Use posted code at your own risk.
        $|++;
        use warnings;
        use strict;
        use Net::HTTP;
        use Data::Dumper;
        use Carp::Assert;

        # use constant USER_AGENT => 'WinampMPEG/2.9'; # I got refused by some public servers unlessen i done it thar way
        use constant USER_AGENT => 'Stream-Recorder-0.01';
        my %HOST = ( host => '', port => 8000, mount => '/stream' );

        use constant DEBUG => 1;
        sub debug(@)     { print STDERR "\n" . join("\n", @_) . "\n"; }
        sub debug_raw(@) { print STDERR @_; }

        sub open_connection {
            my %args = ( host => undef, port => 80, mount => '', user_agent => USER_AGENT, @_ );
            die "Need a host name" unless defined($args{host});
            $args{mount} =~ s/^\/+//g;

            my $sock = Net::HTTP->new(Host => $args{host}, PeerPort => $args{port} ) || die $@;
            $sock->write_request(GET => "/$args{mount}",
                                 'User-Agent'   => $args{user_agent},
                                 'Icy-MetaData' => 1) or die $@;

            # my ($code, $mess, %headers) = $sock->read_response_headers( laxed => 1 );
            # Read the status line and headers by hand, up to the blank line.
            my ($code, $mess, %headers);
            while( <$sock> ) {
                s/\s*$//g;
                last if /^\s*$/;
                if( /^(?:HTTP\/1\.[01]|ICY) ([0-9]+) (.+)$/ ) {
                    ($code, $mess) = ($1, $2);
                } else {
                    # split on the first colon only, so values such as URLs stay whole
                    my ($h, $v) = split(/:\s*/, $_, 2);
                    $headers{$h} = $v;
                }
            }
            return ($sock, $code, $mess, %headers);
        }

        main: {
            my ($s, $code, $mess, %headers) = open_connection( %HOST );
            debug "$code|$mess\n" . Dumper(\%headers);
            # TODO: timeout on $s.
            exit if( $code != 200 ); # scream and shout

            my ($metaint) = map { (/^icy-metaint$/i && $headers{$_}) or () } keys %headers;
            assert( $metaint > 0 );

            open OUT, '>stream-out.mp3' or die "Can't open stream-out.mp3: $!";
            binmode OUT; # very important

            while( 1 ) {
                # $metaint bytes of audio...
                my $buf;
                last unless $s->read($buf, $metaint); # stop when the stream ends
                print OUT $buf;

                # ...then one length byte, counting 16-byte blocks of metadata
                my ($metadata, $metalen, $metabyte);
                last unless $s->read($metabyte, 1);
                $metalen = unpack("C", $metabyte) * 16;
                if( $metalen > 0) {
                    # We have NEW metadata! JOY
                    $s->read($metadata, $metalen);
                    $metadata = unpack("A$metalen", $metadata); # 'A' strips the NUL padding
                    assert( $metadata =~ /Stream/, "Not good metadata!" ); # don't dump a lot of BS (binary *#$!), just die.
                    debug "$metalen - [$metadata]";
                } else {
                    $metadata = '';
                    debug_raw "-";
                }
            }
        }
    You'll find a clean, "un-meta"ed MP3 file, ever growing, called "stream-out.mp3" in your working directory... I've done enough for one night, so that's how it stays.

Node Type: perlquestion [id://534551]
Approved by ChemBoy
Front-paged by sweetblood