in reply to Mysteries of unpack("a", ...)

1) I am not at all sure that STDIN can even be opened in binary mode at all! I mean CTL-C, CTL-Z mean things to STDIN although these are certainly valid binary values.

2) If you want to read binary data, open a file in binmode... local $/ = undef; is not needed.

3) Once you are dealing with binary data, study page 758 of "Programming Perl, 3rd edition" very carefully. One thing to be aware of is "big-endian" vs "little-endian" byte order: some machines store the most significant byte of a multi-byte integer first, and some store the least significant byte first. IBM has many machines and some do it one way and some another.

In your situation, look at the n/N (big-endian) and v/V (little-endian) pack/unpack templates and the other options.
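
As a small illustration of the byte-order point, here is a minimal sketch that packs the same 32-bit value with the N and V templates and prints the resulting bytes. The value 0x12345678 is just an arbitrary example, not something from the original question.

```perl
use strict;
use warnings;

my $value = 0x12345678;            # arbitrary example value

my $big    = pack("N", $value);    # 32-bit unsigned, big-endian ("network") order
my $little = pack("V", $value);    # 32-bit unsigned, little-endian ("VAX"/Intel) order

my $big_hex    = uc unpack("H*", $big);
my $little_hex = uc unpack("H*", $little);
print "N: $big_hex\n";             # N: 12345678
print "V: $little_hex\n";          # V: 78563412

# Unpacking with the wrong template silently yields a different number.
my $wrong = unpack("N", $little);
printf "V bytes read as N: 0x%08X\n", $wrong;   # 0x78563412
```

The n and v templates behave the same way for 16-bit values.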

Perl can do binary editing quite well.

I offer one of the very first Perl subs that I wrote. This is from more than a decade ago. I would do some things differently now. But this does slam multiple Windows .wav files of the same type together into a new .wav file. This is just a simple example of binary editing.

sub cat_waves(@){
    my ($out, @list) = @_;
    my $me = whoami();   # $me is like "Util::cat_waves", i.e. "this" subroutine name
    my $BUFSIZE = 8 * 2**10;

    myprint ("$me: Creating \"$out\" using ".scalar@list." input \".wav\" files...");
    unless ( (defined ($out)) && (@list >0) ){
        showfailed( "$me - Internal Error: not enough arguments supplied");
        return;
    }

    # The final output file format will be:
    #   "RIFF"                (4 bytes)
    #   riff size             (4 bytes)  (total bytes that follow from here)
    #   voice format info     (42 bytes) #some weird voodoo is in here
    #   "data"                (4 bytes)
    #   data size             (4 bytes)  (voice bytes that follow from here)
    #   all of the voice data (n bytes), must be word aligned.
    #   EOF
    # 4 byte integers are Intel, VAX "little endian" convention.
    # Some of the standard company voice edited .wav files contain other
    # weird segments after the voice data segment. These will be deleted in
    # the final concatenated .wav file.

    my ($new_data_size, @data_file_info) = get_data_sizes(@list);
    my $new_riff_size = $new_data_size + 42+8; #allow for wave format info + DATA header

    myprint ("New File: \"$out\" will be created now...");
    myprint (" total file size = ",$new_riff_size + 8);
    myprint (" RIFF segment size = $new_riff_size");
    myprint (" DATA segment size = $new_data_size");

    open (OUTBIN, ">$out") || showfailed("$me: unable to open $out");
    binmode(OUTBIN)        || showfailed("$me: unable to set binmode $out");

    my $buff;
    # write the new RIFF header to the output file.
    my $header_file = $list[0]; #"steal" the header from first input file
    myprint ("Using header info from file: $header_file");
    open(INBIN, "<$header_file") || showfailed("$me: unable to open $header_file");
    binmode(INBIN)               || showfailed("$me: unable to set binmode $header_file");
    my $n_bytes = read(INBIN, $buff, $BUFSIZE);
    ($n_bytes > 58)              || showfailed("$me: too few bytes ($n_bytes) in $header_file");

    my $rsize = pack("V", $new_riff_size);   # "V" means Vax or Intel "little endian", 32 bits
    substr($buff,4,4) = $rsize;
    my $data_size = pack("V", $new_data_size);
    substr($buff,54,4) = $data_size;
    print OUTBIN substr($buff,0,58);
    # parentheses needed: "close INBIN || ..." would parse as close(INBIN || ...)
    close(INBIN)                 || showfailed("$me: unable to close $header_file");

    #now just extract the data from all the .wav files and append to OUTBIN
    while (@data_file_info){
        # take the next (file, size) pair off the list so the loop terminates
        my ($wave, $xfer) = splice(@data_file_info, 0, 2);
        open(INBIN, "<$wave") || showfailed ("$me: unable to open $wave");
        binmode(INBIN)        || showfailed ("$me: unable to set binmode $wave");
        myprint ("Appending voice data from $wave...");
        my $i_buf_low=58; #don't transfer the initial header stuff in the RIFF files.
        while ($xfer>0){
            my $n_buf_bytes = read(INBIN, my $buff, $BUFSIZE);
            last unless $n_buf_bytes;       #stop at EOF or on a read error
            my $n = $n_buf_bytes - $i_buf_low;
            if ($xfer-$n <0 ){$n = $xfer;}  #don't xfer past the end of current data segment.
            $xfer -= $n;
            print OUTBIN substr($buff,$i_buf_low,$n);
            $i_buf_low = 0;
        }
        close (INBIN) || showfailed ("$me: unable to close $wave");
    } #end of while (@data_file_info)

    close(OUTBIN) || showfailed ("$me: unable to close $out");
    myprint ("$me: Success \"$out\" has been created!");
    return;
} #end of cat_waves()
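
The header patching step can be tried in isolation. The following is only a sketch under stated assumptions: it builds a fake 58-byte header in memory using the layout from the comments above (the 42 bytes of format info are just zeros here) and uses a made-up data size of 1000, then patches both size fields and reads them back.

```perl
use strict;
use warnings;

# Fake 58-byte header: "RIFF", riff size, 42 bytes of format info, "data", data size.
# The zero-filled format info is a placeholder, not a valid .wav format chunk.
my $header = "RIFF" . pack("V", 0) . ("\0" x 42) . "data" . pack("V", 0);

my $new_data_size = 1000;                      # made-up value for illustration
my $new_riff_size = $new_data_size + 42 + 8;   # format info + "data" header

# 4-argument substr replaces bytes in place; "V" is 32-bit little-endian.
substr($header, 4,  4, pack("V", $new_riff_size));
substr($header, 54, 4, pack("V", $new_data_size));

# Read the fields back to check the patch.
my ($riff_tag, $riff_size) = unpack("a4 V", $header);
my ($data_tag, $data_size) = unpack("x50 a4 V", $header);
print "$riff_tag $riff_size, $data_tag $data_size\n";   # RIFF 1050, data 1000
```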

Re^2: Mysteries of unpack("a", ...)
by ikegami (Patriarch) on Jan 03, 2009 at 08:50 UTC

    I am not at all sure that STDIN can even be opened in binary mode at all!

    Yes, binmode works the same on STDIN as on other handles. Specifically, it disables crlf→lf conversion on Windows machines, it stops treating chr(26) as the end of file on non-PerlIO Windows builds, and does nothing elsewhere.

    I mean CTL-C, CTL-Z mean things to STDIN although these are certainly valid binary values.

    No they don't. They may mean something to the tty/console, but STDIN doesn't even know about the Ctrl key. It doesn't treat character 3 or 26 specially.

    >perl -e"print qq{\x03\x1A}" | perl -le"print uc unpack 'H*', <STDIN>"
    031A

    $ perl -e'print qq{\x03\x1A}' | perl -le'print uc unpack "H*", <STDIN>'
    031A

    If you want to read binary data, open a file in binmode... local $/ = undef; is not needed.

    Not true at all. $/ is quite useful on binary files.

    my @records = map parse_rec($_), map /(.{$RECSIZE})/sg, do { local $/; <$fh> };

    and

    my @records;
    local $/ = \$RECSIZE;
    local *_;
    while (<$fh>) {
        push @records, parse_rec($_);
    }

    are equivalent to

    my @records;
    local *_;
    while (read($fh, $_, $RECSIZE)) {
        push @records, parse_rec($_);
    }

    Mind you, read is unaffected by $/, but that has nothing to do with whether the file is binary or not.
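
    To make the comparison concrete, here is a self-contained sketch that reads the same bytes both ways from an in-memory filehandle. The record size of 4, the sample data, and the body of parse_rec (which just returns hex) are all assumptions made up for the illustration.

    ```perl
    use strict;
    use warnings;

    my $RECSIZE = 4;                      # hypothetical fixed record length
    my $data    = "AAAABBBB\x0D\x0ACC";   # three 4-byte records, one containing CR LF

    # Hypothetical stand-in for parse_rec(): return the record as hex.
    sub parse_rec { return uc unpack("H*", $_[0]) }

    # Approach 1: $/ set to a reference to the record size, read with <>.
    my @via_readline;
    {
        open(my $fh, '<', \$data) or die "cannot open in-memory file: $!";
        binmode($fh);
        local $/ = \$RECSIZE;
        while (defined(my $rec = <$fh>)) {
            push @via_readline, parse_rec($rec);
        }
        close($fh);
    }

    # Approach 2: read() with an explicit length; $/ plays no part here.
    my @via_read;
    {
        open(my $fh, '<', \$data) or die "cannot open in-memory file: $!";
        binmode($fh);
        while (read($fh, my $rec, $RECSIZE)) {
            push @via_read, parse_rec($rec);
        }
        close($fh);
    }

    print "@via_readline\n";   # 41414141 42424242 0D0A4343
    print "@via_read\n";       # 41414141 42424242 0D0A4343
    ```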

      I use STDIN for command line filters of "catable files" (text), e.g. files that cat, or "type" in the Windows world, can display. I stand corrected about use of a binary file for such a purpose.

      I am curious as to what "*_" means? I couldn't find that in my reference books.

      The kind of binary files I usually deal with might have an odd number of bytes, and I have to fix that up in the final result with either 16-bit aligned or 32-bit aligned values. Sometimes that means shifting things over a byte or more. So something like:
      my $n_bytes = read(INBIN, $buff, $BUFSIZE); is the ticket. Your mileage may vary as they say! I haven't written any really hairy binary stuff in Perl.

        *_ is the symbol table entry (glob) containing $_ (and @_, %_, etc). Using local on globs is safer than using local on scalars.

        I don't know why you are trying* to prove that read can do something <> can't. I didn't say you shouldn't use read. I didn't say it was useless. I didn't say <> can do everything read can.

        * — You haven't come up with something yet. Keep in mind that length($buff) also returns the number of bytes read.
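
        A quick sketch of that last point, using an in-memory filehandle and a made-up 5-byte buffer: the return value of read and length($buff) agree, and an odd-length buffer can be padded from the length alone. Padding with a NUL byte is an assumption for the illustration, not something a particular file format is known to require.

        ```perl
        use strict;
        use warnings;

        my $data = "ABCDE";                        # made-up 5-byte "file"
        open(my $fh, '<', \$data) or die "cannot open in-memory file: $!";
        binmode($fh);

        # Ask for more than is available; read returns what it actually got.
        my $n_bytes = read($fh, my $buff, 8192);
        my $len     = length($buff);
        print "read returned $n_bytes, length is $len\n";   # both are 5

        # Pad to a 16-bit boundary if the byte count is odd (assumed NUL padding).
        $buff .= "\0" if length($buff) % 2;
        my $padded_len = length($buff);
        print "padded length is $padded_len\n";              # 6
        close($fh);
        ```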