echo5 has asked for the wisdom of the Perl Monks concerning the following question:

I have a simple script that appears to show that some byte in the output is causing Perl to behave strangely.

The script calls a command that coughs up 150 bytes or so of binary data. One stream of returned data behaves as expected. Another stream causes chaos.

The script:

```perl
$cmd = "/usr/local/bin/mycmd";
print "The cmd to be run is: $cmd \n";
open(CMD, "$cmd |" ) or die "Can't run '$cmd'\n$!\n";
while (<CMD>) {
    chomp;
    print "My raw output is: $_ \n";
    $dataout = $_;
    print "My DATAOUT is: $dataout \n";
}
```

When I run the above script in a "good" scenario I get the below output:

The cmd to be run is: /usr/local/bin/mycmd My raw output is: &#65533; ` /11&#65533;%_$&#65533;%f$&#65533;&c$&#65 +533;%a$&#65533;%\$&c$&^$&#65533;&i$ My DATAOUT is: &#65533; ` /11&#65533;%_$&#65533;%f$&#65533;&c$&#65533; +%a$&#65533;%\$&c$&^$&#65533;&i$

When I run the above script in a "bad" scenario I get the below output:

The cmd to be run is: /usr/local/bin/mycmd My raw output is: My DATAOUT is: My raw output is: &#65533;0]&#65533;c &#65533;&#65533; My DATAOUT is: &#65533;0]&#65533;c &#65533;&#65533;

Ultimately the goal is to ingest the data and process it using unpack, but that was failing because $_ didn't contain the data it should. The "bad" behavior above seems to show that some sort of "killer byte" output by mycmd throws a wrench into things. Below are the good and bad outputs in hex form, via xxd. Is there a byte in there tripping up Perl?

Good data stream:

```
0000000: 0202 00d0 0000 0000 0000 0000 0100 0000  ................
0000010: 0100 0000 0100 0000 0100 0000 0100 0000  ................
0000020: 0100 0000 0100 0000 0500 0000 0500 0000  ................
0000030: 0500 0000 0500 0000 0500 0020 0100 0000  ........... ....
0000040: 0100 0000 0100 0000 0100 0000 0100 0000  ................
0000050: 0100 0000 0100 0000 0500 0000 0500 0000  ................
0000060: 0500 0000 0100 0000 0100 0000 0000 0000  ................
0000070: 0200 0060 0100 0020 0000 0000 0100 2d00  ...`... ......-.
0000080: 0100 2f00 0100 3000 0000 0000 0100 0000  ../...0.........
0000090: 0000 0000 0102 ca25 0102 4a24 0102 bd25  .......%..J$...%
00000a0: 0102 5024 0102 da25 0102 4c24 0102 d525  ..P$...%..L$...%
00000b0: 0102 4c24 0102 c325 0102 4624 0102 e025  ..L$...%..F$...%
00000c0: 0102 4e24 0102 e225 0102 4824 0102 dd25  ..N$...%..H$...%
00000d0: 0102 5224                                ..R$
```

Bad data stream:

```
0000000: 020a 009c 0000 0000 0000 0000 0100 0000  ................
0000010: 0100 0000 0100 0000 0100 0000 0100 0000  ................
0000020: 0100 0000 0100 0000 0100 0000 0500 0000  ................
0000030: 0500 0000 0500 0000 0500 0000 0500 0000  ................
0000040: 0500 0000 0500 0000 0500 0000 0500 0000  ................
0000050: 0500 0000 0500 0000 0500 0000 0500 0000  ................
0000060: 0500 0000 0500 0000 0100 0000 0000 0000  ................
0000070: 0100 3000 0000 0000 0100 005c 0100 00b4  ..0........\....
0000080: 0000 0000 0100 0000 0000 0000 0200 0063  ...............c
0000090: 0100 0020 0000 0000 0103 9206 0103 8506  ... ............
```

Re: Killer byte tripping up Perl?
by haukex (Archbishop) on Dec 12, 2018 at 22:56 UTC

    If you're reading binary data, you should always binmode the filehandle, or use the three-argument open and specify the :raw layer. Don't use a plain print to show the data; use Data::Dump, or Data::Dumper with $Data::Dumper::Useqq=1. Also note that your while (<CMD>) is still going to split the input on $/ (the input record separator, "\n" by default, if you haven't set it to anything else), so you might want to look into read to read fixed-size chunks of data ($/ has a special fixed-record mode for that, but I like read better).
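A minimal sketch of these points, assuming a Unix-like system. The command list in @cmd is a hypothetical stand-in for /usr/local/bin/mycmd (a child perl that prints the first eight bytes of the "bad" stream) so the example runs anywhere. It uses the list form of a piped open, binmode, Data::Dumper with Useqq, and the fixed-record mode of $/ with an arbitrary record size of 4 bytes.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical stand-in for /usr/local/bin/mycmd: a child perl that
# prints the first eight bytes of the "bad" stream.
my @cmd = ($^X, '-e', 'binmode STDOUT; print "\x02\x0a\x00\x9c\x00\x00\x00\x00"');

# Three-argument, list-form piped open: no shell is involved.
open(my $fh, '-|', @cmd) or die "Can't run '@cmd': $!\n";
binmode($fh);                # raw bytes: no CRLF or encoding translation

$Data::Dumper::Useqq = 1;    # show non-printable bytes as escapes
$Data::Dumper::Terse = 1;    # omit the "$VAR1 = " prefix

my @records;
{
    # A reference to an integer puts readline into fixed-record mode:
    # each <$fh> returns up to 4 bytes, ignoring any "\n" bytes.
    local $/ = \4;
    while (my $rec = <$fh>) {
        push @records, $rec;
        print Dumper($rec);  # no chomp: a "\n" byte here is data
    }
}
close($fh) or die "'@cmd' failed: ", ($! || "exit status $?"), "\n";
```

Swapping local $/ = \4 for a read loop gives the same fixed-size behavior; which to use is mostly a matter of taste.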

    Update: I see haj and Laurent_R have made similar points too. In particular, don't chomp binary data!

    Another note: I wrote about "safer" piped opens here.

Re: Killer byte tripping up Perl?
by haj (Vicar) on Dec 12, 2018 at 22:51 UTC

    Are you, by any chance, writing your binary data to your terminal? In that case the data might contain control characters that make the cursor jump or have other side effects, and as of 2018 your terminal is likely to expect UTF-8 encoded output. Also note that chomp is probably a bad idea on binary data.

    For better diagnostics, you should apply binmode to your CMD filehandle, write the output to a file, and examine that file.
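A sketch of that diagnostic, assuming a Unix-like system. @cmd is a hypothetical stand-in for /usr/local/bin/mycmd, and File::Temp supplies a throwaway output path where you would normally choose a fixed file name. Both handles are binmoded, so the file should hold exactly the bytes the command produced.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical stand-in for /usr/local/bin/mycmd: a child perl that prints
# a few bytes from the start of the "bad" stream.
my @cmd = ($^X, '-e', 'binmode STDOUT; print "\x02\x0a\x00\x9c\x00\x00"');

open(my $in, '-|', @cmd) or die "Can't run '@cmd': $!\n";
binmode($in);     # read the pipe as raw bytes

# Temporary file for illustration only; in practice pick a fixed name
# and keep the file around for inspection.
my ($out, $path) = tempfile(UNLINK => 1);
binmode($out);    # write the bytes back out untouched

my $total = 0;
while (1) {
    my $n = read($in, my $buf, 4096);
    die "read failed: $!\n" unless defined $n;
    last if $n == 0;
    print {$out} $buf or die "write failed: $!\n";
    $total += $n;
}
close($in)  or die "'@cmd' failed: ", ($! || "exit status $?"), "\n";
close($out) or die "close failed: $!\n";

print "Copied $total bytes to $path; inspect with: xxd $path\n";
```

Comparing the byte count with what xxd shows for the same file is a quick check that nothing was translated on the way through.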

    Edited to add: I just spotted a 0a as the second byte of your bad stream. This is a line feed: while (<CMD>) ends the first record right after it, and chomp then removes it.
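The effect is easy to reproduce without mycmd. This sketch (an illustration, not code from the thread) feeds the first eight bytes of the bad stream through an in-memory filehandle that stands in for CMD, and reads them the way the original loop does:

```perl
use strict;
use warnings;

# First eight bytes of the "bad" stream, from the xxd dump (020a 009c 0000 0000).
my $bytes = "\x02\x0a\x00\x9c\x00\x00\x00\x00";

# In-memory filehandle standing in for the CMD pipe.
open(my $fh, '<', \$bytes) or die "Can't open in-memory handle: $!";
binmode($fh);

my @lines;
while (my $line = <$fh>) {   # splits on $/, which defaults to "\n" (0x0a)
    chomp $line;             # ... and chomp then deletes that 0x0a
    push @lines, $line;
}
close($fh);

# Prints: 2 records: 02 | 009c00000000
printf "%d records: %s\n", scalar @lines, join ' | ', map { unpack('H*', $_) } @lines;
```

The 8 input bytes come back as two records totalling 7 bytes. The first record is just the invisible byte 0x02, which matches the nearly empty first "raw output" line in the bad run.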

Re: Killer byte tripping up Perl?
by Laurent_R (Canon) on Dec 12, 2018 at 22:55 UTC
    Just a wild guess.

    while (<CMD>) reads data line by line, assuming that it is a text file with lines separated by newline characters.

    If your mycmd command produces binary output, then you probably need to read it as binary input. Maybe you should read blocks of bytes using the read function rather than the readline function or the equivalent <...> operator.
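A minimal sketch of reading fixed-size blocks with read, assuming 16-byte blocks (an arbitrary choice that mimics xxd's layout). An in-memory filehandle holding the first 20 bytes of the bad stream stands in for CMD. Printing hex rather than the raw bytes also keeps control characters away from the terminal.

```perl
use strict;
use warnings;

# Hypothetical input: the first 20 bytes of the "bad" stream, read through
# an in-memory filehandle that stands in for the CMD pipe.
my $stream = pack('H*', '020a009c000000000000000001000000' . '01000000');
open(my $fh, '<', \$stream) or die "Can't open in-memory handle: $!";
binmode($fh);

my $offset = 0;
my @dump;
while (1) {
    # read() ignores $/: it returns up to 16 bytes, 0 at EOF, undef on error.
    my $n = read($fh, my $block, 16);
    die "read failed: $!\n" unless defined $n;
    last if $n == 0;
    # Format roughly like xxd: offset, then the bytes as groups of 4 hex digits.
    push @dump, sprintf('%07x: %s', $offset, join ' ', unpack('(H4)*', $block));
    $offset += $n;
}
close($fh);

# Prints:
# 0000000: 020a 009c 0000 0000 0000 0000 0100 0000
# 0000010: 0100 0000
print "$_\n" for @dump;
```

Once the full output has been collected into one scalar this way, it can be handed to unpack; the right template depends on mycmd's record format, which the thread does not describe.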

    Also take a look at the binmode function (for input as well as for output).

    Update: fixed the links above so that they lead directly to the relevant Perl documentation pages instead of a search page.

Re: Killer byte tripping up Perl?
by stevieb (Canon) on Dec 12, 2018 at 22:36 UTC

    Can you provide a "good" stream and a "bad" one to run with?

    Hard to test/reproduce otherwise...