thor has asked for the wisdom of the Perl Monks concerning the following question:

Greetings all, I'm trying to insert a NUL (ASCII-0) into a Tk::Text widget with no luck. Here's a code sample:
use strict;
use warnings;
use Tk;

my $mw   = MainWindow->new;
my $text = $mw->Scrolled("Text")->pack;

foreach my $num (0..20) {
    $text->insert("end", $num . chr($num) . "\n");
}

$mw->Button(
    -text    => 'Quit',
    -command => sub { exit },
)->pack;

MainLoop;
From the looks of it, the Tk module is using C's notion of the end of a string (i.e. NUL terminated). Is there any way around this?

thor

Feel the white light, the light within
Be your own disciple, fan the sparks of will
For all of us waiting, your kingdom will come

Re: Displaying NUL in a TK::Text widget
by kvale (Monsignor) on Nov 07, 2004 at 04:50 UTC
    Tk is a C library, so I think you must be correct in your assessment. Fortunately, there is a simple workaround:
    use strict;
    use warnings;
    use Tk;

    my $mw   = MainWindow->new;
    my $text = $mw->Scrolled("Text")->pack;

    foreach my $num (0..20) {
        $text->insert("end", $num . chr($num));
        $text->insert("end", "\n");
    }

    $mw->Button(
        -text    => 'Quit',
        -command => sub { exit },
    )->pack;

    MainLoop;

    -Mark

      It's not that the NUL is screwing up my other output...I want the NUL. I'm trying to create an editor for some binary data...NULs are likely in that data. I noticed that for the other "non-printing" characters, it put in an escape for the char (i.e. \x{1} for ASCII-1). I wonder by what mechanism that's happening and if it can be extended to include NUL as well...

      thor

        I'm curious what Tk would display for characters in the range [\x7f-\xff] (that is, the ASCII "DEL" code and byte values with the 8th bit set). I'd expect these to be likely in binary data as well.

        If the plan is for a person to use a GUI to edit binary data, I think it would be better for the widget to be displaying some consistent projection of the data into visible characters, rather than pumping the raw binary data directly to the widget.

        For example, you could display the byte stream as a space-separated sequence of two-digit hex numbers (or three-digit octal or even decimal); or a combination of visible ASCII characters plus "escape" or "control" strings like "\n" or "^J"; or maybe use a font that combines "normal" ASCII characters with those nifty two-letter abbreviations for control codes -- Zaxo showed a cool trick using Unicode for this: Printing the Unprintable (but as indicated in that thread, there might not be displayable glyphs for all possible byte values).
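
        A minimal sketch of the Unicode idea, assuming the font in use has glyphs for the Unicode "Control Pictures" block (U+2400 to U+2421). This is not necessarily the code from the thread mentioned above, and control_pictures is a hypothetical helper name. Bytes above 0x7f have no such glyph, so they are left alone here and would still need one of the numeric notations.

```perl
use strict;
use warnings;

# Map each control byte to its Unicode "Control Pictures" glyph:
# 0x00-0x1F -> U+2400-U+241F, and 0x7F (DEL) -> U+2421.
# Bytes 0x80-0xFF have no such glyph and are left untouched.
sub control_pictures {
    my ($bytes) = @_;
    (my $shown = $bytes) =~ s/([\x00-\x1f])/chr(0x2400 + ord($1))/ge;
    $shown =~ s/\x7f/\x{2421}/g;
    return $shown;
}

# The result contains wide characters, so encode it on the way out.
binmode STDOUT, ':encoding(UTF-8)';
print control_pictures("\x00foo\x05"), "\n";
```

        Whether the result is readable depends entirely on the font, which is the caveat already raised above.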

        I'm also curious what sort of technique you provide for keyboarding arbitrary binary values (when the user needs to add or change a byte value). Obviously, if you display just space-separated numerics (hex, octal or decimal), the user could just type in digits; or if you're showing things like "^J", accept strings like that for input. You could even offer the users a choice of display/keyboarding methods.

        The main point, though, is that you should have some sort of transform between the binary data in a file and the displayable/editable data in the GUI -- not only to make sure that everything can be seen and typed in, but also to eliminate any possible ambiguities in the display (e.g. space vs. tab vs. LF vs CRLF).
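
        One way to sketch such a transform, assuming the space-separated two-digit hex display: bytes_to_hex produces the text for the widget and hex_to_bytes turns the edited text back into bytes. Both names are hypothetical, and the '(H2)*' group template assumes Perl 5.8 or later.

```perl
use strict;
use warnings;

# bytes_to_hex: raw bytes -> "00 6a 61 ..." for display in the widget.
sub bytes_to_hex {
    my ($bytes) = @_;
    return join ' ', unpack('(H2)*', $bytes);
}

# hex_to_bytes: the edited text -> raw bytes for writing back to the file.
# Dies on any token that is not exactly two hex digits.
sub hex_to_bytes {
    my ($text) = @_;
    my @pairs = split ' ', $text;
    for my $pair (@pairs) {
        die "not a hex byte: '$pair'\n"
            unless $pair =~ /\A[0-9a-fA-F]{2}\z/;
    }
    return pack('(H2)*', @pairs);
}

my $data = "\x00japh\n\xff";
my $text = bytes_to_hex($data);
print "$text\n";                                      # 00 6a 61 70 68 0a ff
print "round trip ok\n" if hex_to_bytes($text) eq $data;
```

        Because every byte maps to exactly one token, space, tab, LF and CRLF all stay distinguishable in the display.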

        As graff notes, you can't just pump raw binary in and expect it to display in any logical fashion. But given that you like the \x{1} notation, why not do something like:

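        $_ = "\000japh\njareh\000";
        # escape everything outside printable ASCII (space through ~);
        # the upper bound is \176 so that DEL (\177) is escaped as well
        s/([^\040-\176])/sprintf "\\x{%02x}", ord($1)/eg;
        print;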

        This just hands the characters to Tk the way you seem to want them displayed. Personally I would suggest a hex editor format like this typical output, which has three columns (offset, hex, ASCII):

        File: jargon-4.4.7.tar.gz size = 9061260 bytes 0% [H] Press 'h' for help
        00000000: 1F 8B 08 00 3E 74 F0 3F 00 03 EC FD 07 3C 5C DF  ....>t.?.....<\.
        00000010: DF 2F 8A 8F 16 BD D7 28 51 93 E8 8C 3E 44 1B 7D  ./.....(Q...>D.}

        With your solution, printable ASCII takes one character of width but non-printables take either five or six. With hex every byte is always two characters wide (three for decimal or octal), and you can display the printable ASCII as a separate column.
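
        A rough sketch of how such three-column lines could be produced, assuming 16 bytes per line and a dot for anything outside printable ASCII. The hexdump name and the exact spacing are illustrative choices, not taken from any particular editor.

```perl
use strict;
use warnings;

# hexdump: return one "offset: hex  ascii" line per 16 bytes of input.
sub hexdump {
    my ($bytes) = @_;
    my @lines;
    for (my $off = 0; $off < length($bytes); $off += 16) {
        my $chunk = substr($bytes, $off, 16);
        my $hex   = join ' ', map { sprintf('%02X', ord $_) } split //, $chunk;
        # Complemented tr: every byte outside 0x20-0x7e becomes a dot.
        (my $ascii = $chunk) =~ tr/\x20-\x7e/./c;
        # 16 bytes need 47 columns of hex (32 digits plus 15 spaces).
        push @lines, sprintf('%08X: %-47s  %s', $off, $hex, $ascii);
    }
    return @lines;
}

print "$_\n" for hexdump("\x1f\x8b\x08\x00>t\xf0?");
```

        The fixed width of the hex column is what keeps the ASCII column aligned on a short final line.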

        cheers

        tachyon

Re: Displaying NUL in a TK::Text widget
by DrHyde (Prior) on Nov 08, 2004 at 09:54 UTC
    NUL is not a printable character anyway. In binary file viewers (and editors) it is customary to replace all non-printable characters (i.e. 0x00 to 0x1f and 0x7f to 0xff) with a dot in the character display.
      "it is customary to replace all non-printable characters (i.e. 0x00 to 0x1f and 0x7f to 0xff) with a dot"
      In this case, the non-printable characters have semantic meaning. I want to be able to tell the difference between chr(0) and chr(5).

      thor

        So how about displaying ".foo." in the button and then when the user does a mouseover, display a tiny window with "\x00foo\x05" or whatever format is most appropriate for your environment?
        --traveler
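
        The two views in that suggestion could be generated along these lines. This is only a sketch: dot_view and escape_view are hypothetical names, and how the pop-up is wired into Tk is left out.

```perl
use strict;
use warnings;

# dot_view: compact form for the main display; every byte outside
# printable ASCII is shown as ".".
sub dot_view {
    my ($bytes) = @_;
    (my $view = $bytes) =~ tr/\x20-\x7e/./c;
    return $view;
}

# escape_view: unambiguous form for the pop-up; non-printables become \xNN.
# The backslash itself (0x5c) is escaped too, so a literal "\x00" in the
# data cannot be mistaken for a NUL byte.
sub escape_view {
    my ($bytes) = @_;
    (my $view = $bytes)
        =~ s/([^\x20-\x5b\x5d-\x7e])/sprintf('\\x%02x', ord($1))/ge;
    return $view;
}

my $data = "\x00foo\x05";
print dot_view($data), "\n";      # .foo.
print escape_view($data), "\n";   # \x00foo\x05
```

        The dot view alone cannot tell chr(0) from chr(5); the escaped view is what carries that distinction.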
        You then need to display the numerical values of those characters, such as 0x00, 0x05 and so on. You can't display a character with ASCII value 0x00, because there is no such printable character. Other control characters, such as 0x07 (BEL) and 0x0C (FF), do weird things and so should never be printed. Trying to output data containing the special control characters can *really* screw things up.

        You might like to take a look at how such data is displayed by my module Data::Hexdumper. Don't look at the code though, it's old and nasty :-)