I'm guessing that you are using some specific window application (or maybe a browser?) to view the data, and the boxes stand for characters that this application is unable to display correctly. The problem is, there might be a broad assortment of character values that fall into this "undisplayable" category, and if you just keep using this app the same way to look at the data, you'll never be able to figure out what those characters really are.
That's why many people use one of the various "hex dump" tools on data files, so they can see the actual numeric values of the bytes that make up these undisplayable characters, and figure out what needs to be done once they know what these characters really are (in the binary sense at least, if not in a human-language sense).
Perl itself can be used easily to create a hex dump of the data -- something like:

    #!/usr/bin/perl
    use strict;
    use warnings;

    $/ = undef;                            # turn on slurp mode for reading
    my $data = <>;                         # read all data (from STDIN or @ARGV file)
    $data = '' unless defined $data;       # nothing to dump if there was no input
    while ( length $data ) {
        # take the next 16 bytes (fewer at the very end) off the front of $data
        my @s = split //, substr( $data, 0, 16, '' );
        # first row: the hex code of each byte
        print join( " ", map { sprintf "%02x", ord($_) } @s ), "\n";
        # second row: the character itself if printable ASCII, else '~~'
        print join( " ", map { /[ -~]/ ? " $_" : '~~' } @s ), "\n\n";
    }

That will provide the hex codes for all the bytes in your data, along with the ASCII characters for the bytes that happen to be in the printable ASCII range.
Another thing you could do is figure out what character encoding is being assumed by your display application (whatever it is), and then see if you can find out what character encoding is represented in your data file. I expect there's a mismatch between those two encodings, and that is why you are seeing those boxes.
(The problem also would relate to the font that your application is using, because the boxes are the symbol provided by that font to represent code points for which it does not have a displayable character glyph.)
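For the encoding question, one rough way to narrow things down is to check which candidate encodings can decode the raw bytes without error. This is only a sketch: the sub name plausible_encodings, the candidate list and the sample string are all made up for illustration, and it uses the core Encode module. A clean decode does not prove that a guess is right (ISO-8859-1 accepts any byte sequence at all), but a failed strict UTF-8 decode is good evidence that the data is not UTF-8.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode ();

# Return the names of the candidate encodings under which $bytes decodes
# without error. (Hypothetical helper; the candidates are only guesses.)
sub plausible_encodings {
    my ( $bytes, @candidates ) = @_;
    my @ok;
    for my $name (@candidates) {
        my $copy = $bytes;    # decode() may modify its input when a CHECK flag is given
        my $text = eval { Encode::decode( $name, $copy, Encode::FB_CROAK ) };
        push @ok, $name if defined $text;
    }
    return @ok;
}

# Made-up sample: "cafe au lait" with an e-acute stored as the single byte 0xE9,
# which is how ISO-8859-1 would store it; that byte sequence is not valid UTF-8.
my $bytes = "caf\xe9 au lait";
print join( ", ", plausible_encodings( $bytes, 'UTF-8', 'ISO-8859-1', 'cp1252' ) ), "\n";
```

To try it on real data, replace the sample string with the slurped contents of your file, read in binary mode.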
(update: obviously, almut has shown that you did in fact post enough information in order for someone here to tell you what your box characters are... but please be aware that you might encounter some other piece of data (which you haven't posted here yet) that will also show little boxes when you view it this way -- and it's not guaranteed that every box you see will always be a 0x19 byte value.)
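To check whether every box really is the same byte, a quick tally of the non-printable byte values can help. The following is a minimal sketch, not a polished tool: the sub name count_unprintable and the sample string are invented for illustration, and treating tab, newline and carriage return as "displayable" is an assumption you may want to change.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count every byte outside printable ASCII (space through tilde), ignoring
# tab, newline and carriage return. Returns a hash ref: byte value => count.
sub count_unprintable {
    my ($bytes) = @_;
    my %count;
    $count{ ord $1 }++ while $bytes =~ /([^ -~\n\r\t])/g;
    return \%count;
}

# Made-up sample: two 0x19 bytes and one NUL byte mixed in with normal text.
my $sample = "foo\x19bar\x19baz\x00\n";
my $count  = count_unprintable($sample);
printf "0x%02x  %d\n", $_, $count->{$_} for sort { $a <=> $b } keys %$count;
```

If the tally for your real data shows more than one byte value, you know that not all of the boxes are the same character.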
In reply to Re: Searching between wierd box characters.
by graff
in thread Searching between wierd box characters.
by Anonymous Monk