Since you are using Notepad, it's likely that the file is just plain text (nothing freaky like an MS Word doc, an Excel spreadsheet, or some other hybrid binary/text thing). As suggested above, the "empty boxes" can be either "control" characters or "real" characters that happen to be outside the range covered by whatever font Notepad is using.
To get a picture of the byte values in the file (to see what might be causing those empty boxes), you could just do this:
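#!/usr/bin/perl
use strict;
use warnings;

my %c;    # byte => number of occurrences
while (<>) {
    chomp;                      # line terminators are not counted
    $c{$_}++ for (split //);    # tally each byte on the line
}
# one output line per distinct byte: hex value, the byte itself, its count
printf("%02x : %s : %d\n", ord($_), $_, $c{$_}) for (sort keys %c);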
If you run that script on your text file and save the output to some other file, like this:
perl that_script < your_file.txt > char_list.txt
you can then look at the "char_list.txt" file to see which hex byte values occur in the data and show up as empty boxes in notepad.
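If the full list is too noisy, a variant along these lines reads the file in raw mode and reports only the likely suspects. This is just a sketch: the sub name `suspect_bytes`, the script name in the usage comment, and my choice of what counts as "suspect" (anything that is not printable ASCII, tab, CR or LF) are all assumptions of mine, so adjust the character class as needed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count bytes that are not printable ASCII, tab, LF or CR.
# Takes a string of raw bytes, returns a hash ref: byte value => count.
sub suspect_bytes {
    my ($data) = @_;
    my %count;
    $count{ ord $1 }++ while $data =~ /([^\x09\x0a\x0d\x20-\x7e])/g;
    return \%count;
}

# Usage (hypothetical script name): perl suspect_bytes.pl your_file.txt > suspects.txt
if (@ARGV) {
    open my $fh, '<:raw', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my $data = do { local $/; <$fh> };    # slurp the whole file as bytes
    close $fh;
    $data = '' unless defined $data;
    my $count = suspect_bytes($data);
    # hex value and count only, so no odd byte ends up in the report itself
    printf "%02x : %d\n", $_, $count->{$_} for sort { $a <=> $b } keys %$count;
}
```

Reading with the `:raw` layer matters on Windows, where the default text mode would otherwise hide the CR of each CRLF pair.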
If the file happens to be UTF-8 encoded Unicode, you might try this other tool, which I posted here a while back: unichist -- count/summarize characters in data
Run it like this:
perl unichist -x < your_file.txt > char_list.txt
and look at that output with notepad. (Actually, you'll want to modify the "unichist" script so that it does print "\x{feff}\n"; before doing anything else -- this will put the "byte-order-mark" (BOM) character at the start of the output file, which will tell notepad to treat the file as utf8 data.)
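To show where the BOM line would go, here is a minimal stand-in that counts characters (not bytes) in a utf8 file. It is not the unichist code and makes no attempt to match its options or output format; `char_histogram` and the script name in the usage comment are hypothetical names of my own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count characters (not bytes) in an already-decoded string.
# Returns a hash ref: code point => count.
sub char_histogram {
    my ($text) = @_;
    my %count;
    $count{ ord $_ }++ for split //, $text;
    return \%count;
}

# Usage (hypothetical script name): perl char_hist.pl your_file.txt > char_list.txt
if (@ARGV) {
    open my $fh, '<:encoding(UTF-8)', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my $text = do { local $/; <$fh> };    # slurp and decode the whole file
    close $fh;
    $text = '' unless defined $text;

    binmode STDOUT, ':encoding(UTF-8)';
    print "\x{feff}\n";    # BOM first, so notepad treats the output as utf8

    my $count = char_histogram($text);
    for my $cp (sort { $a <=> $b } keys %$count) {
        # show ASCII control characters as '.' so they do not disturb the listing
        my $shown = ($cp < 0x20 || $cp == 0x7f) ? '.' : chr($cp);
        printf "U+%04X : %s : %d\n", $cp, $shown, $count->{$cp};
    }
}
```

If the input file itself starts with a BOM, it will simply show up in the listing as one occurrence of U+FEFF.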
Once you know what byte/character values are causing the empty boxes, you'll be able to decide how to fix or remove them.
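As one possible fix, assuming the culprits turn out to be stray control bytes, a filter like the following would strip them. The set of bytes removed here (all ASCII control characters except tab, LF and CR) is only a guess on my part, and `strip_controls` is a made-up name; edit the `tr` list to match whatever your char_list.txt actually shows.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Delete control bytes except tab, LF and CR (an assumed set; edit to taste).
sub strip_controls {
    my ($line) = @_;
    $line =~ tr/\x00-\x08\x0b\x0c\x0e-\x1f\x7f//d;
    return $line;
}

# Usage (hypothetical script name): perl strip_controls.pl your_file.txt > cleaned.txt
if (@ARGV) {
    open my $fh, '<:raw', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    binmode STDOUT;    # write bytes back out unchanged, line endings included
    print strip_controls($_) while <$fh>;
    close $fh;
}
```

Writing to a new file rather than editing in place keeps the original around in case the filter removes more than you intended.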