Since you are using Notepad, the file is probably just plain text (nothing freaky like an MS Word doc, an Excel spreadsheet, or some other hybrid binary/text thing). As suggested above, the "empty boxes" can be either control characters or real characters that happen to fall outside the range covered by whatever font Notepad is using.

To get a picture of the byte values in the file (to see what might be causing those empty boxes), you could just do this:

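#!/usr/bin/perl
use strict;
use warnings;

my %c;    # character => number of occurrences
while (<>) {
    chomp;                      # line terminators are not counted
    $c{$_}++ for (split //);    # tally each character on the line
}
# one line per distinct character: hex value : character : count
printf("%02x : %s : %d\n", ord($_), $_, $c{$_}) for (sort keys %c);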
If you run that script on your text file and save the output to some other file, like this:
perl that_script < your_file.txt > char_list.txt
you can then look at the "char_list.txt" file to see which hex byte values occur in the data and show up as empty boxes in notepad.
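If the raw control characters in that listing make "char_list.txt" itself hard to read, here is a variant sketch, not the script above. It assumes the file is small enough to slurp, counts raw bytes (line terminators included), and prints a "." in place of anything non-printable. The names byte_histogram and format_line, and the file name byte_hist.pl, are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count every byte in a string, including newlines and other control bytes.
# (Hypothetical helper name, not part of the original script.)
sub byte_histogram {
    my ($data) = @_;
    my %count;
    $count{$_}++ for unpack 'C*', $data;    # byte values 0..255
    return \%count;
}

# Format one report line; non-printable bytes are shown as '.' so the
# report stays readable in an editor.
sub format_line {
    my ($byte, $n) = @_;
    my $shown = ($byte >= 0x20 && $byte < 0x7f) ? chr($byte) : '.';
    return sprintf "%02x : %s : %d\n", $byte, $shown, $n;
}

# Only produce a report when run as a script with a file name argument.
if (!caller && @ARGV) {
    open my $in, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    binmode $in;                            # raw bytes, no CRLF translation
    my $data = do { local $/; <$in> };      # slurp the whole file
    close $in;
    $data = '' unless defined $data;
    my $count = byte_histogram($data);
    print format_line($_, $count->{$_}) for sort { $a <=> $b } keys %$count;
}
```

Run it as: perl byte_hist.pl your_file.txt > char_list.txt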

If the file happens to be utf8 unicode, you might try this other tool, which I posted here a while back: unichist -- count/summarize characters in data

Run it like this:

perl unichist -x < your_file.txt > char_list.txt
and look at that output with notepad. (Actually, you'll want to modify the "unichist" script so that it runs print "\x{feff}\n"; before doing anything else. This puts the byte-order mark (BOM) character at the start of the output file, which tells notepad to treat the file as utf8 data.)
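To show what the BOM does at the byte level, here is a minimal sketch that is independent of unichist itself. It assumes the output handle should be UTF-8 encoded; write_with_bom is a hypothetical helper name. It writes only the BOM, without the trailing newline used in the print statement mentioned above.

```perl
use strict;
use warnings;

# Write lines to a filehandle as UTF-8, preceded by a byte-order mark.
# (Hypothetical helper for illustration; not part of unichist.)
sub write_with_bom {
    my ($fh, @lines) = @_;
    binmode $fh, ':encoding(UTF-8)';    # encode characters as UTF-8 on output
    print {$fh} "\x{feff}";             # U+FEFF becomes the bytes EF BB BF
    print {$fh} $_ for @lines;
}

# Example use when run as a script: send one accented line to STDOUT.
if (!caller && @ARGV && $ARGV[0] eq '--demo') {
    write_with_bom(\*STDOUT, "caf\x{e9}\n");
}
```

The three bytes EF BB BF at the very start of the file are what notepad looks for when deciding to read the rest as utf8.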

Once you know what byte/character values are causing the empty boxes, you'll be able to decide how to fix or remove them.
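If the culprits turn out to be ASCII control characters, removal could look something like the sketch below. It assumes you want to keep tab, line feed and carriage return and drop every other byte in the ranges 00-1f and 7f; adjust the list once char_list.txt shows what is actually in your data. The name strip_controls and the file name strip_controls.pl are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return a copy of the text with ASCII control characters deleted,
# keeping tab (09), line feed (0a) and carriage return (0d).
# (Hypothetical helper name; the character list is an assumption.)
sub strip_controls {
    my ($text) = @_;
    (my $clean = $text) =~ tr/\x00-\x08\x0b\x0c\x0e-\x1f\x7f//d;
    return $clean;
}

# When run as a script with a file name, print the cleaned lines to STDOUT.
if (!caller && @ARGV) {
    while (my $line = <>) {
        print strip_controls($line);
    }
}
```

Run it as: perl strip_controls.pl your_file.txt > cleaned.txt

This only handles single-byte control characters. Characters that show up as boxes because the font lacks them are real data, and whether to replace or keep those is a separate decision.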


In reply to Re: noobie control char removal by graff
in thread noobie control char removal by desertman
