Character codes 128 to 159 (U+0080 to U+009F) are not allowed in HTML; even if they were, they would likely be unprintable control characters. Tidy assumed you wanted to refer to a character with the same byte value in the specified encoding and replaced that reference with the Unicode equivalent.
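As a rough illustration of what that Tidy message describes, here is a minimal Python sketch. It assumes the "specified encoding" is Windows-1252 (the message above does not say which encoding was in effect), and the function name `remap_c1_reference` is made up for this example; it is not Tidy's actual code.

```python
def remap_c1_reference(code_point: int, encoding: str = "cp1252") -> str:
    """Mimic the described behaviour: treat a reference in the range
    128..159 as a byte in `encoding` and return its Unicode equivalent.
    The cp1252 default is an assumption, not something Tidy reported."""
    if 0x80 <= code_point <= 0x9F:
        try:
            return bytes([code_point]).decode(encoding)
        except UnicodeDecodeError:
            # Some byte values (e.g. 0x81 in cp1252) have no mapping.
            return "\ufffd"
    # Anything outside the C1 range is taken at face value.
    return chr(code_point)


# &#147; is a common way a Windows "smart quote" sneaks into HTML.
left_quote = remap_c1_reference(147)
euro_sign = remap_c1_reference(128)
```

Under that assumption, `&#147;` comes out as U+201C (left double quotation mark) and `&#128;` as U+20AC (euro sign).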
Here's the very top of the original (pre-tidy'd) HTML file (from our friend the Facebook):
<!DOCTYPE HTML> <html class=" videoCallEnabled" id="facebook" lang="en"><head> <meta http-equiv="content-type" content="text/html; charset=UTF-8"> <meta charset="utf-8"><script>CavalryLogger=false;window._script_path = "\/home.php";window._EagleEyeSeed="Nq0j";</script><noscript> <meta http-equiv="refresh" content="0; URL=/?_fb_noscript=1" /> </noscript> <meta name="robots" content="noodp,noydir">
... followed by loads of scripts and stylesheets.
Your command above, run on the HTML file, produced thousands of characters of output such as:
3c6c696e6b2068726566...
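For what it's worth, that output looks like a plain hex dump of the file's bytes. A minimal Python sketch (Python is used purely for illustration, and the variable names are mine) that decodes just the fragment quoted above:

```python
# Only the bytes actually quoted above; the rest of the dump is elided.
hex_fragment = "3c6c696e6b2068726566"

# Each pair of hex digits is one byte; the page declares UTF-8.
decoded = bytes.fromhex(hex_fragment).decode("utf-8")
print(decoded)  # prints: <link href
```

So the fragment is simply the start of a `<link href...` tag, which fits with the stylesheets mentioned earlier.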
Not sure if you're looking for anything in particular. Thanks for your help, Scott
In reply to Re^4: Encoding/decoding question by slugger415, in thread Encoding/decoding question by slugger415