in reply to Re^3: Encoding/decoding question
in thread Encoding/decoding question
Character codes 128 to 159 (U+0080 to U+009F) are not allowed in HTML; even if they were, they would likely be unprintable control characters. Tidy assumed you wanted to refer to a character with the same byte value in the specified encoding and replaced that reference with the Unicode equivalent.
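To make that remapping concrete, here is a minimal sketch. It assumes the "specified encoding" is windows-1252, which is only a guess, and `remap_c1_reference` is a hypothetical helper for illustration, not something Tidy provides.

```python
def remap_c1_reference(code_point, encoding="cp1252"):
    """Reinterpret a numeric character reference in the 128..159 range
    as a single byte in `encoding` (assumed here: windows-1252)."""
    if not 0x80 <= code_point <= 0x9F:
        return chr(code_point)  # outside the C1 range: leave unchanged
    try:
        return bytes([code_point]).decode(encoding)
    except UnicodeDecodeError:
        return "\ufffd"  # byte is unassigned in this encoding


for ref in (147, 148, 150):
    ch = remap_c1_reference(ref)
    print(f"&#{ref}; -> U+{ord(ch):04X}")
# &#147; -> U+201C
# &#148; -> U+201D
# &#150; -> U+2013
```

So a reference such as `&#147;` would come out as a left curly quote rather than a control character, which seems to match the behaviour described above.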
Here's the very top of the original (pre-Tidy) HTML file, from our friend Facebook:
<!DOCTYPE HTML> <html class=" videoCallEnabled" id="facebook" lang="en"><head> <meta http-equiv="content-type" content="text/html; charset=UTF-8"> <meta charset="utf-8"><script>CavalryLogger=false;window._script_path = "\/home.php";window._EagleEyeSeed="Nq0j";</script><noscript> <meta http-equiv="refresh" content="0; URL=/?_fb_noscript=1" /> </noscript> <meta name="robots" content="noodp,noydir">
... followed by loads of scripts and stylesheets.
Running your command above on the HTML file produced thousands of hex digits such as:
3c6c696e6b2068726566...
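Decoding that sample suggests the dump is simply the file's own bytes written out as hex. A minimal sketch, using only the ten bytes quoted above (the rest of the output is elided) and assuming they are plain ASCII:

```python
# The hex sample quoted above; everything after it is elided.
hex_prefix = "3c6c696e6b2068726566"

# Turn each pair of hex digits back into a byte, then read as ASCII.
decoded = bytes.fromhex(hex_prefix).decode("ascii")
print(decoded)  # <link href
```

In other words, this part of the output looks like the start of a stylesheet `<link>` tag, not anything unusual.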
Not sure if you're looking for anything in particular.

Thanks for your help,
Scott