Using hex notation for octets and characters is just "better" than octal (or decimal), IMHO -- more consistent, less confusing, easier to understand and keep track of.

BTW, if your web scraping, etc. is really giving you strings that contain � (a.k.a. "\x{fffd}", the Unicode "replacement" character), that is a symptom of something gone wrong, either in what the content provider (web service) is sending you, or in what you are doing with the data once you get it.

That character gets inserted when data is converted from some non-Unicode encoding into Unicode (or from one Unicode encoding to another, e.g. UTF-16 to UTF-8), and the input contains a byte (or byte sequence) that is "unmappable" (unknown or invalid) for the stated input encoding.
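
To make that concrete, here is a minimal sketch in Python (picked only for illustration; as far as I know, Perl's Encode does the same substitution by default). The sample bytes and the helper names are made up for the example, not taken from the original data.

```python
# Hypothetical input: plain ASCII with one stray 0xFD byte in the middle.
# 0xFD can never appear in well-formed UTF-8.
raw = b"abc\xfddef"


def decode_lenient(data: bytes, encoding: str) -> str:
    """Decode, substituting U+FFFD for anything invalid in `encoding`."""
    return data.decode(encoding, errors="replace")


def strict_decode_fails(data: bytes, encoding: str) -> bool:
    """Return True if a strict decode rejects the data outright."""
    try:
        data.decode(encoding)  # errors="strict" is the default
        return False
    except UnicodeDecodeError:
        return True


text = decode_lenient(raw, "utf-8")
print(ascii(text))                        # shows the escaped code points
print(strict_decode_fails(raw, "utf-8"))  # strict mode refuses the bad byte
```

The point is that the replacement character is created by the decoding step; it was never in the bytes that came over the wire.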

Also, it is a bit worrisome that your various attempts to "visualize" the data yielded just "0xFD". If the input really contained �, I would expect to see either a three-byte UTF-8 sequence ("\xEF\xBF\xBD"), or a two-byte UTF-16 sequence ("\xFF\xFD" or "\xFD\xFF", depending on whether the data was big- or little-endian).
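
Those byte sequences are easy to check for yourself. A small sketch, again in Python for illustration only (`hex_bytes` is a hypothetical helper name):

```python
REPLACEMENT = "\ufffd"  # U+FFFD, the Unicode replacement character


def hex_bytes(text: str, encoding: str) -> str:
    """Encode `text` and return its bytes as space-separated uppercase hex."""
    return " ".join(f"{b:02X}" for b in text.encode(encoding))


# "utf-16-be" / "utf-16-le" encode without prepending a byte-order mark.
for enc in ("utf-8", "utf-16-be", "utf-16-le"):
    print(f"{enc:10s} {hex_bytes(REPLACEMENT, enc)}")
```

So a hex dump showing a lone FD with no EF BF or FF next to it is not an encoded U+FFFD in any of these encodings.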

(update: OTOH, if the original data contains just "\xFD", and that's what you see in a hex dump of the original data, then you'll want to know what the content provider "means" by that value -- i.e. what character encoding they are using -- and make sure you interpret/decode it correctly. The "\x{fffd}" could be the result of one of your processes trying to convert "\xFD" to Unicode the wrong way.)
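
One way to explore what a lone "\xFD" might mean is to decode it strictly under a few candidate encodings and compare the results. This is only a sketch: the candidate list is my guess, not something the content provider has confirmed, and `interpretations` is a hypothetical helper name.

```python
raw = b"\xfd"  # the single byte seen in the hex dump

# Assumed candidates; replace with whatever the provider actually declares.
CANDIDATES = ["utf-8", "latin-1", "cp1252", "koi8-r"]


def interpretations(data: bytes, encodings: list) -> dict:
    """Map each encoding to the decoded string, or None if the bytes are invalid."""
    results = {}
    for enc in encodings:
        try:
            results[enc] = data.decode(enc)  # strict: no silent substitution
        except UnicodeDecodeError:
            results[enc] = None
    return results


for enc, decoded in interpretations(raw, CANDIDATES).items():
    if decoded is None:
        print(f"{enc:8s} invalid for this encoding")
    else:
        print(f"{enc:8s} U+{ord(decoded):04X} {decoded!r}")
```

If the byte is invalid as UTF-8 but comes out as a sensible character under a single-byte encoding, that suggests some step in your pipeline is decoding it as UTF-8 when it should not be.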


In reply to Re^4: hexdump/od/perl question by graff
in thread hexdump/od/perl question by EvanCarroll
