I have a text file encoded in UTF-16LE, generated by another script. After reading the file in, I convert each character to its hexadecimal value. For some reason, spaces and newlines always show up as 00 when I convert them to hexadecimal. Why is this not working? Thanks for any help!


#!/usr/local/ActivePerl-5.14/bin/perl -w
use feature 'unicode_strings';
use utf8;
use warnings;
use strict;

my ($file) = @ARGV;
open DATAFILE, "<:encoding(UTF16-LE)", "$file";
while (<DATAFILE>) {
    my $string = $_;
    my @list = unpack( 'A' x length($string), $string );
    print $string;
    foreach (@list) {
        my $char = $_;
        my $ordi = ord($char);
        binmode STDOUT, ":utf8";
        print "Character:\t" . $char . "\t" . sprintf( '%2.2x', unpack( 'U0U*', $char ) ) . "\n";
    }
}

Hex dump of the first few lines of the text file. In the dump, each space appears as "20 00":
FF FE 42 00 52 00 49 00 47 00 41 00 4E 00 44 00 0A 00 44 00 61 00 6D 00 6E 00 2C 00 20 00 74 00 68 00 69 00 73 00 20 00 77 00 65 00 61 00 74 00 68 00 65 00 72 00 20 00 63 00 75 00 74 00 73 00 20 00 72 00 69 00 67 00 68 00 74 00 20 00 74 00 6F 00 20 00 74 00 68 00 65 00 20 00 62 00 6F 00 6E 00 65 00 2E 00 0A 00 57 00 68 00 61 00 74 00 20 00 74 00 68 00 65 00 2E 00 2E 00 2E 00 20 00 3F 00
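
A possible explanation, offered as a sketch rather than a confirmed diagnosis: when unpacking, the 'A' template strips trailing whitespace and NULs, so a one-character field holding a space or newline comes back as the empty string, and formatting an empty result as hex would presumably give 00. The snippet below compares that with split //, which keeps every character. The sample string and the variable names are assumptions made for illustration; it does not read the original file.

```perl
use strict;
use warnings;

# Hypothetical sample standing in for one decoded line of the file.
my $line = "Damn, this\n";

# What the script does: one 'A' per character. On unpack, 'A' strips
# trailing whitespace and NULs, so a lone space or newline becomes ''.
my @stripped = unpack( 'A' x length($line), $line );

# Alternative: split on the empty pattern, which keeps every character.
my @chars = split //, $line;

binmode STDOUT, ':encoding(UTF-8)';
for my $i ( 0 .. $#chars ) {
    # ord('') is 0, so stripped fields print as 0000 in the first column.
    printf "index %2d  unpack 'A': %04x  split: %04x\n",
        $i, ord( $stripped[$i] ), ord( $chars[$i] );
}
```

If this is indeed the cause, replacing the unpack line with split // (or using the 'a' template, which does no stripping) should make spaces report 20 and newlines 0a.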

In reply to Why aren't spaces from a Unicode file converting to Hexadecimal value 20? by Isolder
