Re^2: Suggestions to make this code more Perlish

Replies are listed 'Best First'.

Re^3: Suggestions to make this code more Perlish
by kcott (Archbishop) on Apr 01, 2014 at 08:33 UTC

Probably the first thing to note is that the numeric value of a 7-bit ASCII character is the same as the Unicode code point of that character, and its UTF-8 encoding is that same value in a single byte. The ASCII character "A" has the hexadecimal value 41; the Unicode character "A" has the code point 41 hexadecimal and is encoded in UTF-8 as the single byte 41.

The term Unicode is often used in a sense to indicate characters outside the range of 7-bit ASCII characters. This often degenerates into arguments over what was said, what was meant, what's technically correct and so on. For the remainder of this node, assume ASCII refers to 7-bit ASCII characters and Unicode refers to UTF-8 encoded characters outside the range of 7-bit ASCII characters.

In the simplest case, if your Perl source code, input data and output data contain only ASCII characters, there's no need to do anything special. This was the case with your code and data here: ASCII characters were used throughout so no special pragmata or encoding directives were required.

[For the following examples, note that the letter A has a numerical value of 65 decimal (41 hexadecimal) and the smiley face character ☺ has a numerical value of 9786 decimal (263a hexadecimal).]
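
To make the first point concrete, here's a minimal sketch (not part of your code) that prints the UTF-8 bytes for each of those two code points. The helper name utf8_hex is my own invention for illustration; encode comes from the core Encode module.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: return the UTF-8 encoding of a code point
# as space-separated hexadecimal bytes.
sub utf8_hex {
    my ($code_point) = @_;
    my $bytes = encode('UTF-8', chr $code_point);
    return join ' ', map { sprintf '%02x', $_ } unpack 'C*', $bytes;
}

# "A" (U+0041) and the smiley face (U+263A)
printf "U+%04X -> %s\n", $_, utf8_hex($_) for 0x41, 0x263A;
```

The ASCII character comes out as the single byte 41, identical to its code point; the smiley face needs three bytes (e2 98 ba), none of which equals its code point.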

Here's a short piece of Perl code with just ASCII characters:

$ perl -E 'say ord "A"; say sprintf "%x", ord "A"'
65
41

Here's a similar piece of Perl code, but this one also includes Unicode characters:

$ perl -E 'say ord "☺"; say sprintf "%x", ord "☺"'
226
e2

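As you can see, that second example didn't produce the expected results. Without further instruction, Perl reads the source as bytes, and ☺ is stored in UTF-8 encoded source as the three bytes e2 98 ba, so ord reports only the first of them (226 decimal, e2 hexadecimal). Because the source code contained Unicode characters, you need to tell Perl this with the utf8 pragma: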

$ perl -E 'use utf8; say ord "☺"; say sprintf "%x", ord "☺"'
9786
263a
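
The same distinction applies to input data, which Perl also sees as bytes until it is decoded. Here's a rough sketch of that, assuming the data is UTF-8; the bytes are written with escapes so the result doesn't depend on how the script itself is saved, and the helper name describe is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The three bytes that UTF-8 uses for the smiley face character.
my $raw = "\xE2\x98\xBA";

# Hypothetical helper: report a string's length and first ordinal.
sub describe {
    my ($string) = @_;
    return sprintf 'length %d, first ord %d', length($string), ord($string);
}

# Undecoded, Perl sees three separate one-byte characters.
print 'undecoded: ', describe($raw), "\n";

# Decoded, Perl sees one character with the expected code point.
print 'decoded:   ', describe(decode('UTF-8', $raw)), "\n";
```

The undecoded string reports length 3 and first ordinal 226, matching the unexpected result above; the decoded string reports length 1 and ordinal 9786.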

Here's a short piece of Perl code which outputs ASCII characters:

$ perl -E 'say chr 65; say "\x{41}"'
A
A

Here's a similar piece of Perl code which outputs Unicode characters:

$ perl -E 'say chr 9786; say "\x{263a}"'
Wide character in say at -e line 1.
☺
Wide character in say at -e line 1.
☺

As you can see, that produced warnings; however, if we let Perl know to expect Unicode output with binmode, we get a better result:

$ perl -E 'binmode STDOUT => ":utf8"; say chr 9786; say "\x{263a}"'
☺
☺
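
If you'd like to see what an output layer actually does, here's a small sketch that prints through a layer into an in-memory filehandle and then inspects the bytes that were written. I've used the ":encoding(UTF-8)" layer here, which is the stricter relative of ":utf8"; the helper name bytes_written is made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: print $text through a UTF-8 encoding layer into
# an in-memory buffer and return the bytes written, as hexadecimal.
sub bytes_written {
    my ($text) = @_;
    my $buffer = '';
    open my $fh, '>:encoding(UTF-8)', \$buffer
        or die "Cannot open in-memory handle: $!";
    print {$fh} $text;
    close $fh or die "Cannot close in-memory handle: $!";
    return join ' ', map { sprintf '%02x', $_ } unpack 'C*', $buffer;
}

print 'A      -> ', bytes_written('A'), "\n";
print 'U+263A -> ', bytes_written("\x{263a}"), "\n";
```

With the layer in place, the single wide character is written as the three bytes e2 98 ba and no "Wide character" warning is issued. The same layer can be given to binmode for STDOUT, as in the one-liner above.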

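Those were just trivial examples. See the documentation for details. I suggest you start with:

Perl Unicode Introduction (perluniintro)
Perl Unicode Tutorial (perlunitut)
Perl Unicode FAQ (perlunifaq)

And then move on to:

Perl Unicode Support (perlunicode)
Perl Unicode Properties (perluniprops)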

[Note: In order to display Unicode characters exactly as coded or output, I've used <pre>...</pre> and <tt>...</tt> tags instead of <code>...</code> and <c>...</c> tags.]

-- Ken
