in reply to Re: german Alphabet
in thread german Alphabet
I don't see in ikegami's script the need for use utf8;.
Re^3: german Alphabet
by haukex (Archbishop) on Dec 04, 2018 at 21:02 UTC
"I don't see in ikegami's script the need for use utf8;."
Both the OP's script and ikegami's contain the string 'Fräsen und ndk (Kamera - Fräsaufnahme)'. From utf8: "The use utf8 pragma tells the Perl parser to allow UTF-8 in the program text in the current lexical scope. ... Do not use this pragma for anything else than telling Perl that your script is written in UTF-8. ... Because it is not possible to reliably tell UTF-8 from native 8 bit encodings, you need either a Byte Order Mark at the beginning of your source code, or use utf8;, to instruct perl." Although the "ä" may
Updated as per ikegami's reply.
by ikegami (Patriarch) on Dec 05, 2018 at 09:19 UTC
Perl assumes ASCII, not latin-1.
If you happen to use an 8-bit byte in a string literal, Perl creates a character with the value of that byte rather than throwing an error.
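A minimal sketch of that behaviour, assuming the script file itself is saved as UTF-8 (the variable names are only illustrative). Without use utf8, each of the two bytes of "ä" becomes its own character; with it, Perl decodes them into one character.

```perl
use strict;
use warnings;

# Assumption: this file is saved as UTF-8, so "ä" is the two bytes C3 A4.
my $without = do { no utf8;  "ä" };   # each byte becomes its own character
my $with    = do { use utf8; "ä" };   # decoded into the single character U+00E4

printf "without use utf8: length %d, first ord 0x%02X\n",
    length($without), ord($without);
printf "with use utf8:    length %d, first ord 0x%02X\n",
    length($with), ord($with);
```

Only ASCII is printed here, so the result does not depend on the terminal or on any output layer.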
by Anonymous Monk on Dec 15, 2018 at 19:51 UTC
"ê" is decoded into characters but then printed to a handle that doesn't have an :encoding(...) or :utf8 I/O layer. Since it's representable in latin-1, the single-byte encoding is used and no warning is shown.

$ perl -w -Mutf8 -E'print "ы"' | hd
Wide character in print at -e line 1.
00000000 d1 8b |..|
00000002

Similar situation, but "ы" cannot be represented in latin-1, so we get a warning and UTF-8 bytes instead. (My terminal is UTF-8. No decoding or encoding is done in this case; Perl operates on bytes.)
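The same effect can be reproduced without a terminal by printing to in-memory handles. This is only a sketch: bytes_written is a hypothetical helper, and the characters are written as escapes so that the result does not depend on how the file is saved.

```perl
use strict;
use warnings;

# Hypothetical helper: print $string to an in-memory handle opened with
# $layer (may be empty) and return the bytes that arrived, as hex.
sub bytes_written {
    my ($layer, $string) = @_;
    open my $fh, ">$layer", \my $buffer or die "open failed: $!";
    print {$fh} $string;
    close $fh or die "close failed: $!";
    return join ' ', map { sprintf '%02x', ord($_) } split //, $buffer;
}

# Collect warnings instead of letting them go to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

my $e_circ = "\x{EA}";    # "ê", representable in latin-1
my $yeru   = "\x{44B}";   # "ы", not representable in latin-1

print "e-circumflex, no layer:    ", bytes_written('', $e_circ), "\n";
print "e-circumflex, UTF-8 layer: ", bytes_written(':encoding(UTF-8)', $e_circ), "\n";
print "yeru, no layer:            ", bytes_written('', $yeru), "\n";
print "yeru, UTF-8 layer:         ", bytes_written(':encoding(UTF-8)', $yeru), "\n";
print "warnings seen: ", scalar(@warnings), "\n";
```

Only the third call, a character above 0xFF going to a handle with no encoding layer, should produce a "Wide character" warning.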
by ikegami (Patriarch) on Dec 16, 2018 at 19:53 UTC
by Anonymous Monk on Dec 16, 2018 at 21:47 UTC
by Aldebaran (Curate) on Dec 07, 2018 at 23:44 UTC
I have found this to be a very informative thread and ikegami's comments illuminating. Some issues require comment with accompanying source, so that others can replicate. I have enjoyed replicating haukex's source and his wicked use of the command line to clone a script with a use statement commented out. That said, I don't understand the current output. I use my clone tool on haukex's script to get a filename in my nomenclature:
I then use his nifty shell command:
The original is unable to render the special characters in STDOUT. I'm uncertain what happens in code tags:
But (this part is surprising to me) the umlauts are legible in STDOUT for the version with use utf8 commented out. They will probably get shredded in code tags:
I tried to switch the encoding to us-ascii using a command similar to the one you used, but failed to find the correct syntax:
Also, I'm not sure what I'm to be gleaning from Devel::Peek. Is the idea that you get to see what perl's internal representation of a string is?
by ikegami (Patriarch) on Dec 08, 2018 at 06:14 UTC
You get to see Perl's internal representations of scalars and their "subclasses" (arrays, hashes, globs, etc). See illguts for documentation on these. (Grab the tarball and look at the files named index-*.html or illguts-*.pdf.) The transcoding failure is the result of "ö", "ü" and "ß" not being in the US-ASCII character set.
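As a small illustration of what Devel::Peek shows, here is a sketch (variable names are illustrative) that dumps the same one-character string in its two possible internal storage formats. Dump writes to STDERR; the details of its output vary between perl versions, so none is reproduced here.

```perl
use strict;
use warnings;
use Devel::Peek qw(Dump);

my $native   = "\x{E4}";    # "ä" stored internally as a single byte
my $upgraded = "\x{E4}";
utf8::upgrade($upgraded);   # same character, now stored internally as UTF-8

print STDERR "--- native ---\n";
Dump($native);              # look at FLAGS and PV in the dump
print STDERR "--- upgraded ---\n";
Dump($upgraded);            # FLAGS should now include UTF8

# The two scalars are still the same string as far as Perl code is concerned.
print "equal as strings: ", ($native eq $upgraded ? "yes" : "no"), "\n";
print "UTF8 flag: native=", (utf8::is_utf8($native) ? 1 : 0),
      " upgraded=", (utf8::is_utf8($upgraded) ? 1 : 0), "\n";
```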
by Aldebaran (Curate) on Dec 12, 2018 at 00:27 UTC
I finally realized what that was when I took a look inside the unexpected partial file that resulted from this command:
1.us-ascii.pl:
So, position 72 was where the first umlaut occurred, and now I at least understand the error.
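To locate such a position without relying on a converter's error message, a rough sketch like the following reports the offset of the first byte outside US-ASCII. The helper name and the sample data are hypothetical, and whether a given tool counts positions from 0 or 1 is not something this thread establishes.

```perl
use strict;
use warnings;

# Hypothetical helper: return the 0-based offset of the first byte that is
# not US-ASCII, or undef if every byte is in the range 0x00-0x7F.
sub first_non_ascii_offset {
    my ($bytes) = @_;
    return $bytes =~ /[^\x00-\x7F]/ ? $-[0] : undef;
}

# Sample data (an assumption, not the file from the thread): the UTF-8
# bytes of "Fräsen", written with escapes.
my $sample = "Fr\xC3\xA4sen";
my $offset = first_non_ascii_offset($sample);

if (defined $offset) {
    printf "first non-ASCII byte 0x%02X at offset %d\n",
        ord(substr($sample, $offset, 1)), $offset;
}
else {
    print "pure US-ASCII\n";
}
```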