Just as soon as ikegami made me aware that us-ascii is my default encoding, I seem to be developing problems with it. These problems are of the variety where my computer does not behave the way I expect it to.
I'm running ubuntu with bash, and when I touch a file into existence, it is us-ascii. Likewise, files that are formed from redirecting STDOUT begin their lives as us-ascii on this platform. Where is this determined on POSIX systems?
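One way to probe this claim is the sketch below. It rests on the assumption that a file on a POSIX system is only a sequence of bytes with no encoding stored alongside it, so a label such as us-ascii is a guess made from the content. The file names are made up for the demonstration, and an `iconv` that accepts `-f`/`-t` is assumed to be installed.

```shell
#!/bin/bash
# Sketch only: the file names below are made up for the demonstration.
work=$(mktemp -d)

# touch creates a zero-byte file: there is no content, so nothing is encoded yet.
touch "$work/empty.txt"
empty_size=$(wc -c < "$work/empty.txt" | tr -d ' ')
echo "size after touch: $empty_size"

# Redirection stores whatever bytes the program wrote, unchanged.
printf 'He must.\n' > "$work/plain.txt"

# Pure ASCII bytes are already valid UTF-8, so converting changes nothing.
iconv -f US-ASCII -t UTF-8 "$work/plain.txt" > "$work/plain.utf8.txt"
if cmp -s "$work/plain.txt" "$work/plain.utf8.txt"; then
  same=yes
else
  same=no
fi
echo "ASCII and UTF-8 copies identical: $same"

rm -rf "$work"
```

If that holds, a file that contains only ASCII characters is simultaneously valid us-ascii and valid UTF-8, and it stops being reported as us-ascii as soon as a non-ASCII character is written to it.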
So this is a day in the life, where I use this nifty software: translate shell
$ trans :de -brief "over" >2.ascii.de.txt
$ trans :de -brief "He must." >>2.ascii.de.txt
$ cat 2.ascii.de.txt
Über
Er muss.
$ iconv -f us-ascii -t UTF-8 2.ascii.de.txt -o 2.de.utf8.txt
iconv: illegal input sequence at position 0
$
On STDOUT for me, I get Ü as the zeroth character. Does ascii have a representation for Ü?
Then I keep trying to get an iconv command to do something for me, but an effective syntax eludes me. Why is Ü illegal in the iconv command?
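The error is consistent with the file already holding UTF-8 bytes. US-ASCII only defines byte values 0x00 to 0x7F, while UTF-8 stores Ü (U+00DC) as the two bytes 0xC3 0x9C, so an `iconv` told to read US-ASCII would have to reject the very first byte. A minimal sketch of that reasoning, assuming a made-up sample file and an `iconv` that knows the names US-ASCII, UTF-8 and ISO-8859-1:

```shell
#!/bin/bash
# Sketch: the sample file name and its contents are assumptions for illustration.
tmp=$(mktemp -d)
sample="$tmp/sample.de.txt"

# Write "Über" as explicit UTF-8 bytes (0xC3 0x9C is U+00DC) so the
# example does not depend on how this script itself is saved.
printf '\303\234ber\n' > "$sample"

# Dump the bytes: the first two are c3 9c, both above the ASCII limit of 0x7f.
hex=$(od -An -tx1 "$sample" | tr -d ' \n')
echo "bytes: $hex"

# Count the bytes that fall outside the 7-bit ASCII range.
non_ascii=$(LC_ALL=C tr -d '\000-\177' < "$sample" | wc -c | tr -d ' ')
echo "non-ASCII bytes: $non_ascii"

# Declaring the input as US-ASCII makes iconv fail on the first byte.
if iconv -f US-ASCII -t UTF-8 "$sample" > /dev/null 2>&1; then
  ascii_ok=yes
else
  ascii_ok=no
fi
echo "valid US-ASCII: $ascii_ok"

# Declaring the input as UTF-8 lets a real conversion succeed.
latin1_hex=$(iconv -f UTF-8 -t ISO-8859-1 "$sample" | od -An -tx1 | tr -d ' \n')
echo "as ISO-8859-1: $latin1_hex"

rm -rf "$tmp"
```

Under these assumptions the `-f` argument has to name the encoding the bytes are actually in. Converting from UTF-8 to UTF-8 would be a no-op, which suggests the file from `trans` may not need converting at all.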
If I'm going to have source that has utf8 characters in it, doesn't it make sense to change the underlying encoding to utf8 or create it that way from the get-go?
After I've touched a file into existence, I use a bash script to clone the next version of a script. All of my scripts follow a naming taxonomy: a positive integer, followed by a period, followed by a word. The clone gets the incremented number, is given execute privileges, and has its name written to a manifest. There isn't any language in the script for determining the underlying encoding. I've gotten a lot of mileage out of this script, but I think it's time to replace it with shiny new, lexical perl. I'll put it in readmore tags for being somewhat OT:
$ cat 2.create.bash
#!/bin/bash
# which bash version?
echo "The shebang is specifying bash"
if [ -z "${BASH_VERSION}" ]; then
  echo "Not using bash but dash"
else
  echo "Using bash ${BASH_VERSION}"
fi
#get the first number from $1
#c=$(("$1" : '\([0-9]*\).*$')) didn't work
c=$(expr "$1" : '\([0-9]*\).*$')
echo "$c"
f=$1
#integer addition
d=$(expr "$c" + 1)
echo "$d"
#munge new file, no clobber
t="$d"
q=${f#*.}
s=$t.$q
echo "$s"
cp -n "$f" "$s"
chmod +x "$s"
echo "$s" >> 1.manifest
ls -lh "$s"
gedit "$s" &
$
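The commented-out `$(( ))` attempt in the script mixes arithmetic expansion with `expr`'s pattern syntax, which the shell does not accept. As a sketch of an alternative, the naming step can be done with parameter expansion plus arithmetic expansion alone. `next_name` is a hypothetical helper, not something the original script defines:

```shell
#!/bin/bash
# next_name is a hypothetical helper, not part of the original script.
# It maps "2.create.bash" to "3.create.bash" using only shell expansions.
next_name() {
  local name="$1"
  local num="${name%%.*}"    # text before the first period
  local rest="${name#*.}"    # text after the first period
  # Reject names whose leading part is empty or contains a non-digit.
  case $num in
    ''|*[!0-9]*) return 1 ;;
  esac
  # If nothing was stripped, the name had no period at all.
  [ "$rest" != "$name" ] || return 1
  echo "$((num + 1)).$rest"
}

next_name "2.create.bash"
next_name "9.trans.de.pl"
```

One difference to keep in mind: `expr` reads `08` as decimal eight, whereas `$(( ))` treats a leading zero as octal, so zero-padded numbers would need extra handling in this version.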
I'd like to write a perl equivalent that would give me freedom to choose the underlying encoding. I'd show previous attempts, but they look awful.
Finally, what makes any of these en_**.utf8 encodings different from one another?
$ locale charmap
UTF-8
$ locale -a
C
C.UTF-8
en_AG
en_AG.utf8
en_AU.utf8
en_BW.utf8
en_CA.utf8
en_DK.utf8
en_GB.utf8
en_HK.utf8
en_IE.utf8
en_IL
en_IL.utf8
en_IN
en_IN.utf8
en_NG
en_NG.utf8
en_NZ.utf8
en_PH.utf8
en_SG.utf8
en_US.utf8
en_ZA.utf8
en_ZM
en_ZM.utf8
en_ZW.utf8
POSIX
ru_RU.utf8
ru_UA.utf8
$ locale -m
ANSI_X3.110-1983
ANSI_X3.4-1968
...
UTF-8
VIDEOTEX-SUPPL
VISCII
WIN-SAMI-2
WINDOWS-31J
$
Thanks for your comment
In reply to create clone script for utf8 encoding by Aldebaran