alexharv074 has asked for the wisdom of the Perl Monks concerning the following question:

Hi all, I have a finicky little problem in a Perl program I've written and have no idea how to go about solving it.
My program asks the user to enter text in either English or Simplified Chinese via a line that just says
$response = <STDIN>;
I have the following statements at the beginning of my program:
use utf8;
use open ':encoding(utf8)';
binmode(STDOUT, ':utf8');
binmode(STDIN, ':utf8');
(Do I really understand what these lines do? No. :)
Anyhow, just about everything works the way I want it to except for one little problem: if you type a Chinese character and then try to delete it again using the delete or backspace key, only half of it gets deleted.
Does anyone know a solution to this?
The platform is Mac OS X, although I'd hoped my app would work on Linux too.

Replies are listed 'Best First'.
Re: Deleting Chinese characters while reading
by hippo (Archbishop) on Feb 28, 2015 at 11:39 UTC
    Do I really understand what these lines do? No.

    use utf8;
    use open ':encoding(utf8)';
    binmode(STDOUT, ':utf8'); (and the same for STDIN)

    However, if all of this is new to you it's probably best to start with perlunitut.
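For readers with the same question about those lines, here is a rough annotated sketch of what each one does. The helper `read_response` and the in-memory filehandle are illustrative assumptions, not part of the original program; they just make it possible to see the decoding without typing anything.

```perl
use strict;
use warnings;
use utf8;                       # the source file itself may contain UTF-8 literals
use open ':encoding(UTF-8)';    # default layer for handles opened after this point

# STDIN and STDOUT are already open when the program starts,
# so they need an explicit binmode to get a decoding/encoding layer.
binmode(STDIN,  ':encoding(UTF-8)');
binmode(STDOUT, ':encoding(UTF-8)');

# Hypothetical helper: read one line from a handle, minus the newline.
sub read_response {
    my ($fh) = @_;
    my $line = <$fh>;
    return unless defined $line;
    chomp $line;
    return $line;
}

# Demo on an in-memory handle holding the raw UTF-8 bytes of "中文\n".
my $raw = "\xE4\xB8\xAD\xE6\x96\x87\n";
open(my $fh, '<:encoding(UTF-8)', \$raw)
    or die "cannot open in-memory handle: $!";
my $response = read_response($fh);
print length($response), "\n";  # prints 2: characters, not bytes
```

In the real program the same helper would be called as `read_response(\*STDIN)`. None of these layers affect how the terminal edits the line before Enter is pressed; they only control how Perl decodes the bytes it eventually receives.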

    As to your problem of deleting multi-byte characters: does the same problem happen in your terminal when not using perl? If so, that's a problem with your terminal settings and needs to be solved there.

      No, it's fine deleting the characters in the terminal, and also in other applications like vim, and indeed here in Safari. I do note that I have the same problem deleting characters if I simply type 'read c' from the command line.
Re: Deleting Chinese characters while reading
by Your Mother (Archbishop) on Mar 01, 2015 at 07:36 UTC

    It sounds like this is happening at the prompt where you type, before you hit enter. If so, I can’t see how it has anything to do with Perl. On OS X you can open the prefs for the Terminal with ⌘, then click the Advanced tab and make sure you have UTF-8 or something sane set. Mmmm… I just tested and found it to do what you describe, sort of.

    moo@cow~>perl
    while ( <STDIN> ) { print "Got: $_" }
    ^D
    假借字 # <- pasted three ideograms
    Got: 假借字
    假借  # <- pasted same three and hit delete
    Got: 假借
    假借  # <- pasted same three and hit delete twice
    Got: 假
    

    So characters before the first appear to take two deletes to delete, with a phony/floating whitespace appearing between contiguous characters, but they are actually deleted the first time, just erroneously displayed. This isn’t Perl’s fault, pretty sure, but some quirk in the Terminal. I would also like to know how to prevent it, though I never saw it before. I messed around with the -CSD switches too, with no change in behavior.

      Yep - you have definitely reproduced the issue.

        FTR, I tried all the Chinese encodings and played with several terminal settings and could only get worse behavior, never better, e.g., after one delete–

        Got: 假借�
Re: Deleting Chinese characters while reading
by graff (Chancellor) on Mar 01, 2015 at 11:14 UTC
    It seems like there are a few things going wrong with how Mac's Terminal app interacts with Perl's use of STDIN while typing and deleting Chinese characters via the keyboard:

    (1) The "delete" key seems to operate on bytes; there are 3 bytes per utf8 Chinese character, so it actually takes three hits on "delete" to fully remove one character from the line that will be read on STDIN when you hit "enter".

    (2) Unfortunately, each hit of the "delete" key also causes the cursor to move leftward one (Latin/ASCII) character cell on the display.

    (3) As soon as a left-moving cursor moves into the span of a Chinese character (which occupies the width of two standard Latin/ASCII characters), the whole Chinese character disappears.

    So, let's suppose you've typed two Chinese characters, and you want to delete the second one. The two characters occupy the same (horizontal) space as 4 Latin/ASCII characters. You have to hit "delete" 3 times to get rid of all three utf8 bytes for that one character, but as a result, the cursor moves over three spaces, causing both characters to disappear from view (even though the first character is still fully intact in the typing buffer: hit enter, and Perl will read it).
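The byte arithmetic in that description can be checked with a small sketch. This only simulates a byte-oriented delete on an encoded string; it is not a diagnosis of what Terminal actually does, and the helper names `char_and_byte_counts` and `delete_bytes` are made up for illustration.

```perl
use strict;
use warnings;
use utf8;
use Encode qw(encode decode);

# Hypothetical helper: character count and UTF-8 byte count of a string.
sub char_and_byte_counts {
    my ($text) = @_;
    return (length($text), length(encode('UTF-8', $text)));
}

# Hypothetical helper: drop $n bytes from the end of the encoded string,
# as a byte-oriented line editor would, then decode what is left.
sub delete_bytes {
    my ($text, $n) = @_;
    my $bytes = encode('UTF-8', $text);
    substr($bytes, -$n) = '' if $n > 0;
    return decode('UTF-8', $bytes);   # a malformed tail is replaced by U+FFFD
}

my ($chars, $bytes) = char_and_byte_counts("中文");
print "$chars characters, $bytes bytes\n";        # 2 characters, 6 bytes

my $after_three = delete_bytes("中文", 3);
print length($after_three), " character left\n";  # 1 character left
```

Removing only one or two bytes leaves a truncated sequence, which decodes with a U+FFFD replacement character at the end, similar to the `Got: 假借�` output reported earlier in the thread.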

    I tried looking into Term::ReadLine, but that doesn't seem to play well with utf8, let alone any sort of method for keyboarding Chinese characters.

    I wish I had a better answer for you (I hope some other monk will have one), but failing that, I'd rely on a browser for non-latin keyboard input to a perl process - that is, get your user input via an http-like interface.

      Actually this isn't quite what I observe:

      If I enter the sequence of characters/keystrokes 中文<delete><delete> I will see 中 with the cursor positioned immediately after it. If I hit <delete> again, however, the cursor doesn't move. (That is, there's no way of removing the phantom 中 character from the line.) This is just a screen display issue, however. If I hit <enter> now, the value read in by the Perl program will be an empty string.

      Or if I enter a sequence of characters/keystrokes 中文很难<delete><delete><delete><delete> I will see 中文.

      If I enter a sequence of characters/keystrokes 中文<delete> I will see 中 whereas the cursor will be positioned one ASCII character to the right of 中 and if I hit enter at that point, 中 will be read in without a space.