Macphisto has asked for the wisdom of the Perl Monks concerning the following question:

I'm hacking on a Perl script I wrote to create Chinese flashcards. The only problem is producing the pinyin. For those of you who do not know, pinyin is the standard romanization of Mandarin Chinese: ordinary Latin letters plus tone marks that show the intonation. There are four tone marks in Chinese pinyin, and each goes directly above a vowel (a, e, i, o, u):

1: A straight horizontal dash (a macron, as in ā).
2: A diagonal dash from right to left moving downward (an acute accent, as in á).
3: A "V"-like mark (a caron, as in ǎ).
4: A diagonal dash from left to right moving downward (a grave accent, as in à).

I can get tones 2 and 4, but I cannot produce tones 1 and 3. I am using the chr() function and plain text. I can create these graphically using .gif letters, but that increases overhead. If anyone knows a way to get the other two tone marks, please reply. Want a harder challenge? Try getting the tone marks on a letter that has an umlaut (ü). Thanks in advance, Scott A Runnels
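
One possible approach, sketched here under some assumptions: type each syllable with a tone number (e.g. hao3), add the matching Unicode combining mark after the right vowel, and let NFC from the core Unicode::Normalize module (Perl 5.8 or later) compose it into a single character. That also handles the umlaut challenge, since ü plus a caron composes to ǚ. The helper name pinyin_syllable is hypothetical, the "v" or "u:" spelling for ü is an assumed input convention, and the output still needs a UTF-8 terminal and a font that has these glyphs.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC);

# Unicode combining diacritics for tones 1-4 (tone 5, the neutral tone, is unmarked).
my %TONE_MARK = (
    1 => "\x{0304}",    # combining macron
    2 => "\x{0301}",    # combining acute accent
    3 => "\x{030C}",    # combining caron
    4 => "\x{0300}",    # combining grave accent
);

# Hypothetical helper: turn one numbered syllable such as "hao3" or "lv4"
# into tone-marked pinyin. Assumes "v" or "u:" is typed for u-umlaut.
sub pinyin_syllable {
    my ($syl) = @_;
    my ($letters, $tone) = $syl =~ /^([A-Za-z:]+)([1-5]?)$/
        or return $syl;                      # leave unexpected input unchanged
    $letters =~ s/u:|v/\x{FC}/g;             # lower-case u-umlaut
    $letters =~ s/U:|V/\x{DC}/g;             # upper-case U-umlaut
    return NFC($letters) unless $tone && $TONE_MARK{$tone};

    # Placement rule: a or e takes the mark; in "ou" the o does;
    # otherwise the last vowel does.
    my $pos;
    if    ($letters =~ /[aeAE]/) { $pos = $-[0] }
    elsif ($letters =~ /ou/i)    { $pos = $-[0] }
    elsif ($letters =~ /.*[aeiouAEIOU\x{FC}\x{DC}]/) { $pos = $+[0] - 1 }
    else  { return NFC($letters) }           # no vowel to mark

    # Put the combining mark right after the vowel, then compose
    # (e.g. "a" followed by U+030C becomes the single character U+01CE).
    substr($letters, $pos + 1, 0, $TONE_MARK{$tone});
    return NFC($letters);
}

binmode STDOUT, ':encoding(UTF-8)';
print join(' ', map { pinyin_syllable($_) } qw(ni3 hao3 lv4 xue2)), "\n";
# prints: nǐ hǎo lǜ xué
```

Combining marks plus normalization were chosen over a lookup table so that the ü cases need no special handling; the composed result is the same precomposed character a table would give.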

Re: Chinese PinYin and PERL
by athomason (Curate) on Jul 12, 2000 at 21:23 UTC
    Sorry, but the ASCII character set doesn't have those characters, and neither does the Latin-1 range above 127 (you can see what's available with 'perl -e "for (1..255) {print chr}"'). Perl 5.6 supports multibyte (UTF-8) characters now, so you might look into whether that could help you out. Otherwise, graphic files are probably the way to go.
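
To illustrate the multibyte route: with Unicode strings, chr() is not limited to 255, so every tone-marked vowel can come straight from its code point. A minimal sketch, assuming Perl 5.8 or later for the :encoding(UTF-8) output layer and a font that covers these characters; %TONED and toned() are illustrative names, and the code points are from the Unicode Latin-1 Supplement, Latin Extended-A and Latin Extended-B blocks.

```perl
use strict;
use warnings;

# Code points for each vowel with tones 1..4 (macron, acute, caron, grave).
# For a, e, i, o, u only the acute and grave forms are below 256 (Latin-1);
# none of the toned u-umlaut forms are.
my %TONED = (
    a        => [0x0101, 0x00E1, 0x01CE, 0x00E0],
    e        => [0x0113, 0x00E9, 0x011B, 0x00E8],
    i        => [0x012B, 0x00ED, 0x01D0, 0x00EC],
    o        => [0x014D, 0x00F3, 0x01D2, 0x00F2],
    u        => [0x016B, 0x00FA, 0x01D4, 0x00F9],
    "\x{FC}" => [0x01D6, 0x01D8, 0x01DA, 0x01DC],   # u-umlaut
);

# Illustrative helper: vowel plus tone number (1-4) to a single character.
sub toned {
    my ($vowel, $tone) = @_;
    return chr $TONED{$vowel}[$tone - 1];
}

binmode STDOUT, ':encoding(UTF-8)';
for my $v (qw(a e i o u), "\x{FC}") {
    print join(' ', map { toned($v, $_) } 1 .. 4), "\n";
}
```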
Re: Chinese PinYin and PERL
by Anonymous Monk on Jul 12, 2000 at 22:42 UTC
    This depends on the font, which may or may not have the characters you want in your encoding.
    If running
    perl -e "print chr for 0..255"
    shows your characters, then you will succeed; otherwise you need to install and use an appropriate font.
    I sometimes do programming for Russia, and this kind of problem is quite common for me.

    Vadim Konovalov
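
A related check that may save some guessing: before looking at fonts, confirm that the encoding can represent the characters at all, since a font can only show what the encoding delivers. A sketch using the core Encode module; can_encode is a made-up helper name, and it relies on encode() dying under FB_CROAK when a character has no mapping. This covers only the encoding side; whether the glyphs actually appear still depends on the font.

```perl
use strict;
use warnings;
use Encode qw(encode FB_CROAK LEAVE_SRC);

# Hypothetical helper: true if every character of $text exists in $charset.
# FB_CROAK makes encode() die on the first unmappable character;
# LEAVE_SRC stops encode() from consuming the source string.
sub can_encode {
    my ($charset, $text) = @_;
    return eval { encode($charset, $text, FB_CROAK | LEAVE_SRC); 1 } ? 1 : 0;
}

my @samples = (
    [ 'tone 1 (macron)', "\x{0101}" ],
    [ 'tone 2 (acute)',  "\x{00E1}" ],
    [ 'tone 3 (caron)',  "\x{01CE}" ],
    [ 'tone 4 (grave)',  "\x{00E0}" ],
);

for my $charset ('ascii', 'iso-8859-1', 'UTF-8') {
    for my $s (@samples) {
        my ($label, $char) = @$s;
        printf "%-10s %-16s %s\n", $charset, $label,
            can_encode($charset, $char) ? 'yes' : 'no';
    }
}
```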

Re: Chinese PinYin and PERL
by Corion (Patriarch) on Jul 13, 2000 at 12:22 UTC

    If you are willing to go with a European charset, these contain characters like é; French in particular uses them. But for the "v" above characters, you'll have to look into Czech charsets...

    I see a few solutions:

    1. Use HTML for your output. There is almost no hassle, as HTML provides a large choice of umlauts and other accented letters. It's not completely trivial, though, as you also want tone marks on umlauted letters...
    2. Use a two-row table and HTML for your output. In the upper row, use the characters v, ^, /, \ to mark the tones, and in the lower row put the actual letters. Kludgy, but it could work.
    3. Use two-row plain-text output. Even more kludgy, but it should also work.
    4. Of course, there is also always the option of using TeX and a PDF writer to create PDF output, but that might be too much overhead ;)
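
For the HTML route (option 1), numeric character references avoid depending on which named entities exist: any character can be written as &#NNN; using its Unicode code point, including the caron and macron vowels and the toned ü. A sketch, with to_html as a hypothetical helper name, that escapes the HTML metacharacters and turns every non-ASCII character into such a reference, so the generated page itself stays plain ASCII. The reader's browser still needs a font with these glyphs.

```perl
use strict;
use warnings;

# Hypothetical helper: escape &, < and >, then write every non-ASCII
# character as a decimal numeric character reference (&#NNN;).
sub to_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;   # runs first so the &#...; references added below are not re-escaped
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/([^\x00-\x7F])/sprintf('&#%d;', ord $1)/ge;
    return $text;
}

# "ni3 hao3 lv4" written with precomposed tone-marked vowels
print to_html("n\x{1D0} h\x{1CE}o l\x{1DC}"), "\n";
# prints: n&#464; h&#462;o l&#476;
```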