grsampson has asked for the wisdom of the Perl Monks concerning the following question:
I am trying to use Perl to excerpt lines of Chinese poetry from web pages where they are embedded in lots of HTML. According to my copy of the "Programming Perl" book, any version from 5.6 on should deal with Unicode happily, and the Perl on my Mac is many versions later than that. But when I run my script over one of these web pages, I just see question marks where the Chinese graphs ("characters") should be printed. Odder still, there seem to be exactly three question marks per Chinese graph; so far as I know, Unicode uses two bytes per character.
I'm not even sure whether this is a Perl question; I am wondering whether Chinese has been encoded on the web page in some way other than via Unicode. But however it has been encoded, my web browser (Firefox) and my text editor (BBEdit) seem to recognise it fine. I am really at a loss as to how to approach this problem.
I should add that my Perl level is probably "intermediate". I have used the language a fair amount, for real tasks rather than just playing, but have never needed to move beyond the core language; I have never used "pragmas", for instance.
Any advice much appreciated!
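For readers hitting the same symptom: three question marks per graph is what one would expect if the page is UTF-8 encoded and the script prints the raw bytes without ever decoding them, since most Chinese characters occupy three bytes in UTF-8. The following is only a minimal sketch under that assumption (the post does not confirm the page's encoding). It uses the core Encode module, and the two sample characters and all variable names are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Tell Perl to encode characters as UTF-8 when printing to the terminal.
binmode(STDOUT, ':encoding(UTF-8)');

# Hypothetical sample: two Chinese characters (U+6625, U+7720),
# written as escapes so this source file stays plain ASCII.
my $chars = "\x{6625}\x{7720}";

# Simulate what arrives from a UTF-8 web page: raw, undecoded bytes.
my $bytes = encode('UTF-8', $chars);

# Each of these characters takes three bytes in UTF-8.
printf "%d bytes for %d characters\n", length($bytes), length($chars);
# prints "6 bytes for 2 characters"

# Decoding the bytes turns them back into a character string,
# which the encoding layer on STDOUT can then print correctly.
my $decoded = decode('UTF-8', $bytes);
print "$decoded\n";
```

If the page turns out to use a different encoding (GB2312 or Big5, say), the same decode-on-input, encode-on-output pattern would apply with that encoding's name in place of 'UTF-8'.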
Replies are listed 'Best First'.
Re: Perl not recognizing Chinese
by choroba (Cardinal) on Sep 19, 2018 at 15:09 UTC
Re: Perl not recognizing Chinese
by haukex (Archbishop) on Sep 19, 2018 at 18:56 UTC
Re: Perl not recognizing Chinese
by beech (Parson) on Sep 19, 2018 at 22:23 UTC