Re^2: split text into words -- Unicode problem (I guess)

@andye Um... no. Simply using space as the delimiter would give me "monarchy.", "Japan.", "company?", and so on, not just the words themselves. @dk The text in the variable textBlock doesn't contain &#(number); constructs -- it contains the real characters (for example, ț is "T with comma below"). The HTML encoding changed those characters into &#(number); constructs when I submitted them.
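A minimal Perl sketch of the difference between splitting on spaces and extracting runs of word characters. It assumes the script file is saved as UTF-8, and the sample string is hypothetical, standing in for textBlock.

```perl
use strict;
use warnings;
use utf8;                             # the literals below are UTF-8 source text
binmode STDOUT, ':encoding(UTF-8)';   # print wide characters without "Wide character" warnings

# Hypothetical sample standing in for textBlock.
my $textBlock = "the monarchy. Japan. a company? județ";

# Splitting on whitespace leaves punctuation attached to the words.
my @by_space = split ' ', $textBlock;

# Capturing runs of word characters drops the punctuation; because the
# string is character data, \w also matches letters such as ț.
my @words = $textBlock =~ /(\w+)/g;

print join('|', @by_space), "\n";   # the|monarchy.|Japan.|a|company?|județ
print join('|', @words), "\n";      # the|monarchy|Japan|a|company|județ
```

Matching `\w+` instead of splitting on a punctuation list avoids having to enumerate every delimiter, but it only works for non-ASCII letters when the string holds decoded characters rather than raw bytes.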

Re^3: split text into words -- Unicode problem (I guess) by bogdan77 (Initiate) on Mar 29, 2007 at 14:55 UTC
Argh... let me try again... @dk The text in the variable textBlock doesn't contain &#(number); constructs -- it contains the real characters (for example, &#539; is "T with comma below"). The HTML encoding changed those characters into &#(number); constructs when I submitted them.
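For reference, &#539; is a decimal numeric character reference for code point U+021B, which Unicode names LATIN SMALL LETTER T WITH COMMA BELOW (the lowercase ț). A minimal Perl sketch that decodes such references, using a hypothetical input string; this would only matter if the text really contained the references, which the poster says it does not.

```perl
use strict;
use warnings;
use charnames ();   # loads charnames::viacode without importing anything

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical input containing a decimal numeric character reference.
my $html = 'jude&#539;';

# Replace each &#NNN; with the character whose code point is NNN.
(my $text = $html) =~ s/&#(\d+);/chr($1)/ge;

print "$text\n";                      # județ
print charnames::viacode(539), "\n";  # LATIN SMALL LETTER T WITH COMMA BELOW
```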
Re^4: split text into words -- Unicode problem (I guess) by Anonymous Monk on Mar 31, 2007 at 10:02 UTC
It works with "use encoding utf8;". Thanks a lot, guys :^)
Re^4: split text into words -- Unicode problem (I guess) by dk (Chaplain) on Mar 29, 2007 at 15:10 UTC
If your script is written in UTF-8, `use utf8` is needed to tell Perl about it. See the utf8 pragma documentation (perldoc utf8) for more.
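Without `use utf8`, Perl reads a literal such as "ț" in a UTF-8 source file as its two raw bytes, so `length`, `split` and `\w` operate on bytes rather than characters. A minimal sketch comparing the character view with the byte view, assuming the file is saved as UTF-8. Note that `use utf8` only affects literals in the source; text read from files or STDIN has to be decoded separately (for example through an `:encoding(UTF-8)` layer).

```perl
use strict;
use warnings;
use utf8;                      # declare that this source file is UTF-8 encoded
use Encode qw(encode);

my $char = "ț";                    # one character (U+021B) thanks to use utf8
my $raw  = encode('UTF-8', $char); # the same text as the bytes stored on disk

printf "characters: %d, bytes: %d\n", length($char), length($raw);
# characters: 1, bytes: 2
```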
Re^4: split text into words -- Unicode problem (I guess) by bogdan77 (Initiate) on Mar 29, 2007 at 15:25 UTC
I used "use utf8;", and still no joy... :-( It seems there's some kind of overlapping, even though the delimiters are in the proper order...