in reply to Re^2: split text into words -- Unicode problem (I guess)
in thread split text into words -- Unicode problem (I guess)

Argh... let me try again... @dk: The text in the variable textBlock doesn't contain &#(number); constructs -- it contains the real characters (for example, &#539; is "T with comma below"). The HTML encoding changed those characters into &#(number); constructs when I submitted the post.

Replies are listed 'Best First'.
Re^4: split text into words -- Unicode problem (I guess)
by Anonymous Monk on Mar 31, 2007 at 10:02 UTC
    It works with "use encoding utf8;". Thanks a lot, guys :^)
Re^4: split text into words -- Unicode problem (I guess)
by dk (Chaplain) on Mar 29, 2007 at 15:10 UTC
    If your script is written in UTF-8, you need "use utf8;" to tell Perl about it. See the utf8 documentation for more.
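
A minimal sketch of what that looks like in practice, assuming the script file itself is saved as UTF-8. The sample string and the split pattern are made up for illustration; the original textBlock and delimiters are not shown in this thread. Without "use utf8;" Perl reads the literal as raw bytes, so a single accented letter becomes several separate "characters" and can be cut in the middle.

```perl
use strict;
use warnings;
use utf8;                               # string literals in this file are UTF-8

binmode(STDOUT, ':encoding(UTF-8)');    # encode characters when printing

# Hypothetical sample text, standing in for the real $textBlock.
my $textBlock = "Țara noastră, țara mea: știință și artă.";

# Split on runs of anything that is not a Unicode letter.
# \p{L} matches letters in any script, including ț, ș and ă.
my @words = grep { length } split /[^\p{L}]+/, $textBlock;

print scalar(@words), " words\n";
print "$_\n" for @words;
```

Note that "use utf8;" only covers literals in the source file. Text read from a file or from STDIN still has to be decoded separately, for example with an ":encoding(UTF-8)" layer on the input handle.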
Re^4: split text into words -- Unicode problem (I guess)
by bogdan77 (Initiate) on Mar 29, 2007 at 15:25 UTC
    I used "use utf8;", and still no joy... :-( It seems there's some kind of overlapping, even if the delimiters are in the proper order...