ph0enix has asked for the wisdom of the Perl Monks concerning the following question:
Hi all
I need to parse text data stored in UTF-8, and I want to use Parse::RecDescent. But because the parser code generated by the Parse::RecDescent module doesn't include 'use utf8;', I can't even simply split the text into words!
In my grammar I can use a regexp like
/[\wÄÜÖäâáçëéîíöôóôüúß']+/
to specify what belongs to a word, but this small example will work only for a small subset of languages with the Latin alphabet. And for all other languages...?
Or is it better to modify the Parse::RecDescent module so that it adds 'use utf8;' to the generated code? Is that safe?
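For illustration, here is a minimal sketch of the word-splitting step in plain Perl, without Parse::RecDescent. It assumes a Perl with Unicode property support and the Encode module, and it swaps the hand-written letter list for the property classes `\p{L}` and `\p{N}`. The sample string and variable names are made up for the example. Whether the same class behaves identically as a regexp terminal inside a generated parser is not tested here.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Raw UTF-8 bytes for "Größe naïve слово", written as byte escapes so the
# example does not depend on the encoding of this source file.
my $bytes = "Gr\xC3\xB6\xC3\x9Fe na\xC3\xAFve "
          . "\xD1\x81\xD0\xBB\xD0\xBE\xD0\xB2\xD0\xBE";

# Decode the bytes into Perl characters; without this step the regexp
# would see individual bytes rather than letters.
my $text = decode('UTF-8', $bytes);

# A word is a run of letters from any script, digits, underscore or apostrophe.
my @words = $text =~ /([\p{L}\p{N}_']+)/g;

print scalar(@words), "\n";                    # prints 3
print encode('UTF-8', $_), "\n" for @words;    # one word per line, as UTF-8
```

The design choice here is to decode the input once and let the Unicode property classes define what a letter is, rather than enumerating accented characters language by language.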
Replies are listed 'Best First'.
Re: Parse::RecDescent and utf-8
by erikharrison (Deacon) on May 16, 2002 at 18:26 UTC