ph0enix has asked for the wisdom of the Perl Monks concerning the following question:

Hi all

I need to parse text data stored in utf-8, and I want to use Parse::RecDescent. But because the parser code generated by the Parse::RecDescent module doesn't include 'use utf8;', I can't even split the text into words!

In my grammar I can use a regexp like

/[\wÄÜÖäâáçëéîíöôóôüúß']+/
to specify what belongs to a word, but this small example only works for the small subset of languages that use the Latin alphabet. What about all the other languages?
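
One way to avoid listing letters by hand might be a Unicode property class such as \p{L}, which matches a letter of any script once the input has been decoded from utf-8 bytes into characters. Below is a minimal sketch in plain Perl, assuming Perl 5.8 or later with the core Encode module. The sample string and variable names are made up for illustration, and the pattern has not been tried inside a Parse::RecDescent grammar.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Raw utf-8 bytes, as they would arrive from a file read without an
# encoding layer: a German word, a Russian word, and "it's".
my $bytes = "Gr\xC3\xBC\xC3\x9Fe "
          . "\xD0\xBF\xD1\x80\xD0\xB8\xD0\xB2\xD0\xB5\xD1\x82 "
          . "it's";

# Turn the bytes into a string of characters so that the regex engine
# sees whole letters rather than individual bytes.
my $text = decode('UTF-8', $bytes);

# \p{L} = any letter, \p{N} = any number; underscore and apostrophe
# are kept to mirror the hand-written character class.
my @words = $text =~ /([\p{L}\p{N}_']+)/g;

# Encode back to utf-8 on output.
binmode(STDOUT, ':encoding(UTF-8)');
print "$_\n" for @words;
```

The decode step is the important part of the sketch: without it the same pattern is applied to bytes, and multi-byte letters are not recognised as single word characters.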

Or would it be better to modify the Parse::RecDescent module so that it adds 'use utf8;' to the generated code? Is that safe?

Re: Parse::RecDescent and utf-8
by erikharrison (Deacon) on May 16, 2002 at 18:26 UTC

    If you understand Parse::RecDescent well enough, your best bet is probably to try subclassing it before modifying it. If that doesn't work, then try hacking on it.

    If this is a concern, why not drop a line to the author, who is supposedly at work on Parse::FastDescent? Perhaps you can get utf-8 support as an option in the new module.

    Cheers,
    Erik

    Light a man a fire, he's warm for a day. Catch a man on fire, and he's warm for the rest of his life. - Terry Pratchett