Parse::RecDescent and utf-8

ph0enix has asked for the wisdom of the Perl Monks concerning the following question:

Hi all

I need to parse text data stored in utf-8. I want to use Parse::RecDescent, but because the parser code generated by the Parse::RecDescent module doesn't include 'use utf8;', I can't even simply split text into words!

In my grammar I can use a regexp like

/[\w夔紸皸賨濎薧鏵鏸𥝲']+/

to specify what belongs to a word, but this small example will work only for a small subset of languages with a LATIN alphabet. And for all other languages...???

Or is it better to modify the Parse::RecDescent module to add 'use utf8;' to the generated code? Is it safe?


Replies are listed 'Best First'.
Re: Parse::RecDescent and utf-8
by erikharrison (Deacon) on May 16, 2002 at 18:26 UTC

If you can understand Parse::RecDescent then your best bet is probably to try to subclass it before modifying it. If that doesn't work, then try hacking on it.

If this is a concern, then why don't you drop a line to the author, who is supposedly at work on Parse::FastDescent. Perhaps you can get utf-8 as an option in the new module.

Cheers,
Erik

Light a man a fire, he's warm for a day. Catch a man on fire, and he's warm for the rest of his life. - Terry Pratchett
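
As a rough illustration of the word-splitting part of the question, here is a minimal sketch that uses only core Perl (the Encode module), not Parse::RecDescent itself. It assumes a reasonably recent Perl (5.8 or later) and that the input arrives as raw UTF-8 bytes. The sample text, the variable names and the choice of \p{L} and \p{N} in place of a hand-written list of accented letters are all assumptions made for this example. The point it tries to show is that 'use utf8;' only concerns literals in the source file, while matching depends on whether the data was decoded into characters. The same character class could presumably be used as a terminal in a grammar, provided the text handed to the parser has been decoded first, but that is not tested here.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Raw UTF-8 bytes for two Cyrillic words, as they might arrive from a file
# opened without an encoding layer. They are written as \x escapes so this
# source file stays plain ASCII and needs no 'use utf8;'.
my $bytes = "\xD0\xBF\xD1\x80\xD0\xB8\xD0\xB2\xD0\xB5\xD1\x82 "
          . "\xD0\xBC\xD0\xB8\xD1\x80";

# Decode once at the input boundary: bytes become a Perl character string.
my $text = decode('UTF-8', $bytes);

# \p{L} matches any Unicode letter and \p{N} any Unicode number; the
# underscore and apostrophe mirror the original [\w...'] character class.
my @words = $text =~ /([\p{L}\p{N}_']+)/g;

# Encode again on the way out so the terminal receives UTF-8 bytes.
printf "%d words\n", scalar @words;
print encode('UTF-8', $_), "\n" for @words;
```

If the decode step is skipped, the pattern is applied to individual bytes rather than to characters, which is one plausible reason a non-Latin text fails to split into words.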
