cruftectomy has asked for the wisdom of the Perl Monks concerning the following question:

I need to parse some streams of Unicode text, and I'd like to use Parse::RecDescent. My specific problem is that I can't get Parse::RecDescent to accept characters written in the form '\N{...}'. Here's an example:
    use strict;
    use charnames ':full';
    use Parse::RecDescent;

    $::RD_TRACE = 1;
    $::RD_HINT  = 1;

    my $parser = new Parse::RecDescent(<<'EOG') || die;
    ex1: "\N{DIAMOND OPERATOR}" {warn 'got it'}   # fails despite 'use charnames'
    ex2: "\x{22c4}" {warn 'got it'}               # always works
    ex3: /\p{Letter}/ {warn 'got letter'}         # fails unless 'use charnames...'
    EOG

The hint offered by the parser states 'Constant(\N{...}): $^H{charnames} is not defined'.

It's interesting to note that the line 'ex3' fails if charnames isn't used, so P::RD seems to sort-of know about charnames.

Alternatively, if anyone has dealt with unicode in P::RD grammars, I'd like to hear your opinions and experiences (hopefully not too grim).

Re: Parse::RecDescent and unicode
by ikegami (Patriarch) on Sep 14, 2005 at 17:05 UTC

    Parse::RecDescent produces a string containing Perl code, which is eval'ed in a different package (namespace), so the 'use charnames' in your script is not in effect there. You need to use charnames in that package, from inside the grammar:

        use strict;
        use warnings;
        use Parse::RecDescent ();

        #$::RD_TRACE = 1;
        $::RD_HINT = 1;

        my $parser = new Parse::RecDescent(<<'EOG') || die;
        {
            # Put pragmas, modules and functions
            # used by the parser in these curlies.
            use strict;
            use warnings;
            use charnames ':full';
        }

        ex1: "\N{DIAMOND OPERATOR}"
        ex2: "\x{22c4}"
        ex3: /\p{Letter}/
        EOG

        my $text = chr(0x22C4);

        print("$_: ", $parser->$_($text) ? "match" : "no match", "\n")
            foreach qw( ex1 ex2 ex3 );

        __END__
        output
        ======
        ex1: match
        ex2: match
        ex3: no match

    Note: \p{Letter} didn't work for me when using ActivePerl 5.6.1 (it died with "Can't find unicode character property definition via main->Letter or Letter.pl"), but it worked (as shown above) when using ActivePerl 5.8.0.
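
A side note on the "ex3: no match" line in the output above: that result looks expected rather than a sign of a remaining Unicode problem, because the test input is a math symbol and not a letter. The following is a minimal sketch that checks this without Parse::RecDescent at all. The variable names are only illustrative, and it assumes Perl 5.8 or later so that the Unicode property tables are available.

```perl
use strict;
use warnings;
use charnames ':full';

# Same test character as in the reply: U+22C4 DIAMOND OPERATOR.
my $text = chr(0x22C4);

# The named escape and the hex escape should denote the same character.
my $same_char = ( $text eq "\N{DIAMOND OPERATOR}" && $text eq "\x{22c4}" ) ? 1 : 0;

# DIAMOND OPERATOR has general category Sm (Symbol, math), so it is not a Letter.
my $is_letter      = ( $text =~ /\p{Letter}/ ) ? 1 : 0;
my $is_math_symbol = ( $text =~ /\p{Sm}/ )     ? 1 : 0;

# Control case: an ordinary ASCII letter does have the Letter property.
my $a_is_letter = ( 'a' =~ /\p{Letter}/ ) ? 1 : 0;

print "same char:       $same_char\n";
print "is letter:       $is_letter\n";
print "is math symbol:  $is_math_symbol\n";
print "'a' is a letter: $a_is_letter\n";
```

If the grammar is meant to accept the diamond operator through a property test, a rule such as /\p{Sm}/ would be the one to try; /\p{Letter}/ only matches input that begins with an actual letter.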