in reply to Parse::RecDescent grammar for RTF

I don't know of an RTF grammar for Parse::RecDescent. RTF is a bit of a mess, structurally, so parsing isn't trivial. Even major applications write RTF that isn't quite standard.

I, too, have been burnt by the RTF parser on CPAN (RTF::Parser); it nearly did what I wanted, but it is exceedingly hard to customize. Basically, if RTF::Parser doesn't do exactly what you want out of the box (and it can; its HTML output is pretty cool), look elsewhere. Thus speaks the voice of bitter experience.

A low-level solution which works for me is RTF::Tokenizer, on which I've based a production system for converting RTF dictionary data to QuarkXPress tags. RTF::Tokenizer has its quirks; give me a yell if you need help.
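
For a feel of what "low-level" means here, this is a rough sketch of a token loop, not my production code. It assumes the new(string => ...) constructor and the get_token() interface from the RTF::Tokenizer docs on CPAN (older releases may differ); rtf_text_runs is a made-up helper name for illustration only.

```perl
use strict;
use warnings;
use RTF::Tokenizer;

# Hypothetical helper: walk the token stream and keep only plain-text runs.
# Assumes get_token() returns (type, argument, parameter), with type being
# one of 'text', 'control', 'group' or 'eof', as the module docs describe.
sub rtf_text_runs {
    my ($rtf) = @_;
    my $tokenizer = RTF::Tokenizer->new( string => $rtf );
    my @runs;
    while (1) {
        my ( $type, $argument, $parameter ) = $tokenizer->get_token();
        last if $type eq 'eof';
        # For 'text' tokens the argument is the text itself;
        # control words and group open/close tokens are skipped here.
        push @runs, $argument if $type eq 'text';
    }
    return @runs;
}

# Single quotes keep the backslashes literal.
print join( '|', rtf_text_runs('{\rtf1 Hello {\b bold} world}') ), "\n";
```

On a real document this naive loop would also pick up text inside the font table, stylesheet and similar groups, so in practice you track group depth and the current destination yourself. That bookkeeping is the price of the control you get.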

--
$,="\n";foreach(split('',"\3\3\3c>\0>c\177cc\0~c~``\0cc\177cc")) {$a++;$_=unpack('B8',$_);tr,01,\40#,;$b[$a%6].=$_};print@b,"\n"