in reply to Parse::RecDescent grammar for RTF
I, too, have been burnt by the RTF parser on CPAN (RTF::Parser); it nearly did what I wanted, but it is exceedingly hard to customize. Basically, if RTF::Parser doesn't do exactly what you want out of the box (and it can; its HTML output is pretty cool), look elsewhere. Thus speaks the voice of bitter experience.
A lower-level solution that works for me is RTF::Tokenizer, on which I've based a production system for converting RTF dictionary data to QuarkXPress tags. RTF::Tokenizer has its quirks; give me a yell if you need help.
--
$,="\n";foreach(split('',"\3\3\3c>\0>c\177cc\0~c~``\0cc\177cc"))
{$a++;$_=unpack('B8',$_);tr,01,\40#,;$b[$a%6].=$_};print@b,"\n"