s_m_b has asked for the wisdom of the Perl Monks concerning the following question:

I'm creating a page that outputs RTF-formatted files from a database, which requires some reformatting of HTML and UBBC. What I can't get it to do is convert <br /> (and similar tags) into \par without it producing \'08, \'05, \'02 and so on.

Some examples:

Without any substitution, I get

{\pard \fs20 I drive from east hull to grimston bar\'2e Would anyone be interested in a car share? Also I could collect / drop off along the A1079\'2e

Outward journeys depart Hull approx 06:45

Return journeys depart York approx 17:30 to 18:00 Monday, Tuesday and 16:30 the rest of the week\'2e

\par}

Using $message =~ s~<br />~\par~ig; gets

{\pard \fs20 I drive from east hull to grimston bar\'2e Would anyone be interested in a car share? Also I could collect / drop off along the A1079\'2eparparOutward journeys depart Hull approx 06:45parparReturn journeys depart York approx 17:30 to 18:00 Monday, Tuesday and 16:30 the rest of the week\'2e par par \par}

Using $message =~ s~<br />~\\par~ig; produces

{\pard \fs20 I drive from east hull to grimston bar\'2e Would anyone be interested in a car share? Also I could collect / drop off along the A1079\'2e\'5cpar\'5cparOutward journeys depart Hull approx 06:45\'5cpar\'5cparReturn journeys depart York approx 17:30 to 18:00 Monday, Tuesday and 16:30 the rest of the week\'2e \'5cpar \'5cpar \par}

Am I missing something very obvious here?

Re: Formatting rtf - problems
by moritz (Cardinal) on May 13, 2008 at 11:32 UTC
    perl -wle '$_="|<br />|"; s{<br />}{\par}ig; print'
    Unrecognized escape \p passed through at -e line 1.
    |par|

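    So the first version is certainly wrong: \p is not a recognized escape in the replacement part, so Perl drops the backslash and you get a plain "par".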
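A minimal sketch of the Perl side of this, using a made-up sample string and a slightly looser <br> pattern (both assumptions, not from the original post). Either double the backslash in the replacement, or keep \par in a single-quoted variable and interpolate it. This only gets a literal \par into the string; a writer that escapes its input afterwards will still turn that backslash into \'5c.

```perl
use strict;
use warnings;

# Hypothetical sample text.
my $html = "one<br />two";

# In the replacement part, "\\" becomes one literal backslash,
# so this inserts the RTF control word \par (plus a delimiting space).
(my $escaped = $html) =~ s{<br\s*/?>}{\\par }gi;

# Alternative: keep the control word in a single-quoted string,
# where \p is left alone, and interpolate it. Interpolated text
# is inserted as-is, with no further escape processing.
my $par = '\par ';
(my $interpolated = $html) =~ s{<br\s*/?>}{$par}gi;

print "$escaped\n";        # one\par two
print "$interpolated\n";   # one\par two
```

The space after \par is the usual RTF delimiter for a control word and is not printed as text.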

    And I don't see how the search-and-replace could insert any weird characters. Are you sure they aren't already in the string? Use hexdump or Data::Dumper to find out:

    perl5.10.0 -wle '$_="!<br />!"; s{<br />}{\\par}ig; print' | hexdump -C
    00000000  21 5c 70 61 72 21 0a                              |!\par!.|
    00000007
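A short sketch of the Data::Dumper route suggested above, with a hypothetical $message containing a stray backspace byte (\x08) to show what a hidden character looks like. Setting $Data::Dumper::Useqq makes Dumper print non-printable characters as backslash escapes instead of emitting them raw; unpack 'H*' gives a hexdump-like view without leaving Perl.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical sample: a stray backspace (\x08) hidden before the tag.
my $message = "bar.\x08<br />Outward";

# Useqq makes Dumper write non-printable characters as backslash
# escapes instead of raw bytes; Terse drops the leading "$VAR1 = ".
$Data::Dumper::Useqq = 1;
$Data::Dumper::Terse = 1;
my $dumped = Dumper($message);
print $dumped;

# Hexdump-style view: two hex digits per byte, high nybble first.
my $hex = unpack 'H*', $message;
print "$hex\n";   # 6261722e083c6272202f3e4f757477617264
```

The "08" right after "2e" (the full stop) is the hidden byte that would otherwise be invisible in a plain print.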

    If you are sure that the regex is misbehaving, please give us a short, executable and self-contained piece of code that demonstrates that behaviour.

      Sorted it. I was trying to be a little too smart. RTF::Writer formats the strings it is sent and assumes they are plain text. I was doing the inline RTF formatting myself before handing the text to the writer, so the writer escaped it a second time (the backslash of \par became \'5c), hence the mangled strings!

      Sending it plain old \n instead of \line or \par makes it behave.
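To illustrate the double escaping, here is a simplified stand-in for a writer that treats its input as plain text. The function escape_plain_text is hypothetical and is NOT RTF::Writer's actual code: it only escapes \, { and } as \'hh hex escapes and, as an assumption suggested by the fix above, turns newlines into \line.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a plain-text-to-RTF escaper (not RTF::Writer's
# real implementation): escape \ { } as \'hh, then map newlines to \line.
sub escape_plain_text {
    my ($text) = @_;
    $text =~ s/([\\{}])/sprintf("\\'%02x", ord $1)/ge;
    $text =~ s/\n/\\line /g;
    return $text;
}

my $html = "Hull approx 06:45<br />Return";

# Wrong: insert an RTF control word first, then let the escaper run.
(my $preformatted = $html) =~ s{<br\s*/?>}{\\par }gi;
my $mangled = escape_plain_text($preformatted);

# Right: hand over plain text (a newline) and let the escaper emit RTF.
(my $plain = $html) =~ s{<br\s*/?>}{\n}gi;
my $clean = escape_plain_text($plain);

print "$mangled\n";   # Hull approx 06:45\'5cpar Return
print "$clean\n";     # Hull approx 06:45\line Return
```

The first line reproduces the \'5cpar pattern from the question; the second keeps the line break as a real control word because the escaper, not the caller, produced it.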