in reply to Substituting Newline Characters

You probably want something like this, assuming the goal is to render plain text as HTML with its formatting intact.

sub escapeHTML {
    local $_ = shift;
    # make the required escapes ('&' first, so later entities are not re-escaped)
    s/&/&amp;/g;
    s/"/&quot;/g;
    s/</&lt;/g;
    s/>/&gt;/g;
    # change tabs to 4 spaces
    s/\t/    /g;
    # make the whitespace escapes - not required within <pre> tags
    s/( {2,})/"&nbsp;" x length $1/eg;
    # make the browser bugfix escapes
    s/\x8b/&#139;/g;
    s/\x9b/&#155;/g;
    # change newlines to <br> if desired - not required with <pre>
    s/\n/<br>\n/g;
    return $_;
}
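The comments in the sub note that the whitespace and newline escapes are not needed inside <pre> tags. A minimal sketch of that split follows; `text_to_html` and its `pre` flag are hypothetical names used only for illustration, not part of the original post.

```perl
use strict;
use warnings;

# Hypothetical wrapper: text_to_html() and the 'pre' option are illustrative names.
sub text_to_html {
    my ($text, %opt) = @_;

    # Required escapes; '&' goes first so later entities are not re-escaped.
    $text =~ s/&/&amp;/g;
    $text =~ s/"/&quot;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;

    # Inside <pre> the browser preserves whitespace and newlines itself.
    return "<pre>$text</pre>" if $opt{pre};

    $text =~ s/\t/    /g;                         # tabs to 4 spaces
    $text =~ s/( {2,})/"&nbsp;" x length $1/eg;   # keep runs of spaces visible
    $text =~ s/\n/<br>\n/g;                       # keep line breaks visible
    return $text;
}

my $sample = "if (a < b) {\n\treturn \"x & y\";\n}\n";
print text_to_html($sample), "\n";
print text_to_html($sample, pre => 1), "\n";
```

Calling it without `pre` gives markup that can be dropped into ordinary flowing HTML; with `pre => 1` only the four required escapes are applied.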

cheers

tachyon

Re: Re: Substituting Newline Characters
by Happy-the-monk (Canon) on Mar 15, 2004 at 23:51 UTC

    Compare that to the code from CGI.pm v3.01:

    Cheers Sören

      As it happens I am remarkably familiar with the guts of CGI.pm. I do hope you are not proposing to use 5000 lines of CGI.pm for this task? If you are, I take it you are aware that outside of a CGI context it will default to a charset of ISO-8859-1, aka Latin-1. You could also note that the /s and /o modifiers on the REs are pointless in context, that it does not escape whitespace, and that it does not deal with \n -> <br>, which was at the heart of the original thread.
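For readers wondering why /s and /o would be pointless here: /s only changes what `.` matches, and /o only matters when the pattern interpolates a variable. A minimal sketch, assuming the patterns in question are fixed literals such as `<` with no `.` in them (the sample strings are made up for illustration):

```perl
use strict;
use warnings;

my $text = "a < b\nc < d";

# A literal pattern with no '.' and no interpolated variable:
# adding /s and /o changes nothing about the result.
(my $plain     = $text) =~ s{<}{&lt;}g;
(my $with_mods = $text) =~ s{<}{&lt;}gso;
print "identical result with and without /so\n" if $plain eq $with_mods;

# Where /s does matter: '.' normally refuses to match a newline.
print "without /s: no match\n" unless "a\nb" =~ /a.b/;
print "with /s: match\n"       if     "a\nb" =~ /a.b/s;
```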

      And your point was?

      cheers

      tachyon

        I too am confused about what the point might be, especially since your function does more than the CGI.pm one ... but do correct me if i am wrong here -- i was under the assumption that memory and speed were so cheap these days that using a mere 5000 lines is not really that bad after all. Back when i was a Comp Sci undergrad, a peer who majored in Industrial Engineering explained to me that there was no need to worry about optimization since hardware was advancing by leaps and bounds. I, of course, scoffed at that, and i still believe that someone had damn well better keep the optimization torch burning because hardware does have a limit ... but the truth is that we are only talking about a few extra seconds at most by using CGI.pm. I just ran Devel::Profile on two scripts, one that imported CGI.pm's escapeHTML and one that used yours. Here are the results:

        Mileage will vary, but unless i am missing something, that's a whopping extra .05 of a second. But at this point, i would use your subroutine because ... well, there it is now, isn't it? What was that saying ... yes, don't go looking a gift horse in the mouth. That, or don't complain when a saint posts code that works, is fast, and is free. ;)
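A rough way to get a feel for this kind of comparison on your own machine is sketched below. It is not the Devel::Profile run described above: it uses Time::HiRes instead, `escape_local` is a hypothetical reduced stand-in for the hand-rolled sub, and it assumes CGI.pm may or may not be installed.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical stand-in for the hand-rolled sub: required escapes only.
sub escape_local {
    local $_ = shift;
    s/&/&amp;/g; s/"/&quot;/g; s/</&lt;/g; s/>/&gt;/g;
    return $_;
}

# Cost of pulling in CGI.pm at run time, if it is installed at all.
my $t0       = time;
my $have_cgi = eval { require CGI; 1 };
my $load     = time - $t0;

# Cost of 10_000 calls to the local sub, for scale.
$t0 = time;
for my $i (1 .. 10_000) {
    escape_local(q{<a href="x">y & z</a>});
}
my $calls = time - $t0;

printf "CGI.pm load:  %.4f s%s\n", $load, $have_cgi ? "" : " (not installed)";
printf "10_000 calls: %.4f s\n", $calls;
```

The numbers depend heavily on the machine and the Perl build, so treat them as an order-of-magnitude check only.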

        jeffa

        L-LL-L--L-LL-L--L-LL-L--
        -R--R-RR-R--R-RR-R--R-RR
        B--B--B--B--B--B--B--B--
        H---H---H---H---H---H---
        (the triplet paradiddle with high-hat)
        
Re^2: Substituting Newline Characters
by Anonymous Monk on Mar 16, 2004 at 05:18 UTC

    Could you give a quick explanation of what the 'browser bugfix escapes' are? Just curious, as I've never seen that used before.

      It is an old Netscape bug. Here is one thread on it. Google 'x8b browser bug' or similar if this is not enough.

      cheers

      tachyon