
High bit character encoding in HTML

by oakbox (Chaplain)
on Apr 21, 2003 at 07:29 UTC (#251954, perlquestion)

oakbox has asked for the wisdom of the Perl Monks concerning the following question:

Okay, I have some web forms that some of my Dutch customers use. My problem is that they occasionally use high bit characters like ë ï é and í. When I redisplay those pages, the system craps out with control characters A<<x0.

So I think, "Aha! I'll use a module to escape those pesky characters into HTML entities." So I run HTML::Entities' encode_entities on my text field inputs.

My problem is that HTML::Entities also escapes HTML characters that I WANT my text entry people to be able to use. I want them to be able to use <p> and <br> to have some basic formatting control.

HTML::Entities allows you to force only some characters to be encoded and to leave others alone. But there's no easy way to complement that list. In other words, there's no function for 'encode everything BUT <> in the incoming string'.

I'm hoping you might be able to save me from creating a whole manual lookup table :)


Re: High bit character encoding in HTML
by crenz (Priest) on Apr 21, 2003 at 09:00 UTC

    I had the same problem with German and Chinese pages. I just keep the original input and instead add an appropriate charset header in the HTML head. For the German pages, I use:

    <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

    For the Chinese pages, I use utf8. I believe all the characters used in Dutch should be in iso-8859-1, so you could just use that one.

      *slap hand to forehead*

      Don't I feel like a silly goose. Due to bad design, I've got a lot of print Content statements peppering that particular script. Sure enough, some of them include the charset=iso-8859-1 and some do not. I popped in the appropriate metatags on all of them and presto, everything works as expected.

      This exercise has also pointed out the need for me to centralize my outputs in a sane place, probably at the module level. ++ to crenz for the splash of cold water in my face! :)
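
A minimal sketch of what centralizing the output could look like, so that every page gets the same charset. The helper names `content_type_header` and `charset_meta_tag` are hypothetical and not taken from the original script; the default of iso-8859-1 simply follows the suggestion above.

```perl
use strict;
use warnings;

# Hypothetical helper: the single place that decides the response charset.
sub content_type_header {
    my ($charset) = @_;
    $charset = 'iso-8859-1' unless defined $charset;
    # A CGI header block ends with a blank line.
    return "Content-Type: text/html; charset=$charset\r\n\r\n";
}

# Hypothetical helper: the matching <meta> tag for the HTML head.
sub charset_meta_tag {
    my ($charset) = @_;
    $charset = 'iso-8859-1' unless defined $charset;
    return qq{<meta http-equiv="Content-Type" content="text/html; charset=$charset">};
}

# Every page is printed through the same two helpers.
print content_type_header();
print "<html><head>", charset_meta_tag(), "</head><body>...</body></html>\n";
```

With the charset chosen in one place, a later switch to another encoding would only need the default changed (or an explicit argument such as `content_type_header('utf-8')`).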


      Hmmm... couldn't you just use utf8 for everything then?

        Yes, you are right. I will be transitioning to UTF-8 as soon as I have the time :) It's actually quite easy, it's just not a priority for me right now. And I recommended iso-8859-1 for him because that's still the standard all tools (browsers, Perl) can deal with.
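
For the move to UTF-8 mentioned here, one possible sketch uses the core Encode module, assuming Perl 5.8 or later and assuming the stored text really is ISO-8859-1 bytes. The helper name `latin1_to_utf8` is made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: turn ISO-8859-1 bytes into UTF-8 bytes.
sub latin1_to_utf8 {
    my ($bytes) = @_;
    my $chars = decode('iso-8859-1', $bytes);   # bytes -> Perl characters
    return encode('UTF-8', $chars);             # characters -> UTF-8 bytes
}

# "caf\xe9" is "café" in ISO-8859-1: four bytes in, five bytes out.
my $utf8 = latin1_to_utf8("caf\xe9");
print length($utf8), " bytes after conversion\n";
```

The pages would then also need to announce charset=utf-8 in the Content-Type header and meta tag, otherwise browsers would misread the converted bytes.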

Re: High bit character encoding in HTML
by PodMaster (Abbot) on Apr 21, 2003 at 08:04 UTC
    Let me give it a shot ;)
    use HTML::Entities;
    my $crazyhtml = "<p> asdf ".chr(243)." asdf </p>";
    die encode_entities($crazyhtml, "[^><]" );
    __END__
    <p> asdf asdf </p> at - line 3.
    Hmm, doesn't appear to work. Well, the good news is, HTML::Entities also exports "%char2entity and the %entity2char hashes which contain the mapping from all characters to the corresponding entities", so you can write your own.
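
A note on why the attempt above probably encodes nothing: the HTML::Entities documentation describes the second argument of encode_entities as the contents of a character class, without the surrounding brackets, so "[^><]" ends up nested inside another pair of brackets. As a "write your own" sketch that needs only core Perl, the following replaces everything outside newline and printable ASCII with a numeric character reference and leaves <, > and & alone. The function name `encode_high_bit` is hypothetical, and numeric references are used here in place of the named entities from %char2entity.

```perl
use strict;
use warnings;

# Hypothetical helper: encode only characters outside newline and
# printable ASCII (0x20-0x7E) as numeric references. Markup characters
# such as < > & are printable ASCII, so they pass through untouched.
sub encode_high_bit {
    my ($text) = @_;
    $text =~ s/([^\n\x20-\x7e])/sprintf('&#%d;', ord $1)/ge;
    return $text;
}

my $crazyhtml = "<p> asdf " . chr(243) . " asdf </p>";
print encode_high_bit($crazyhtml), "\n";   # <p> asdf &#243; asdf </p>
```

Note that this sketch also encodes tabs and carriage returns, since they fall outside the kept range. The module's documentation gives the same character class as an example argument, `encode_entities($input, '^\n\x20-\x7e')`, which should yield named entities such as &oacute; where one exists.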

    update: You know, it wouldn't be a bad idea for the author to bold that sentence in the manual, even if the pod is pretty short.

