Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:
I'm trying to take a web page and encode characters as entities, but only within the <body>...</body> portion of the page, and ONLY those in the "high" range. Basically I want to encode all the umlauts and other "foreign language" characters. I don't want to encode angle brackets like < and >, and I don't want the final page to be broken when viewed in a browser or checked with a validator. I've looked at HTML::Entities, but I'm not sure how to restrict it to the body, or to characters within a specific range of values.
What I've come up with so far is this:
s/([\200-\377])/sprintf "&#%d;", ord $1/ge;
Is there a better way?
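One way to restrict the substitution above to the body is to match the <body>...</body> span first and run the same s///ge only on what it captures. The following is a minimal sketch, not a definitive answer: the name encode_high_in_body is hypothetical, the input is assumed to be a byte string in a single-byte encoding such as ISO-8859-1 (UTF-8 input would need decoding first), and the regex assumes one <body> element whose opening tag contains no ">" inside an attribute value. A real HTML parser would be more robust.

```perl
use strict;
use warnings;

# Hypothetical helper: encode bytes 0x80-0xFF as numeric entities,
# but only between <body ...> and </body>. Assumes single-byte input.
sub encode_high_in_body {
    my ($html) = @_;
    $html =~ s{(<body\b[^>]*>)(.*?)(</body\s*>)}{
        # Copy the captures before the inner substitution overwrites $1.
        my ($open, $body, $close) = ($1, $2, $3);
        $body =~ s/([\x80-\xFF])/sprintf "&#%d;", ord $1/ge;
        $open . $body . $close;
    }sie;
    return $html;
}

# Small demonstration: the title is left alone, the body text is encoded.
my $page = "<html><head><title>caf\xE9</title></head>"
         . "<body><p>M\xFCnchen &amp; <b>K\xF6ln</b></p></body></html>";
print encode_high_in_body($page), "\n";
```

Because only bytes 0x80-0xFF are touched, existing markup and entities such as &amp; pass through unchanged. Note that everything inside the body is processed, including any script or style content. If named entities (&uuml; rather than &#252;) are preferred, HTML::Entities' encode_entities accepts a second argument listing the characters to encode, so it could probably replace the inner substitution.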
Replies are listed 'Best First'.
Re: Encoding entities ONLY in the <body></body> of a webpage
by valdez (Monsignor) on Jun 14, 2003 at 14:04 UTC
Re: Encoding entities ONLY in the <body></body> of a webpage
by little (Curate) on Jun 14, 2003 at 13:07 UTC