htmanning has asked for the wisdom of the Perl Monks concerning the following question:
I'm building an RSS feed with Perl. Despite my asking people to paste only plain text, they often copy and paste from Word or other word processors, and I end up with bad characters. I have been switching out a lot of characters, but inevitably there is one I don't catch. There seem to be some bullets that I can't catch, and they make the feed choke. Interestingly, if I print the feed to a flat file, it can be opened in a browser even with the bad characters, but if I print it dynamically from Perl, the browser shows an error. So my first question is: why would that be?
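Here is a minimal sketch of one way to keep such characters from breaking the feed. It assumes the form posts UTF-8. It decodes the bytes, drops characters that XML 1.0 forbids outright, and escapes `&`, `<`, `>`, `"` and `'` instead of deleting them. It also declares the charset in the HTTP header. The helper name `xml_safe` is hypothetical, and a form that posts Windows-1252 would need `decode('cp1252', ...)` instead. On the flat-file question: a browser receiving the feed over HTTP relies on the `Content-Type` header, while a local file is typed from its extension. The two paths can therefore end up in different parsers. That is a plausible factor to check, not a confirmed diagnosis.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: turn raw form bytes into text that is safe to place
# inside an RSS/XML element. Assumes the form posts UTF-8.
sub xml_safe {
    my ($bytes) = @_;

    # Decode the bytes; with Encode's default handling, malformed UTF-8
    # becomes U+FFFD instead of surviving as bytes the XML parser rejects.
    my $text = decode('UTF-8', $bytes);

    # Remove every character outside the XML 1.0 "Char" production
    # (stray control characters and the like); tab, LF and CR are allowed.
    $text =~ s/[^\x09\x0A\x0D\x20-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]//g;

    # Escape markup characters instead of deleting them.
    my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;',
                  '"' => '&quot;', "'" => '&apos;');
    $text =~ s/([&<>"'])/$entity{$1}/g;

    return $text;
}

# Example input: a UTF-8 middle-dot bullet plus a lone 0x95 byte
# (the Windows-1252 bullet, which is not valid UTF-8 on its own).
my $input = "\xC2\xB7 first item \x95 R&D <b>";
my $clean = xml_safe($input);

# Send an explicit charset, then encode the characters back to UTF-8 bytes.
binmode STDOUT;
print "Content-Type: application/rss+xml; charset=UTF-8\r\n\r\n";
print encode('UTF-8', "<description>$clean</description>\n");
```

Escaping rather than stripping keeps text such as "R&D" intact. Any byte sequence that is not valid UTF-8 shows up as U+FFFD instead of making the XML parser reject the whole feed.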
Here are a few of the characters I'm switching out:

    $text =~ s/&//gi;
    $text =~ s/Æ//gi;
    $text =~ s/ì//gi;
    $text =~ s/î//gi;
    $text =~ s/\n//g;
    $text =~ s/\r//g;
    $text =~ s/í/\'/gi;
    $text =~ s/-/ /gi;
    $text =~ s/\x9s/-/gi;     # \x9s is a tab (\x09) followed by "s"
    $text =~ s/\x96/-/gi;
    $text =~ s/\x95/\<li\>/gi;
    $text =~ s/’/'/gi;
    $text =~ s/\?/?/gi;       # no-op: replaces "?" with "?"
    $text =~ s/\xa0/ /gi;

How do I find the binary code for these bullets that are pasted into the form, so I can escape them? These are the bullets as copied out of the text, but this code doesn't work:

    $text =~ s/·//g;
    $text =~ s/o//g;
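One way to find the code for a pasted bullet is to decode the input and print the code point of every non-ASCII character. Once the numbers are known, the regex can use `\x{...}` escapes instead of pasted glyphs, so it no longer depends on the encoding the script file was saved in. Also note that `s/o//g` removes every letter o, not only bullets. The sketch below therefore removes a bullet only at the start of a line, as in a pasted Word list. The helper names `dump_non_ascii` and `strip_bullets` are hypothetical. The bullet set is also an assumption to adjust after reading the dump: U+2022, U+00B7, U+F0B7 (which Word's Symbol-font bullet sometimes becomes), and a lone `o`. A line that genuinely begins with the word "o" would also be hit.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: list each distinct non-ASCII character with its
# Unicode code point, so it can be matched with \x{...} in a regex.
sub dump_non_ascii {
    my ($text) = @_;            # expects decoded characters, not raw bytes
    my %seen;
    for my $ch (split //, $text) {
        next if ord($ch) < 128 || $seen{$ch}++;
        printf "U+%04X\n", ord $ch;
    }
}

# Hypothetical helper: remove a bullet glyph only when it starts a line
# (after optional spaces/tabs) and is followed by spaces/tabs, so words
# containing "o" elsewhere are left alone. [ \t] is used instead of \s so
# that newlines are never swallowed and lines are not joined.
sub strip_bullets {
    my ($text) = @_;
    $text =~ s/^[ \t]*(?:\x{2022}|\x{00B7}|\x{F0B7}|o)[ \t]+//mg;
    return $text;
}

my $pasted = decode('UTF-8', "\xE2\x80\xA2 Apples\no  Oranges\nno bullet here\n");
dump_non_ascii($pasted);        # prints U+2022
print strip_bullets($pasted);   # prints the three lines with bullets removed
```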
Re: Stripping bad characters in rss
by Corion (Patriarch) on Nov 06, 2022 at 19:53 UTC

Re: Stripping bad characters in rss
by kcott (Archbishop) on Nov 07, 2022 at 00:10 UTC

Re: Stripping bad characters in rss
by haukex (Archbishop) on Nov 07, 2022 at 09:28 UTC

Re: Stripping bad characters in rss
by Anonymous Monk on Nov 07, 2022 at 04:40 UTC