I'm building an RSS feed with Perl. Despite asking people to paste only plain text, they often copy and paste from Word or other word processors, and I end up with bad characters. I have been replacing a lot of characters, but inevitably there is one I don't catch. There seem to be some bullets that I can't catch, and they make the feed choke. Interestingly, if I print the feed to a flat file it can be opened in a browser even with the bad characters, but if I print it dynamically from Perl the browser shows an error. My first question is: why would that be?
Here are a few of the characters I'm switching out:

    $text =~ s/&//gi;
    $text =~ s/Æ//gi;
    $text =~ s/ì//gi;
    $text =~ s/î//gi;
    $text =~ s/\n//eg;
    $text =~ s/\r//eg;
    $text =~ s/í/\'/gi;
    $text =~ s/-/ /gi;
    $text =~ s/\x9s/-/gi;
    $text =~ s/\x96/-/gi;
    $text =~ s/\x95/\<li\>/gi;
    $text =~ s/’/'/gi;
    $text =~ s/\?/?/gi;
    $text =~ s/0xa0/ /gi;

How do I find the binary code for these bullets that are pasted into the form so I can escape them? These are the bullets when copied out of the text, but using this code doesn't work:

    $text =~ s/·//g;
    $text =~ s/o//g;
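One way to find the byte values of the pasted bullets is to dump every character outside printable ASCII in hex. This is only a minimal sketch: it assumes `$text` holds the raw, undecoded bytes from the form, and the helper name `non_ascii_bytes` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the hex value of every character that is not
# printable ASCII (tab, LF and CR are allowed), in order of appearance.
sub non_ascii_bytes {
    my ($text) = @_;
    my @found;
    while ($text =~ /([^\x09\x0A\x0D\x20-\x7E])/g) {
        push @found, sprintf('\\x%02X', ord $1);
    }
    return @found;
}

# Sample input: a Windows-1252 bullet (0x95) and en dash (0x96) as raw bytes.
my $sample = "item \x95 one \x96 two";
print join(' ', non_ascii_bytes($sample)), "\n";   # prints: \x95 \x96
```

Each reported value can then be used directly in a substitution such as `s/\x95/.../g`. If the form data arrives as UTF-8 rather than Windows-1252, a single bullet will probably show up as a multi-byte sequence (U+2022 is `\xE2 \x80 \xA2`), which would explain a single-byte pattern never matching.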
In reply to Stripping bad characters in rss by htmanning
| For: | Use: |
|------|------|
| & | &amp;amp; |
| < | &amp;lt; |
| > | &amp;gt; |
| [ | &amp;#91; |
| ] | &amp;#93; |
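For the feed text itself, the characters from the table that matter to an XML parser are `&`, `<` and `>`. Below is a minimal sketch of escaping them in Perl; the helper name `escape_xml` is hypothetical. The ampersand is replaced first so that the entities produced by the later substitutions are not escaped a second time.

```perl
use strict;
use warnings;

# Hypothetical helper: escape the characters that are special in XML text.
sub escape_xml {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;   # must run first, before any entity is introduced
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

print escape_xml('Tom & Jerry <b>live</b>'), "\n";
# prints: Tom &amp; Jerry &lt;b&gt;live&lt;/b&gt;
```

This assumes the input has not already been escaped; running it twice would turn `&amp;` into `&amp;amp;`.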