As Zaxo points out, it looks like your data isn't stored as plain text, but as text with HTML-style entities embedded in it. A browser makes the conversion automatically but you'll need to do it yourself.
It's also worth pointing out that you need to know which character set you are dealing with, as they can all have differing definitions for characters outside of the ASCII set (ASCII only defines characters up to 127, so 174 is not an ASCII character). The ® sign is number 174 in the ISO-8859-1 character set (which is probably the most common ASCII extension), so that may well be the character set that you are dealing with.
--
< http://www.dave.org.uk>
"The first rule of Perl club is you do not talk about
Perl club." -- Chip Salzenberg
| [reply] |
HTML and XML numeric entities always refer to the Unicode character set. The range from 127 to 255 in Unicode is the same as in ISO-8859-1.
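A small sketch of that point, using only core Perl: a decimal numeric entity names a Unicode code point, and for code points up to 255 that number is also the ISO-8859-1 byte value. The helper `decode_decimal_entities` is a made-up name for illustration and handles decimal entities only; it is not a full entity decoder.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: replace decimal numeric entities (&#NNN;) with the
# Unicode character at that code point. Hex and named entities are ignored.
sub decode_decimal_entities {
    my ($text) = @_;
    $text =~ s/&#(\d+);/chr($1)/ge;
    return $text;
}

my $decoded = decode_decimal_entities('First Encounter&#174;');

# The last character is code point 174, i.e. U+00AE (REGISTERED SIGN).
printf "U+%04X\n", ord(substr($decoded, -1));

# In ISO-8859-1 the same character is the single byte 0xAE (174).
my $latin1_bytes = encode('ISO-8859-1', $decoded);
printf "0x%02X\n", ord(substr($latin1_bytes, -1));

# In UTF-8 the same character takes two bytes, 0xC2 0xAE.
my $utf8_bytes = encode('UTF-8', $decoded);
printf "%d bytes\n", length($utf8_bytes) - length('First Encounter');
```

So the number in the entity stays the same whatever the output encoding is; only the bytes written to disk differ.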
| [reply] |
You should look at what's stored in your DB. Chances are that the entity encoding is put there by whatever client application is creating the records.
You can use the decode_entities function of HTML::Entities to convert to plain text. You'll need to be careful with character encodings.
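For illustration, here is a minimal sketch of that conversion. The function HTML::Entities provides for this is `decode_entities`; the stored value shown is a guess at what the database holds, and UTF-8 output is only one possible choice.

```perl
use strict;
use warnings;
use HTML::Entities qw(decode_entities);

# Hypothetical value as it might come back from the database.
my $stored = 'First Encounter&#174;';

# In non-void context decode_entities returns a decoded copy.
my $plain = decode_entities($stored);

# The result is a character string, so pick an encoding before printing.
# UTF-8 is an assumption here; use whatever the consumer of the data expects.
binmode(STDOUT, ':encoding(UTF-8)');
print "$plain\n";
```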
| [reply] |
The value stored in the database is First Encounter®.
I am getting the values from the database and then printing them to a CSV file. But whenever it encounters the above-mentioned value, the Perl script gives the following error:
combine() failed on argument: First Encounter®
Is there a solution where the data stored in the database is "First Encounter 174" but, when it is printed to the CSV file, it is converted to First Encounter®?
I used HTML::Entities to convert "First Encounter 174" to First Encounter®. But when I try to print the value to a CSV file, it gives me the same error as above.
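The `combine() failed on argument` message looks like it comes from Text::CSV, although that is an assumption since the script isn't shown. As far as I know, `combine` in that module rejects fields containing characters outside printable ASCII unless the `binary` attribute is set. A minimal sketch, assuming a reasonably recent Text::CSV (or Text::CSV_XS) and a made-up output file name:

```perl
use strict;
use warnings;
use Text::CSV;

# binary => 1 allows fields to contain characters outside printable ASCII.
my $csv = Text::CSV->new({ binary => 1 })
    or die "Cannot create CSV object: " . Text::CSV->error_diag();

# Hypothetical row; \x{AE} is the registered sign after entity decoding.
my @row = ("First Encounter\x{AE}", 'second field');

$csv->combine(@row)
    or die "combine() failed on argument: " . $csv->error_input();

# Hypothetical file name; the encoding layer decides the bytes on disk.
open my $fh, '>:encoding(UTF-8)', 'out.csv' or die "out.csv: $!";
print {$fh} $csv->string(), "\n";
close $fh or die "out.csv: $!";
```

If the file has to be read by something that expects ISO-8859-1, swapping the layer for `:encoding(ISO-8859-1)` should write the single byte 174 instead.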
| [reply] |
Interesting, I was just reading about this yesterday. I often remind myself that a computer talks in 1s and 0s. How those are translated is what you then see. For example, the byte 0010 0011 is always the same as it flies around a computer, but depending on how it's translated at each end it can APPEAR different. A browser may show it as one character, whereas in ASCII it might show as another.
Here is an example of what I mean, which I found yesterday: http://www.ascii.cl/htmlcodes.htm I found another good one that also gave the values in bits, but I can't find it now.
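The "same bits, different character" idea can be sketched in Perl with the core Encode module. The byte used here is 174 (bits 1010 1110) rather than the one above, because it lies outside ASCII and so really does differ between character sets. The two character sets are just examples chosen for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# One byte with the value 174 (0xAE, bits 1010 1110).
my $byte = "\xAE";

# Decode the same byte under two different single-byte character sets
# and report which Unicode code point each one maps it to.
for my $charset ('ISO-8859-1', 'cp437') {
    my $char = decode($charset, $byte);
    printf "%-10s -> U+%04X\n", $charset, ord($char);
}
```

Under ISO-8859-1 the byte is U+00AE (the ® sign); under the old DOS code page 437 it is U+00AB (a left guillemet), so the bits alone don't tell you the character.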
| [reply] |