shilpam has asked for the wisdom of the Perl Monks concerning the following question:

I am using the Text::CSV module to write the results of an SQL query to a .csv file. Everything works, but I have noticed that if the data contains ASCII characters like ®, they are not formatted but displayed as &#174;. When the same data is displayed in a browser, it converts &#174; to ®. How do I ensure that the data is formatted properly in the csv file?

Re: ASCII characters not displayed correctly in csv file
by davorg (Chancellor) on Jul 14, 2004 at 09:59 UTC

    As Zaxo points out, it looks like your data isn't stored as plain text, but as text with HTML-style entities embedded in it. A browser makes the conversion automatically but you'll need to do it yourself.

    It's also worth pointing out that you need to know which character set you are dealing with, as they can all have differing definitions for characters outside of the ASCII set (ASCII only defines characters up to 127, so 174 is not an ASCII character). The ® sign is number 174 in the ISO-8859-1 character set (which is probably the most common ASCII extension), so that may well be the character set that you are dealing with.

    --
    <http://www.dave.org.uk>

    "The first rule of Perl club is you do not talk about Perl club."
    -- Chip Salzenberg

      HTML and XML numeric entities always refer to the Unicode character set. The range from 127 to 255 in Unicode is the same as in ISO-8859-1.
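
To illustrate the point about character sets, here is a minimal sketch using the core Encode module. It takes code point 174 (the number in the entity) and shows the bytes it becomes in ISO-8859-1 and in UTF-8. The helper name hex_bytes is made up for this example.

```perl
use strict;
use warnings;
use Encode qw(encode);

# hex_bytes is a hypothetical helper: it renders a byte string as hex pairs.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02X', ord } split //, $bytes;
}

# The numeric entity 174 names Unicode code point 174 (U+00AE).
my $char = chr(174);

# The same character is stored as different bytes in different encodings.
my $latin1 = encode('ISO-8859-1', $char);
my $utf8   = encode('UTF-8',      $char);

print 'ISO-8859-1: ', hex_bytes($latin1), "\n";   # AE
print 'UTF-8:      ', hex_bytes($utf8),   "\n";   # C2 AE
```

So the number is the same in both character sets, but a program reading the file still has to be told which encoding the bytes are in.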
Re: ASCII characters not displayed correctly in csv file
by Zaxo (Archbishop) on Jul 14, 2004 at 09:45 UTC

    You should look at what's stored in your DB. Chances are that the entity encoding is put there by whatever client application is creating the records.

    You can use the decode_entities function of HTML::Entities to convert to plain text. You'll need to be careful of character encodings.
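
A minimal sketch of that conversion, assuming the stored value looks like the one quoted in this thread and that HTML::Entities is installed from CPAN:

```perl
use strict;
use warnings;
use HTML::Entities qw(decode_entities);

# Assumed sample value, modelled on the data described in this thread.
my $stored = 'First Encounter&#174;';

# decode_entities returns a copy with entities replaced by characters.
my $plain = decode_entities($stored);

# Tell Perl how to encode non-ASCII characters on output.
binmode STDOUT, ':encoding(UTF-8)';
print "$plain\n";
printf "last character is code point %d\n", ord(substr($plain, -1));
```

The result is a Perl character string, so an encoding still has to be chosen when it is written out.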

    After Compline,
    Zaxo

      The value stored in the database is First Encounter®. I am getting the values from the database and then printing them to a csv file. But whenever it encounters the above-mentioned value, the Perl script gives the following error:
      combine() failed on argument: First Encounter®
      Is there a solution where the data stored in the database is "First Encounter&#174;" but, when it is printed to the csv file, it is converted to First Encounter®?
      I used HTML::Entities to convert "First Encounter&#174;" to First Encounter®. But when I try to print the value to a csv file, it gives me the same (above-mentioned) error.
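
One possible cause of the combine() failure is that the CSV object only accepts printable ASCII unless told otherwise. The sketch below assumes a version of Text::CSV (or Text::CSV_XS) that supports the binary attribute; the file name results.csv and the sample row are made up for illustration.

```perl
use strict;
use warnings;
use Text::CSV;

# binary => 1 lets fields contain characters outside printable ASCII.
my $csv = Text::CSV->new({ binary => 1 })
    or die "Cannot create Text::CSV object: " . Text::CSV->error_diag;

# Hypothetical row; chr(174) is the registered sign.
my @row = ('First Encounter' . chr(174), 42);

$csv->combine(@row)
    or die "combine() failed on argument: " . $csv->error_input;
my $line = $csv->string;

# Pick an explicit output encoding rather than relying on defaults.
open my $fh, '>:encoding(UTF-8)', 'results.csv'
    or die "Cannot open results.csv: $!";
print {$fh} $line, "\n";
close $fh or die "Cannot close results.csv: $!";
```

Whatever reads the file afterwards (a spreadsheet, for example) has to be told the file is UTF-8, or the character may still look wrong.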
Re: ASCII characters not displayed correctly in csv file
by crabbdean (Pilgrim) on Jul 14, 2004 at 16:16 UTC
    Interesting, I was just reading about this yesterday. I often remind myself that a computer talks in 1's and 0's. How it translates these is what you then see. For example, the byte 0010 0011 is always the same as it flies around a computer, but how it's translated from end to end can APPEAR different. A browser may show it as one character, whereas another program using a different character set might show it as another.
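
A small sketch of that idea using the core Encode module: the same byte (0xAE, the one from this thread) is decoded under three character sets chosen for illustration, and the resulting code points are printed.

```perl
use strict;
use warnings;
use Encode qw(decode);

# One byte, unchanged; only the interpretation differs.
my $byte = "\xAE";

my %code_point;
for my $charset ('ISO-8859-1', 'ISO-8859-2', 'ISO-8859-5') {
    my $char = decode($charset, $byte);
    $code_point{$charset} = ord($char);
    printf "%-10s byte AE -> U+%04X\n", $charset, $code_point{$charset};
}
```

Only under ISO-8859-1 does this byte come out as the registered sign; the other two map it to different letters.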

    Here is an example of what I mean, which I found yesterday: http://www.ascii.cl/htmlcodes.htm. I also found another good one that actually gave the codes in bits as well, but I can't find it now.

    Dean
    The Funkster of Mirth
    Programming these days takes more than a lone avenger with a compiler. - sam
    RFC1149: A Standard for the Transmission of IP Datagrams on Avian Carriers