Nar has asked for the wisdom of the Perl Monks concerning the following question:

I have an error I have been working on for a few days. I am pulling HTML out of a database, and it has encoded HTML entities in the output (IE ®).

I then take this data and write it to a CSV file. The error is that when the CSV file is written, the ® is converted to �.

For troubleshooting I have a print line directly before I write the CSV file, however, this prints ® to the screen.

How can I get it to write ® to the file instead of �?

I've been trying unicode converting and a few similar methods. Out of ideas Monks! Let me know if anyone has any ideas.
#!/usr/local/bin/perl
use DBI;
use strict;
use Spreadsheet::Write;

my $h=Spreadsheet::Write->new(
    file => 'file.csv',
    encoding => 'iso8859',
);

#connect to MySQL
my $dbh = DBI->connect(CONNECTION STRING HERE)
    or die "Can't connect to database\n";

my $sth_select = $dbh->prepare("SELECT id,overview FROM SOME_TABLE");

#EXECUTE SQL
$sth_select->execute();

$h->addrow('productcode','overview');

while (my ($id,$overview) = $sth_select->fetchrow_array()) {
    print "$overview\n";
    #PRINT CSV DATA
    $h->addrow($id,$overview);
}

Replies are listed 'Best First'.
Re: HTML entity write error
by roboticus (Chancellor) on Aug 23, 2013 at 02:22 UTC

    Nar:

    If you want a .CSV file, then using Excel is a pretty terrible way to get there. I've had innumerable problems where people think that Excel is a nice data transport medium. It has a terrible habit of changing your data for you. Instead, use something like Text::CSV.

    Even if--after Herculean efforts--you get your file created properly, anyone opening the spreadsheet to look at it could munge it all up as Excel converts the data when it displays. Unless they close without saving, it can break your file.

    I shudder at remembering the six month period where the testers *insisted* on using Excel for a data matching job....

    /me off to the medicine cabinet to get a few Tylenol. I probably ought to wash it down with a fifth of bourbon.

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

      What's Excel got to do with it? The OP didn't mention Excel at all.

      Update: 2teez is also apparently talking about Excel. Perhaps the question has been altered at some stage, but the current question doesn't ask about Excel, and the code given doesn't generate an Excel spreadsheet.

      use Moops; class Cow :rw { has name => (default => 'Ermintrude') }; say Cow->new->name

        tobyink:

        I saw Spreadsheet::Write and my mind immediately went to "Eh? Using a spreadsheet to create a CSV file?" and then the (truly horrible) memories surfaced. So while composing the note, I was thinking that Excel was used. My original point of using a module specifically for the task was on point, while my knee-jerk reaction to using a spreadsheet as a data transport medium was simply an added bonus. ;^)

        ...roboticus

        When your only tool is a hammer, all problems look like your thumb.

        Did you miss use Spreadsheet::Write; in the code at Ln4?

        Oops. Apologies.
        Reminder to self: read on down through the thread (branch) before replying!

Re: HTML entity write error
by 2teez (Vicar) on Aug 23, 2013 at 02:46 UTC

    Hi Nar,

    ..it has encoded HTML entities in the output (IE ®)...
    Try using HTML::Entities.
    This works for me: perl -MHTML::Entities -le 'print decode_entities("&#174")'

    I really don't know why you are writing into an Excel file when what you need is a CSV file.
    Try using Text::CSV_XS
    Hope this helps

    If you tell me, I'll forget.
    If you show me, I'll remember.
    If you involve me, I'll understand.
    --- Author unknown to me
      Switched to Text::CSV and works fine. Thanks guys!
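
      For anyone who lands here later, here is a minimal sketch of what the Text::CSV approach might look like. It is not Nar's actual code: the @rows data is a hypothetical stand-in for the (id, overview) pairs coming back from DBI, and the choice of a UTF-8 output layer is an assumption (pick whatever encoding the consumer of the file expects). It also assumes the strings are already decoded Perl character strings; if the database driver hands back raw UTF-8 bytes, they would need decoding first or the output would be double-encoded.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;

# Hypothetical rows standing in for what fetchrow_array() would return.
# \x{AE} is the registered-trademark sign (the character behind &reg;).
my @rows = (
    [ 1, "Widget\x{AE} deluxe" ],
    [ 2, "Plain overview text" ],
);

# binary => 1 lets Text::CSV accept non-ASCII characters in fields.
my $csv = Text::CSV->new({ binary => 1, eol => "\n" })
    or die "Cannot use Text::CSV: " . Text::CSV->error_diag;

# The encoding layer on the filehandle decides which bytes reach the disk.
my $file = 'file.csv';
open my $fh, '>:encoding(UTF-8)', $file
    or die "Cannot open $file: $!";

$csv->print($fh, [ 'productcode', 'overview' ]);
$csv->print($fh, $_) for @rows;

close $fh or die "Cannot close $file: $!";
```

      If the file still shows � when opened, it is worth checking which encoding the viewer assumes; a mismatch between the encoding written and the encoding read is a common cause of that symptom.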