in reply to SOLVED: Storing UTF-8 data into database from scraped web page

Hello nysus,

Try encoding the data with the Encode module before you store it. It should work. Sample code:

#!/usr/bin/env perl
use Encode;
use strict;
use warnings;

print encode_utf8("<p>What\x{2019}s up with the water ??</p>"), $/;

__END__

$ perl test.pl
<p>What’s up with the water ??</p>
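To see what encode_utf8 does to the apostrophe, here is a small sketch that uses only the core Encode module (the variable names are illustrative): the single character U+2019 becomes the three bytes E2 80 99, and decode_utf8 turns those bytes back into one character.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Encode qw(encode_utf8 decode_utf8);

# A Perl character string containing U+2019 (RIGHT SINGLE QUOTATION MARK).
my $text = "What\x{2019}s up";

# encode_utf8 turns characters into UTF-8 bytes; U+2019 needs three bytes.
my $bytes = encode_utf8($text);

# decode_utf8 reverses the step and gives back the original characters.
my $round_trip = decode_utf8($bytes);

printf "characters: %d, bytes: %d\n", length($text), length($bytes);
printf "apostrophe bytes: %vX\n", substr($bytes, 4, 3);
print "round trip ok\n" if $round_trip eq $text;
```

The string is 9 characters but 11 bytes, which is why the column and the connection both need to know they are dealing with UTF-8.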

Hope this helps, BR

Seeking for Perl wisdom...on the process of learning...not there...yet!

Re^2: Storing UTF-8 data into database from scraped web page
by nysus (Parson) on Jun 14, 2018 at 20:46 UTC

    It looks like Dumper obscured things a bit. The web page is in UTF-8, and the apostrophe is produced by the entity &#x2019; in the page source.
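Data::Dumper shows characters outside ASCII as escape sequences, so a dump never displays the apostrophe itself. A minimal sketch using only core modules (the string is a shortened version of the one in this thread; Terse and Useqq are set only to keep the dump compact):

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;

my $text = "What\x{2019}s up";

# Dumper writes the wide character as an escape such as \x{2019},
# so the dump does not show what the text really looks like.
$Data::Dumper::Terse = 1;
$Data::Dumper::Useqq = 1;
my $dumped = Dumper($text);
print $dumped;

# To look at the real text, print it through a UTF-8 output layer instead;
# this also avoids the "Wide character in print" warning.
binmode STDOUT, ':encoding(UTF-8)';
print $text, "\n";
```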

    $PM = "Perl Monk's";
    $MCF = "Most Clueless Friar Abbot Bishop Pontiff Deacon Curate Priest";
    $nysus = $PM . ' ' . $MCF;

      Hello nysus,

      Did you see the updated code sample? Did it not work for you?

      I just ran one more test with the HTML::Entities module, and both cases worked: the string containing the HTML entity and the string containing the character code.

      See the sample code:
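The code sample referred to here was not preserved. As an illustration only (not the original code), a minimal sketch of the same kind of test, assuming the HTML::Entities module (from the HTML-Parser distribution) is installed: decode_entities turns &#x2019; into the character U+2019, and encode_utf8 then produces UTF-8 bytes ready to print or store.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Encode qw(encode_utf8);
use HTML::Entities qw(decode_entities);

# Scraped HTML in which the apostrophe is written as a numeric entity.
my $html = '<p>What&#x2019;s up with the water ??</p>';

# decode_entities returns a copy with &#x2019; replaced by U+2019.
my $text = decode_entities($html);

# Encode to UTF-8 bytes before printing or storing, as in the first reply.
my $bytes = encode_utf8($text);
print $bytes, "\n";
```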

      BR / Thanos

Re^2: Storing UTF-8 data into database from scraped web page
by nysus (Parson) on Jun 14, 2018 at 19:03 UTC

    Yeah, I tried that. It just makes things even uglier. Output from mysql looks like this: <p><p>Whatâ  s up with the water ??</p>

    Output when I dump the scraped content looks like this: <p>Whatâ~@~Ys up with the water ??</p>
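The stray "â" in both outputs is typical of UTF-8 bytes being read as a single-byte encoding, or of text being encoded to UTF-8 twice. A small sketch with the core Encode module reproduces both effects; Windows-1252 (cp1252) is an assumption about how the bytes end up being displayed.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Encode qw(decode encode_utf8);

my $apostrophe = "\x{2019}";

# Correct: one character becomes the three UTF-8 bytes E2 80 99.
my $utf8 = encode_utf8($apostrophe);

# Misread: the same three bytes decoded as Windows-1252 become three
# characters, U+00E2 U+20AC U+2122 (displayed as "â€™").
my $misread = decode('cp1252', $utf8);

# Double-encoded: encoding bytes that are already UTF-8 treats each byte
# as a Latin-1 character, so 3 bytes grow to 6.
my $double = encode_utf8($utf8);

printf "utf8 bytes:    %vX\n", $utf8;
printf "misread chars: %s\n",
    join ' ', map { sprintf 'U+%04X', ord } split //, $misread;
printf "double bytes:  %vX\n", $double;
```

The usual cure is to decode once on input, work with characters inside Perl, and encode exactly once on output (or let the database driver do that final step).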


      Hello again nysus,

      I just tried that on my local DB and it works. I found the module Text::Unidecode; it should do what you need.

      #!/usr/bin/perl
      use utf8;
      use strict;
      use warnings;
      use Text::Unidecode;

      my $encode = unidecode("What\x{2019}s up with the water ??");
      print $encode . "\n";

      __END__

      $ perl test.pl
      What's up with the water ??

      Update: Don't forget that you also need to define the column in your table as:

      `Column` VARCHAR(150) CHARACTER SET utf8 NOT NULL UNIQUE,

      You do not need the NOT NULL UNIQUE parameters; I just usually add them to my columns to avoid duplicates, etc.

      Update 2: A sample of the whole code that I tested:

      The conf.ini file:

      Hope this helps, BR.
