PerlMonks  

Oh, actually... maybe it didn't :/ It works the first time you print it out (directly), but once it is saved to the DB it gets corrupted (even though the DB is in utf_bin).

Any other ideas? Who would have thought this would be such a royal PITA! I wish we would all just move over to one charset ;) Here is what a Dumper of the values looks like:
$VAR1 = {
  'title_flag' => 0,
  'og_desc' => "\x{420}\x{443}\x{43c}\x{44b}\x{43d}\x{438}\x{44f} \x{43d}\x{435} \x{441}\x{43c}\x{43e}\x{436}\x{435}\x{442} \x{43f}\x{43e}\x{43a}\x{430}\x{437}\x{44b}\x{432}\x{430}\x{442}\x{44c} \x{442}\x{435}\x{43b}\x{435}\x{432}\x{438}\x{437}\x{438}\x{43e}\x{43d}\x{43d}\x{44b}\x{439} \x{43a}\x{43e}\x{43d}\x{43a}\x{443}\x{440}\x{441} \x{ab}\x{415}\x{432}\x{440}\x{43e}\x{432}\x{438}\x{434}\x{435}\x{43d}\x{438}\x{435}-2016\x{bb}, \x{43f}\x{435}\x{432}\x{435}\x{446} \x{41e}\x{432}\x{438}\x{434}\x{438}\x{443} \x{410}\x{43d}\x{442}\x{43e}\x{43d} \x{43d}\x{435} \x{432}\x{44b}\x{441}\x{442}\x{443}\x{43f}\x{438}\x{442} \x{432} \x{421}\x{442}\x{43e}\x{43a}\x{433}\x{43e}\x{43b}\x{44c}\x{43c}\x{435}, \x{430} \x{440}\x{443}\x{43c}\x{44b}\x{43d}\x{441}\x{43a}\x{438}\x{435} \x{442}\x{435}\x{43b}\x{435}\x{437}\x{440}\x{438}\x{442}\x{435}\x{43b}\x{438} \x{43d}\x{435} \x{441}\x{43c}\x{43e}\x{433}\x{443}\x{442} \x{43f}\x{440}\x{43e}\x{433}\x{43e}\x{43b}\x{43e}\x{441}\x{43e}\x{432}\x{430}\x{442}\x{44c} \x{437}\x{430} \x{43f}\x{43e}\x{43d}\x{440}\x{430}\x{432}\x{438}\x{432}\x{448}\x{438}\x{445}\x{441}\x{44f} \x{43c}\x{443}\x{437}\x{44b}\x{43a}\x{430}\x{43d}\x{442}\x{43e}\x{432} \x{2014} \x{438}\x{437}-\x{437}\x{430} \x{434}\x{43e}\x{43b}\x{433}\x{430} \x{432} 16 \x{43c}\x{43b}\x{43d} \x{448}\x{432}\x{435}\x{439}\x{446}\x{430}\x{440}\x{441}\x{43a}\x{438}\x{445} \x{444}\x{440}\x{430}\x{43d}\x{43a}\x{43e}\x{432}.",
  'title' => " \x{420}\x{443}\x{43c}\x{44b}\x{43d}\x{438}\x{44f} \x{43d}\x{435} \x{431}\x{443}\x{434}\x{435}\x{442} \x{443}\x{447}\x{430}\x{441}\x{442}\x{432}\x{43e}\x{432}\x{430}\x{442}\x{44c} \x{432} \x{ab}\x{415}\x{432}\x{440}\x{43e}\x{432}\x{438}\x{434}\x{435}\x{43d}\x{438}\x{438}-2016\x{bb} \x{438}\x{437}-\x{437}\x{430} \x{434}\x{435}\x{43d}\x{435}\x{433} - \x{413}\x{430}\x{437}\x{435}\x{442}\x{430}.Ru ",
  'charset' => 'windows-1251',
  'og_image' => ' http://img.gazeta.ru/files3/123/8192123/rumin-pic905-895x505-99564.jpg',
  'description' => "\x{420}\x{443}\x{43c}\x{44b}\x{43d}\x{438}\x{44f} \x{43d}\x{435} \x{441}\x{43c}\x{43e}\x{436}\x{435}\x{442} \x{43f}\x{43e}\x{43a}\x{430}\x{437}\x{44b}\x{432}\x{430}\x{442}\x{44c} \x{442}\x{435}\x{43b}\x{435}\x{432}\x{438}\x{437}\x{438}\x{43e}\x{43d}\x{43d}\x{44b}\x{439} \x{43a}\x{43e}\x{43d}\x{43a}\x{443}\x{440}\x{441} \x{ab}\x{415}\x{432}\x{440}\x{43e}\x{432}\x{438}\x{434}\x{435}\x{43d}\x{438}\x{435}-2016\x{bb}, \x{43f}\x{435}\x{432}\x{435}\x{446} \x{41e}\x{432}\x{438}\x{434}\x{438}\x{443} \x{410}\x{43d}\x{442}\x{43e}\x{43d} \x{43d}\x{435} \x{432}\x{44b}\x{441}\x{442}\x{443}\x{43f}\x{438}\x{442} \x{432} \x{421}\x{442}\x{43e}\x{43a}\x{433}\x{43e}\x{43b}\x{44c}\x{43c}\x{435}, \x{430} \x{440}\x{443}\x{43c}\x{44b}\x{43d}\x{441}\x{43a}\x{438}\x{435} \x{442}\x{435}\x{43b}\x{435}\x{437}\x{440}\x{438}\x{442}\x{435}\x{43b}\x{438} \x{43d}\x{435} \x{441}\x{43c}\x{43e}\x{433}\x{443}\x{442} \x{43f}\x{440}\x{43e}\x{433}\x{43e}\x{43b}\x{43e}\x{441}\x{43e}\x{432}\x{430}\x{442}\x{44c} \x{437}\x{430} \x{43f}\x{43e}\x{43d}\x{440}\x{430}\x{432}\x{438}\x{432}\x{448}\x{438}\x{445}\x{441}\x{44f} \x{43c}\x{443}\x{437}\x{44b}\x{43a}\x{430}\x{43d}\x{442}\x{43e}\x{432} \x{2014} \x{438}\x{437}-\x{437}\x{430} \x{434}\x{43e}\x{43b}\x{433}\x{430} \x{432} 16 \x{43c}\x{43b}\x{43d} \x{448}\x{432}\x{435}\x{439}\x{446}\x{430}\x{440}\x{441}\x{43a}\x{438}\x{445} \x{444}\x{440}\x{430}\x{43d}\x{43a}\x{43e}\x{432}."
};
...and here is the JSON output:

{"page_title":" Румыния не будет участвовать в «Евровидении-2016» из-за денег - Газета.Ru ","description":"Румыния не сможет показывать телевизионный конкурс «Евровидение-2016», певец Овидиу Антон не выступит в Стокгольме, а румынские телезрители не смогут проголосовать за понравившихся музыкантов — из-за долга в 16 млн швейцарских франков."}

...yet here is what comes back out of the DB:
$VAR1 = { 'images' => '', 'all_images' => '{"image_loop":["http://static.gazeta.ru/nm2 +012/i/quotes/finam_head.png","/nm2015//gzt/img/logo.png"," http://img +.gazeta.ru/files3/123/8192123/rumin-pic905-895x505-99564.jpg"," http: +//img.gazeta.ru/files3/885/8195885/igra-pic265-265x150-77294.jpg"," h +ttp://img.gazeta.ru/files3/725/7953725/RIAN_02710972.HR.ru-pic410-410 +x230-99945.jpg"," http://img.gazeta.ru/files3/331/8116331/2016-02-22T +104304Z_1519170817_D1AESOMBZKAD_RTRMADP_3_UKRAINE-TATARS-EUROVISION-p +ic410-410x230-5670.jpg","http://static.smi2.net/srcimg/2780020.png"," +/nm2015/gzt/img/logo_footer.png","http://static.gazeta.ru/nm2012/i/re +uters_a2.png","http://static.gazeta.ru/nm2012/i/prime_a2.png","http:/ +/static.gazeta.ru/nm2012/i/interfax_a2.png","http://static.gazeta.ru/ +nm2012/i/ria_a2.png","http://static.gazeta.ru/nm2012/i/it_a3.png","ht +tp://static.gazeta.ru/nm2012/i/lj_a2.png"," http://img.gazeta.ru/file +s3/123/8192123/rumin-pic905-895x505-99564.jpg"]}', 'url' => 'www.gazeta.ru/culture/2016/04/22/a_8191769.shtml', 'title' => ' Румыния Π½Π΅ Π±ΡƒΠ΄Π΅Ρ&#1 +30; ΡƒΡ‡Π°ΡΡ‚Π²ΠΎΠ²Π°Ρ‚ΡŒ Π² Β«Π•Π²Ρ&# +128;ΠΎΠ²ΠΈΠ΄Π΅Π½ΠΈΠΈ-2016Β» ΠΈΠ·-Π·Π° Π΄Π΅Π½Π΅Π³ - Π“Π°Π·Π΅Ρ&#13 +0;Π°.Ru ', 'description' => 'Румыния Π½Π΅ смоТСΡ& +#130; ΠΏΠΎΠΊΠ°Π·Ρ‹Π²Π°Ρ‚ΡŒ Ρ‚Π΅Π»Π΅Π²ΠΈΠ·ΠΈΠΎΠ½Π½ +Ρ‹ΠΉ конкурс Β«Π•Π²Ρ€ΠΎΠ²ΠΈΠ΄Π΅Π½ΠΈΠ΅ +-2016Β», ΠΏΠ΅Π²Π΅Ρ† ΠžΠ²ΠΈΠ΄ΠΈΡƒ Антон Π½Π΅ +выступит Π² Π‘Ρ‚ΠΎΠΊΠ³ΠΎΠ»ΡŒΠΌΠ +΅, Π° румынскиС Ρ‚Π΅Π»Π΅Π·Ρ€ΠΈΡ&#13 +0;Π΅Π»ΠΈ Π½Π΅ смогут проголосоваΡ&#13 +0;ΡŒ Π·Π° ΠΏΠΎΠ½Ρ€Π°Π²ΠΈΠ²ΡˆΠΈΡ…ΡΡ ΠΌΡƒΠ·Ρ +‹ΠΊΠ°Π½Ρ‚ΠΎΠ² β€” ΠΈΠ·-Π·Π° Π΄ΠΎΠ»Π³Π° Π² 16 ΠΌΠ» +Π½ ΡˆΠ²Π΅ΠΉΡ†Π°Ρ€ΡΠΊΠΈΡ… Ρ„Ρ€Π°Π½ΠΊΠΎΠ +².', 'grab_id' => '133' }; {"page_title":" Румыния Π½Π΅ Π±ΡƒΠ΄Π΅Ρ‚ Ρ&# +131;Ρ‡Π°ΡΡ‚Π²ΠΎΠ²Π°Ρ‚ΡŒ Π² Β«Π•Π²Ρ€ΠΎΠ +²ΠΈΠ΄Π΅Π½ΠΈΠΈ-2016Β» ΠΈΠ·-Π·Π° Π΄Π΅Π½Π΅Π³ - Π“Π°Π·Π΅Ρ‚Π°.Ru + ","description":"Румыния Π½Π΅ смоТСт ΠΏΠ +ΎΠΊΠ°Π·Ρ‹Π²Π°Ρ‚ΡŒ Ρ‚Π΅Π»Π΅Π²ΠΈΠ·ΠΈΠΎΠ½Π½Ρ‹ΠΉ + конкурс Β«Π•Π²Ρ€ΠΎΠ²ΠΈΠ΄Π΅Π½ΠΈΠ΅-2016Β», +ΠΏΠ΅Π²Π΅Ρ† ΠžΠ²ΠΈΠ΄ΠΈΡƒ Антон Π½Π΅ Π²Ρ‹ +ступит Π² 
Π‘Ρ‚ΠΎΠΊΠ³ΠΎΠ»ΡŒΠΌΠ΅, Π° Ρ&# +128;умынскиС Ρ‚Π΅Π»Π΅Π·Ρ€ΠΈΡ‚Π΅Π»ΠΈ +Π½Π΅ смогут ΠΏΡ€ΠΎΠ³ΠΎΠ»ΠΎΡΠΎΠ²Π°Ρ‚ΡŒ + Π·Π° ΠΏΠΎΠ½Ρ€Π°Π²ΠΈΠ²ΡˆΠΈΡ…ΡΡ ΠΌΡƒΠ·Ρ‹ΠΊΠ +°Π½Ρ‚ΠΎΠ² β€” ΠΈΠ·-Π·Π° Π΄ΠΎΠ»Π³Π° Π² 16 ΠΌΠ»Π½ Ρ&#136 +;вСйцарских Ρ„Ρ€Π°Π½ΠΊΠΎΠ².","cach +ed":1}


Cheers

Andy
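
For what it's worth, output like the corrupted dump above is what double encoding typically looks like. Here is a minimal sketch, assuming (this is a guess, nothing in the dump confirms it) that some layer between the script and the DB encodes already-encoded UTF-8 bytes a second time. It uses only the core Encode module; the variable names are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Character string, the same escapes Dumper shows for the first word of the title.
my $chars = "\x{420}\x{443}\x{43c}\x{44b}\x{43d}\x{438}\x{44f}";

# Encoding once gives the UTF-8 bytes that should reach the DB.
my $once = encode('UTF-8', $chars);

# Assumed fault: a later layer treats those bytes as Latin-1 characters
# and encodes them again, so every byte becomes two bytes.
my $twice = encode('UTF-8', $once);

# Prints: chars: 7, once: 14 bytes, twice: 28 bytes
printf "chars: %d, once: %d bytes, twice: %d bytes\n",
    length($chars), length($once), length($twice);

# Undoing one layer of encoding gets back to the clean UTF-8 bytes.
my $repaired = decode('UTF-8', $twice);
print "repaired matches the single-encoded bytes\n" if $repaired eq $once;
```

If the stored value is twice as long in bytes as the single-encoded one, that points at an extra encode step somewhere on the way into the DB rather than at the column collation. If the DB happens to be MySQL accessed through DBD::mysql (also an assumption), the `mysql_enable_utf8` connect attribute is one setting worth checking.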

In reply to Re^3: UTF8 issue when getting website via LWP::UserAgent in Perl by ultranerds
in thread UTF8 issue when getting website via LWP::UserAgent in Perl by ultranerds
