
You haven't told us which database you are talking to or what you are using to load data into it. We will also need to know the type of the column you're inserting into.

When loading the data from Perl, you will need to use Encode to decode the data from its source representation to Unicode.
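A minimal sketch of that decoding step, assuming the source data arrives as UTF-8 encoded bytes. The byte string and the helper name `decode_source` are made up for illustration; `Encode::decode` with `FB_CROAK` makes malformed input die instead of being silently replaced, and `LEAVE_SRC` keeps `decode` from consuming its input in place.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: turn raw UTF-8 bytes from the source into a
# Perl character string, dying on malformed input.
sub decode_source {
    my ($octets) = @_;
    return decode('UTF-8', $octets, Encode::FB_CROAK | Encode::LEAVE_SRC);
}

# Made-up sample: the UTF-8 encoding of U+20AB (DONG SIGN) as raw bytes,
# e.g. as read from a file opened without an :encoding layer.
my $octets = "\xE2\x82\xAB";
my $text   = decode_source($octets);

# prints: 3 byte(s) in, 1 character(s) out: U+20AB
printf "%d byte(s) in, %d character(s) out: U+%04X\n",
    length $octets, length $text, ord $text;
```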

You will likely need to tell the database driver, both when loading data into the database and when reading data out of it, that it should treat the column as UTF-8 encoded Unicode (if that's what your original data is).

Alternatively, encode the data to UTF-8 encoded octets and write those raw octets into the database. You will then lose the ability to query the data nicely from within the database: LIKE queries and UPPER() will likely not work the way you expect.
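A small sketch of what "raw octets" means here, using a made-up sample value: after `Encode::encode`, Perl (and the database) only sees bytes, so a character-level notion such as length no longer lines up, and you have to decode again when reading the column back.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical sample value as a Perl character string: 'Z', U+00E9, U+20AB.
my $text = "Z\x{e9}\x{20ab}";

# Encode to UTF-8 octets: these raw bytes are what would go into the column.
my $octets = encode('UTF-8', $text);

# The bytes carry no notion of characters, so "length" now counts octets.
printf "characters: %d, octets: %d\n", length $text, length $octets;   # characters: 3, octets: 6

# Reading the column back yields the same octets; decode them to get characters again.
my $round_trip = decode('UTF-8', $octets);
print $round_trip eq $text ? "round trip ok\n" : "round trip failed\n";
```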

When reading from the database, set up your database connection so that it decodes the data from the database's format to Unicode in Perl.

If you are using MySQL, read the DBD::mysql documentation for its UTF-8 options.

Re^4: Print unicode strings to pdf
by Anonymous Monk on Mar 26, 2018 at 14:24 UTC
    I am using MySQL and the source command with a source file to insert into the table:
    insert_test (plaintext source file):
    INSERT INTO test VALUES(NULL, "\\x{20ab}");
    INSERT INTO test VALUES(NULL, "\\x{005a}\\x{0024}");
    INSERT INTO test VALUES(NULL, "\\x{0042}\\x{0073}");

    mysql> source C:/Program Files/MySQL/MySQL Server 5.0/sql/insert_test
    Could you enlighten me on a string like this '\x{005a}'? Is it called a unicode string? And is that what we should store in the database (MySQL)?
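For reference, a quick Perl illustration of the distinction behind this question: `\x{005a}` is Perl's escape syntax for a code point, not the data itself. Only inside a double-quoted string does it become the character; as plain text it is eight characters. Assuming MySQL's default backslash-escape handling, the INSERT statements above (where `\\` becomes one backslash) store that eight-character escape text, whereas what you most likely want in the column is the character itself.

```perl
use strict;
use warnings;

# In a Perl double-quoted string, \x{...} is an escape: it produces ONE character.
my $char = "\x{005a}";      # the single character 'Z' (U+005A)

# In single quotes (or in a plain SQL file), the same text is just 8 characters:
# backslash, 'x', '{', '0', '0', '5', 'a', '}'.
my $literal = '\x{005a}';

printf "double-quoted: %d char(s): %s\n", length $char, $char;         # double-quoted: 1 char(s): Z
printf "single-quoted: %d char(s): %s\n", length $literal, $literal;   # single-quoted: 8 char(s): \x{005a}
```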

      Hello again Anonymous Monk,

      Everything you are looking for is here: A UTF8 round trip with MySQL.

      More specifically:

      use DBI ();
      my $dbh = DBI->connect('dbi:mysql:test_db', $username, $password,
                             { mysql_enable_utf8 => 1 });

      Let DBD::mysql do all the work for you. This step connects to the database and tells DBD::mysql to convert to and from UTF-8 automatically.

      Hope this helps, BR.

      Seeking for Perl wisdom...on the process of learning...not there...yet!
        Hey thanos1983! Thank you so much. Have never tried connecting with that additional setting mysql_enable_utf8. Will try it and post back later.
Re^4: Print unicode strings to pdf
by Anonymous Monk on Mar 26, 2018 at 19:03 UTC
    Could you enlighten me on this related issue?

    I have a json string from perl (encoded with to_json) which goes to my Javascript code. When I view the json string, it looks like this:
    { "hex_code":"\\x{a5}" }
    I want to display the hex_code as an actual symbol in an input field. But what I see in the input field is '\x{a5}' and not the symbol. In Javascript when I set a variable like this:
    var hex_code = '\xa5';
    document.getElementById('some_field').value = hex_code;
    The symbol gets displayed in the input field.

    What do I need to do in the Perl code (or in the JavaScript) to display the hex code as a symbol in the input field?

    I'm quite confused by this. Hope you can shed some light :)

      What do you mean by "view the JSON string"? Do you mean you print it? Do you print it using Data::Dumper?

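      If you want to turn the JSON back into a proper data structure, I suggest you use JSON or JSON::XS to do that. JSON has no \x{...} escape, so if the string contains sequences like \\x{a5}, you might need to rewrite them (for example into JSON \uXXXX escapes) before decoding, so that it is proper JSON:

      use strict;
      use warnings;
      use JSON 'decode_json';
      use Data::Dumper;

      # As received: two literal backslashes before x{a5} ('\\\\' in single quotes gives '\\')
      my $mangled_json = '{ "hex_code":"\\\\x{a5}" }';
      print "$mangled_json\n";

      # JSON has no \x{...} escape, so rewrite \\x{HHHH} as the JSON escape \uHHHH
      # (this only covers code points up to U+FFFF)
      my $json = $mangled_json;
      $json =~ s!\\\\x\{([0-9a-fA-F]{1,4})\}!sprintf '\\u%04x', hex $1!ge;
      print "$json\n";

      my $structure = decode_json( $json );
      $Data::Dumper::Useqq = 1;
      print Dumper $structure;

      binmode STDOUT, ':encoding(UTF-8)';    # well, hopefully, your terminal understands UTF-8
      print $structure->{'hex_code'}, "\n";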
        Thanks Corion.

        Sorry, by "view" I meant JavaScript's built-in "alert" function. This is what alert shows for the JSON string passed from Perl to JavaScript:
        { "hex_code":"\\xa5" }
        When I display this in an html input field, it displays:
        \xa5
        After much googling with the right keywords, I found a solution by passing the value "\\xa5" (which originates from perl) to the function below:
        // https://stackoverflow.com/questions/4209104/decoding-hex-string-in-javascript
        String.prototype.decodeEscapeSequence = function() {
            // a \xHH escape has exactly two hex digits
            return this.replace(/\\x([0-9A-Fa-f]{2})/g, function() {
                return String.fromCharCode(parseInt(arguments[1], 16));
            });
        };
        I think it's somewhat similar to your perl code below:
        sub unescape {
            my( $str ) = @_;
            # accept any number of hex digits, upper or lower case, e.g. \x{a5} or \x{20ab}
            $str =~ s!\\x\{([0-9a-fA-F]+)\}!chr(hex $1)!ge;
            $str
        };
        I am confused because when I have a variable set to a hex value in Javascript like this:
        var hex_code = '\xa5';
        That gets displayed correctly in the input field as a symbol.

        So I thought that since "\\xa5" is what I see with JavaScript's alert, removing the first backslash would make it work. It doesn't, whether I remove the first backslash or not. I needed to unescape it with the JavaScript function I found on Stack Overflow.
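A hedged sketch of doing the conversion on the Perl side instead, so the browser never receives escape text. It assumes the stored value is the literal text \xa5 or \x{a5}; `unescape_hex` is a hypothetical helper (a generalization of the `unescape` sub quoted above), and the sketch uses the core JSON::PP module rather than whatever module provides the poster's `to_json`.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical helper: turn literal escape text such as \xa5 or \x{a5}
# (as stored in the database) into the actual characters.
sub unescape_hex {
    my ($str) = @_;
    $str =~ s!\\x\{([0-9a-fA-F]+)\}|\\x([0-9a-fA-F]{2})!chr(hex($1 // $2))!ge;
    return $str;
}

my $value = unescape_hex('\xa5');    # literal 4-character text in, one character out

# ascii() makes JSON::PP emit non-ASCII characters as \uXXXX escapes,
# which JavaScript's JSON.parse turns back into the real character.
my $json = JSON::PP->new->ascii->encode({ hex_code => $value });
print "$json\n";
```

With the value encoded this way, JSON.parse on the JavaScript side yields U+00A5 (¥) directly, with no decodeEscapeSequence step.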