in reply to Unrecognized ICU conversion error

The Perl versions were inadvertently swapped in the original post; the corrected information follows.

CURRENT HOST
CentOS Linux 7 (Core)
Perl version 5.16.3
Perl DBD::ODBC Version : 1.58
Vertica Analytic Database v9.2.1-28
vertica-client-8.1.1-0.x86_64

NEW HOST
Fedora Linux 38 (Thirty Eight)
Perl version 5.36.0
Perl DBD::ODBC Version : 1.61
Vertica Analytic Database v9.2.1-28
vertica-client-8.1.1-0.x86_64

Example of data causing the issue: SLAPŘ


OLD HOST LOG EXTRACT
[06/28/2023 13:03:23] loading ul_config ... <br>
[06/28/2023 13:03:26] user_level_l_topic.pl started: custom, 20230628130317, 6738, 42149 <br>
[06/28/2023 13:03:26] work_dir: /project/tmp/std_user_level/custom/20230628130317/6738/42149 <br>
[06/28/2023 13:03:26] 6738 xxxxx Weekly 31552 6582 42149 xxxxx Newsletter 99175 custom N <br>
[06/28/2023 13:03:26] tactic_name='xxxxx_NR_381703.2' <br>
[06/28/2023 13:03:26] create_ul_target_list <br>
[06/28/2023 13:03:27] SELECT ANALYZE_STATISTICS('UL_TARGET_LIST') <br>
[06/28/2023 13:03:28] fill ul_cohort... <br>
[06/28/2023 13:06:49] 31342 records added <br>
[06/28/2023 13:06:49] starting fill_ul_report_detail... <br>
[06/28/2023 13:06:49] deleting ul_report_detail for REPORT_ID = 6738 and ID = 160497671... <br>
[06/28/2023 13:06:49] 0 records deleted <br>
[06/28/2023 13:06:49] inserting into ul_report_detail for 6738 and ID = 160497671... <br>
[06/28/2023 13:06:49] 1 records added <br>
[06/28/2023 13:06:49] USER <br>
[06/28/2023 13:06:52] ACTION <br>
[06/28/2023 13:06:54] Validating reports... <br>
[06/28/2023 13:06:54] USER: /project/tmp/std_user_level/custom/20230628130317/6738/42149/6738_xxxxx_USER_DATA_20230628130317.txt size=6083513 bytes <br>
[06/28/2023 13:06:54] ACTION: /project/tmp/std_user_level/custom/20230628130317/6738/42149/6738_xxxxx_USER_ACTION_DATA_20230628130317.txt size=10097634 bytes <br>
[06/28/2023 13:06:54] file size validation - passed <br>
[06/28/2023 13:06:55] unique user counts in USER and ACTION - passed <br>
[06/28/2023 13:06:55] QUESTION is not applicable to this product <br>
[06/28/2023 13:06:55] 21050 records in /project/tmp/std_user_level/custom/20230628130317/6738/42149/6738_xxxxx_USER_DATA_20230628130317.txt <br>
[06/28/2023 13:06:55] 31342 records in /project/tmp/std_user_level/custom/20230628130317/6738/42149/6738_xxxxx_USER_ACTION_DATA_20230628130317.txt <br>
[06/28/2023 13:06:55] update ul_run_status... <br>
[06/28/2023 13:06:55] user_level_l_topic.pl ended <br>
<br>
[06/28/2023 13:07:03] Connected to v_xxxxx_node0010 <br>
[06/28/2023 13:07:03] FILE_STATUS_ID=410508751 <br>
[06/28/2023 13:07:03] Load Format Data... <br>
[06/28/2023 13:07:03] Extract report data... <br>
[06/28/2023 13:07:07] Generate data <br>
<br>
[06/28/2023 13:07:07] generate_data: Processing format detail 1 <br>
[06/28/2023 13:07:07] Metrics=4 <br>
[06/28/2023 13:07:08] generate_data: done with format detail 1:User Action Media Data <br>
<br>
[06/28/2023 13:07:08] generate_data: Processing format detail 2 <br>
[06/28/2023 13:07:08] Metrics=40 <br>
[06/28/2023 13:07:22] generate_data: done with format detail 2:User Action Data <br>
<br>
[06/28/2023 13:07:22] Generate files <br>
<br>
[06/28/2023 13:07:22] generate_file: Processing format detail 1 <br>
[06/28/2023 13:07:22] generate_file: done with format detail 1:User Action Media Data <br>
<br>
[06/28/2023 13:07:22] generate_file: Processing format detail 2 <br>
[06/28/2023 13:07:24] New file: /mnt/xxxxx/PromoUserLevelReporting/xxxxx/xxxxx/custom/xxxxx/6738_xxxxx_USER_LEVEL_20230628130701.txt <br>
[06/28/2023 13:07:27] New file: /mnt/xxxxx/PromoUserLevelReporting/xxxxx/xxxxx/custom/xxxxx/6738_xxxxx_CTL_20230628130701.ctl <br>
[06/28/2023 13:07:27] generate_file: done with format detail 2:User Action Data <br>
<br>
[06/28/2023 13:07:27] Moving 2 report files to target dir <br>
[06/28/2023 13:07:27] mv /project/tmp/generate_report_files/20230628130701/6738/104/* '/mnt/xxxxx/PromoUserLevelReporting/xxxxx/xxxxx/custom/xxxxx' 2>>/dev/null <br>
[06/28/2023 13:07:27] generate_report_files.pl ended <br>



NEW HOST LOG EXTRACT
[06/28/2023 12:56:45] loading ul_config ... <br>
[06/28/2023 12:56:45] user_level_l_topic.pl started: custom, 20230628125643, 6738, 42149 <br>
[06/28/2023 12:56:45] work_dir: /project/tmp/std_user_level/custom/20230628125643/6738/42149 <br>
[06/28/2023 12:56:45] 6738 xxxxx Weekly 31552 6582 42149 xxxxx Newsletter 99175 custom N <br>
[06/28/2023 12:56:45] tactic_name='xxxxx_NR_381703.2' <br>
[06/28/2023 12:56:45] create_ul_target_list <br>
[06/28/2023 12:56:45] SELECT ANALYZE_STATISTICS('UL_TARGET_LIST') <br>
[06/28/2023 12:56:46] fill ul_cohort... <br>
[06/28/2023 12:59:43] 31342 records added <br>
[06/28/2023 12:59:43] starting fill_ul_report_detail... <br>
[06/28/2023 12:59:43] deleting ul_report_detail for REPORT_ID = 6738 and ID = 160493471... <br>
[06/28/2023 12:59:43] 0 records deleted <br>
[06/28/2023 12:59:43] inserting into ul_report_detail for 6738 and ID = 160493471... <br>
[06/28/2023 12:59:43] 1 records added <br>
[06/28/2023 12:59:43] USER <br>
Wide character in print at UL_VERTICA.pm line 951. <br>
Wide character in print at UL_VERTICA.pm line 951. <br>
Wide character in print at UL_VERTICA.pm line 951. <br>
[06/28/2023 12:59:46] ACTION <br>
[06/28/2023 12:59:49] Validating reports... <br>
[06/28/2023 12:59:49] USER: /project/tmp/std_user_level/custom/20230628125643/6738/42149/6738_xxxxx_USER_DATA_20230628125643.txt size=6083486 bytes <br>
[06/28/2023 12:59:49] ACTION: /project/tmp/std_user_level/custom/20230628125643/6738/42149/6738_xxxxx_USER_ACTION_DATA_20230628125643.txt size=9990561 bytes <br>
[06/28/2023 12:59:49] file size validation - passed <br>
[06/28/2023 12:59:50] unique user counts in USER and ACTION - passed <br>
[06/28/2023 12:59:50] QUESTION is not applicable to this product <br>
[06/28/2023 12:59:50] 21050 records in /project/tmp/std_user_level/custom/20230628125643/6738/42149/6738_xxxxx_USER_DATA_20230628125643.txt <br>
[06/28/2023 12:59:50] 31342 records in /project/tmp/std_user_level/custom/20230628125643/6738/42149/6738_xxxxx_USER_ACTION_DATA_20230628125643.txt <br>
[06/28/2023 12:59:50] update ul_run_status... <br>
[06/28/2023 12:59:50] user_level_l_topic.pl ended <br>
<br>
[06/28/2023 13:00:00] Connected to v_xxxxx_node0005 <br>
[06/28/2023 13:00:00] FILE_STATUS_ID=410504550 <br>
[06/28/2023 13:00:00] Load Format Data... <br>
[06/28/2023 13:00:00] Extract report data... <br>
[06/28/2023 13:00:07] Generate data <br>
<br>
[06/28/2023 13:00:07] generate_data: Processing format detail 1 <br>
[06/28/2023 13:00:07] Metrics=4 <br>
[06/28/2023 13:00:12] generate_data: done with format detail 1:User Action Media Data <br>
<br>
[06/28/2023 13:00:12] generate_data: Processing format detail 2 <br>
[06/28/2023 13:00:12] Metrics=40 <br>
[06/28/2023 13:00:36] generate_data: done with format detail 2:User Action Data <br>
<br>
[06/28/2023 13:00:36] Generate files <br>
<br>
[06/28/2023 13:00:36] generate_file: Processing format detail 1 <br>
[06/28/2023 13:00:37] generate_file: done with format detail 1:User Action Media Data <br>
<br>
[06/28/2023 13:00:37] generate_file: Processing format detail 2 <br>
[06/28/2023 13:00:38] Error: [Vertica][Support] (50310) Unrecognized ICU conversion error. (SQL-HY000) <br>
[06/28/2023 13:00:38] generate_report_files.pl ended <br>

Re^2: Unrecognized ICU conversion error
by cavac (Prior) on Aug 11, 2023 at 11:31 UTC

    Wide character in print at UL_VERTICA.pm line 951.

    As I wrote in Re^5: Unrecognized ICU conversion error, this looks like a Unicode/UTF-8 problem.

    Basically, Perl internally uses Unicode code points for characters, so the "number" of a character can be greater than 255. Example:

    #!/usr/bin/env perl

    use strict;
    use warnings;

    use utf8;
    use Encode;

    # Let's use the "Medium shade" block, Unicode point 0x2592
    # https://www.unicode.org/charts/beta/nameslist/n_2580.html
    my $unicodechar = "\N{MEDIUM SHADE}";

    print "Character code: ", ord($unicodechar), "\n";

    print "Character: ", $unicodechar, "\n"; # "Wide character in print at unicode_perlmonks.pl line 15."

    my $utf8 = encode('UTF-8', $unicodechar, Encode::FB_CROAK);
    print "Character as UTF8: ", $utf8, "\n";

    In line 15, when you try to print the internal representation, problems happen. Basically, STDOUT has no encoding layer here, so it expects every character to fit into a single byte (0 to 255), but this character's code point needs more bits than one byte can hold.

    With a proper encoding, in this case UTF-8, you can turn the single character into a byte stream that represents the character as multiple valid bytes. This isn't just splitting up the internal bytes; it is a "proper" encoding that works around multiple issues. For example, it never produces a zero byte for any character other than NUL itself (so as not to mess up zero-terminated string handling in C-like languages).
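    To make the multi-byte point concrete, here is a minimal sketch (separate from the script above) that encodes the same MEDIUM SHADE character and lists the resulting byte values. It relies only on the core Encode module; the variable names are illustrative.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Encode qw(encode);

# One character, code point U+2592 (MEDIUM SHADE), as a Perl character string.
my $char = "\N{U+2592}";

# Encode the character string into a UTF-8 byte string.
my $bytes = encode('UTF-8', $char);

# Only ASCII is printed below, so no encoding layer is needed on STDOUT.
printf "code point: U+%04X\n", ord $char;
printf "characters: %d, UTF-8 bytes: %d\n", length $char, length $bytes;
printf "byte values: %s\n", join ' ', map { sprintf '%02X', $_ } unpack 'C*', $bytes;
```

    This should report one character and three bytes (E2 96 92); none of the bytes is zero, and each fits in the 0 to 255 range that an unlayered filehandle can carry.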

    Tom Scott has a nice video on this if you are interested in how this actually works: Characters, Symbols and the Unicode Miracle - Computerphile

    PerlMonks XP is useless? Not anymore: XPD - Do more with your PerlMonks XP
Re^2: Unrecognized ICU conversion error
by kcott (Archbishop) on Aug 11, 2023 at 13:36 UTC

    G'day ewcarroll,

    Welcome to the Monastery.

    ++ for your post but did you notice that all of your timestamps have become links?

    Links are autogenerated for any plain text in square brackets. It's better to wrap code, data, exception messages, and other program output in <code>...</code> tags. This will not create links and also handles characters that are special to HTML (e.g. &, <, and so on). See "Writeup Formatting Tips" for more details about this.

    — Ken

Re^2: Unrecognized ICU conversion error [after your updates]
by kcott (Archbishop) on Aug 11, 2023 at 19:06 UTC

    It's fine to update your post; however, it's important to indicate that you've done so — especially when your update invalidates an existing response. See "How do I change/delete my post?" for more about that.

    I also note that all lines of your log extracts end with " <br>". I suspect these don't reflect the original and were probably added initially to format the log data as paragraph text.

    I am aware that this was your first post here. My comments are intended to be informational; not any kind of rebuke. :-)

    — Ken

      'Unrecognized ICU conversion error' disappeared when the Vertica client was upgraded to 23.3.0; however, the 'Wide character in print at XXXX line XX' warning now appears at line 137 in addition to line 951. This indeed looks like a Unicode/UTF-8 issue; I would appreciate any help with a solution.

        [Identity: I am getting a little confused regarding with whom I'm conversing: ewcarroll or Sainathuni. If these are two separate people, perhaps working on the same project, please advise. If these are two usernames registered by the same person, please read "Site Rules Governing User Accounts" and act accordingly. Thank you.]

        "This indeed looks like a Unicode/UTF-8 issue; I would appreciate any help with a solution."

        My two previous posts in this thread were purely to help a new user. I have no knowledge of Vertica Analytic Database, vertica-client or UL_VERTICA.pm; furthermore, I have no access to CentOS or Fedora Linux systems. I can provide the following, very general, potential solution (on a Cygwin system which was fully updated yesterday).

        $ uname -a
        CYGWIN_NT-10.0-19045 titan 3.4.7-1.x86_64 2023-06-16 14:04 UTC x86_64 Cygwin

        I created the module Dummy_Vertica intended to emulate what I think might be in UL_VERTICA:

        $ cat /home/ken/tmp/pm_11153782_unicode/Dummy_Vertica.pm
        package Dummy_Vertica;

        use v5.36;
        use utf8;

        sub test_wide_character_print {
            say 'SLAP example:';
            print "SLAPŘ\n";
            say 'IMP example:';
            print "\N{IMP}\n";

            return;
        }

        1;

        [IMP (U+01F47F) was chosen completely arbitrarily as a wide character. See the Unicode® PDF Code Chart "Miscellaneous Symbols and Pictographs" for further details.]

        I wrote the following script, test_1.pl, to emulate the type of thing you're seeing:

        $ cat /home/ken/tmp/pm_11153782_unicode/test_1.pl
        #!/usr/bin/env perl

        use v5.36;

        BEGIN { say "Perl version: $^V"; }

        use lib '/home/ken/tmp/pm_11153782_unicode';
        use Dummy_Vertica;

        Dummy_Vertica::test_wide_character_print();

        Output:

        ken@titan ~/tmp/pm_11153782_unicode
        $ ./test_1.pl
        Perl version: v5.36.0
        SLAP example:
        SLAP▒
        IMP example:
        Wide character in print at /home/ken/tmp/pm_11153782_unicode/Dummy_Vertica.pm line 10.
        👿
        

        As you can see, there are two types of problems: a garbled character (▒ above) instead of Ř; and the "Wide character ..." message. Both can be resolved by using the open pragma. Here's test_2.pl, which is identical to test_1.pl except for the addition of one "use open ..." statement:

        $ cat /home/ken/tmp/pm_11153782_unicode/test_2.pl
        #!/usr/bin/env perl

        use v5.36;
        use open OUT => qw{:encoding(UTF-8) :std};

        BEGIN { say "Perl version: $^V"; }

        use lib '/home/ken/tmp/pm_11153782_unicode';
        use Dummy_Vertica;

        Dummy_Vertica::test_wide_character_print();

        Output:

        ken@titan ~/tmp/pm_11153782_unicode
        $ ./test_2.pl
        Perl version: v5.36.0
        SLAP example:
        SLAPŘ
        IMP example:
        👿
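        The log in the original post shows the warnings while the USER data file is being produced, so the print at UL_VERTICA.pm line 951 may be writing to a file rather than to STDOUT. That is an assumption, since the module's source is not shown in this thread. If it holds, the same idea applies per filehandle: declare the encoding when the file is opened. A minimal sketch, using a temporary file as a hypothetical stand-in for the report file:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Sample value from the original post; Ř is written as an escape (U+0158)
# so this script does not depend on the encoding of its own source file.
my $text = "SLAP\N{U+0158}";

# Hypothetical stand-in for the real report file, whose path is unknown here.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh or die "Cannot close $path: $!";

# The :encoding(UTF-8) layer turns characters into UTF-8 bytes on output,
# so print has no reason to emit "Wide character".
open my $out, '>:encoding(UTF-8)', $path or die "Cannot write $path: $!";
print {$out} $text;
close $out or die "Cannot close $path: $!";

# Read the file back as raw bytes to see what actually landed on disk.
open my $in, '<:raw', $path or die "Cannot read $path: $!";
my $raw = do { local $/; <$in> };
close $in or die "Cannot close $path: $!";

printf "characters: %d, bytes on disk: %d\n", length $text, length $raw;
```

        This should print "characters: 5, bytes on disk: 6", because Ř takes two bytes in UTF-8 and the other four characters take one each.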
        

        — Ken

        As there are many scripts, we are looking for a solution at the configuration level rather than in each script. Thank you!
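        One configuration-level option that may be worth testing is the PERL_UNICODE environment variable, which perlrun documents as the equivalent of the -C switch. The sketch below is an experiment and is not presented as a confirmed fix for UL_VERTICA.pm: it runs the same one-line child program twice, once without PERL_UNICODE and once with PERL_UNICODE=S (UTF-8 on STDIN, STDOUT and STDERR), and reports whether the "Wide character" warning appears. The child program and the run_child helper are hypothetical names made up for this illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Child program: print Ř (U+0158) to STDOUT without setting any layer itself.
# Its warnings are sent to STDOUT so that the parent can capture them.
my $child = q{
    $SIG{__WARN__} = sub { print STDOUT "WARNING: $_[0]" };
    print "SLAP", chr(0x158), "\n";
};

# Run the child with PERL_UNICODE set to the given value (or unset if undef)
# and return everything it wrote to STDOUT as raw bytes.
sub run_child {
    my ($setting) = @_;
    local %ENV = %ENV;          # restore the environment on return
    delete $ENV{PERL5OPT};      # keep unrelated switches out of the experiment
    if (defined $setting) { $ENV{PERL_UNICODE} = $setting }
    else                  { delete $ENV{PERL_UNICODE} }

    open my $pipe, '-|', $^X, '-e', $child or die "Cannot run $^X: $!";
    binmode $pipe;
    my $output = do { local $/; <$pipe> };
    close $pipe;
    return $output // '';
}

for my $setting (undef, 'S') {
    my $label  = defined $setting ? "PERL_UNICODE=$setting" : 'PERL_UNICODE unset';
    my $warned = run_child($setting) =~ /Wide character/ ? 'yes' : 'no';
    print "$label: warning emitted? $warned\n";
}
```

        If this behaves on your hosts as it does in the sketch, setting PERL_UNICODE=S in the environment that launches the scripts (cron, a wrapper, a service definition) would cover the standard streams without editing each script. For files opened with open(), perlrun describes a D flag (for example PERL_UNICODE=SD) that sets default layers, but as I read it that default may not reach handles opened inside modules, so it needs testing against UL_VERTICA.pm before you rely on it. It also changes how input is read, which could affect scripts that handle binary data.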