Re: Unrecognized ICU conversion error
by choroba (Cardinal) on Aug 09, 2023 at 20:13 UTC
What module do you use to connect to Vertica? I've seen DBD::ODBC being used. What version of the module do you use for each perl version?
Note that if the old version is buggy and stores invalid characters, the fixed version might be unable to fetch them. Rewriting the problematic columns might be needed to fix the problems.
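One way to check whether stored values are the problem is to test the raw bytes for well-formed UTF-8. The following is only a sketch: the helper name is_valid_utf8 and the sample byte strings are made up for illustration (the Latin-2 byte is a guess at what a misbehaving writer might have stored), and it assumes you can get the column values as undecoded bytes.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: true if the byte string is well-formed UTF-8.
sub is_valid_utf8 {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode() with a CHECK argument may modify its input
    return eval { decode('UTF-8', $copy, Encode::FB_CROAK); 1 } ? 1 : 0;
}

# Illustrative samples only, not taken from the real table.
my @samples = (
    [ 'UTF-8 bytes for SLAP + U+0158' => "SLAP\xC5\x98" ],
    [ 'truncated UTF-8 sequence'      => "SLAP\xC5"     ],
    [ 'ISO-8859-2 byte for R caron'   => "SLAP\xD8"     ],
);

for my $sample (@samples) {
    my ($label, $bytes) = @$sample;
    printf "%-32s %s\n", $label, is_valid_utf8($bytes) ? 'valid' : 'INVALID';
}
```

Rows that come back as INVALID would be candidates for rewriting.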
map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
| [reply] [d/l] |
Re: Unrecognized ICU conversion error
by ewcarroll (Initiate) on Aug 10, 2023 at 19:23 UTC
The Perl versions were inadvertently swapped in the original post. The corrected information is as follows.
CURRENT HOST
CentOS Linux 7 (Core)
Perl version 5.16.3
Perl DBD::ODBC Version : 1.58
Vertica Analytic Database v9.2.1-28
vertica-client-8.1.1-0.x86_64
NEW HOST
Fedora Linux 38 (Thirty Eight)
Perl version 5.36.0
Perl DBD::ODBC Version : 1.61
Vertica Analytic Database v9.2.1-28
vertica-client-8.1.1-0.x86_64
Example of data causing the issue: SLAPŘ
OLD HOST LOG EXTRACT
[06/28/2023 13:03:23] loading ul_config ... <br>
[06/28/2023 13:03:26] user_level_l_topic.pl started: custom, 202306281
+30317, 6738, 42149 <br>
[06/28/2023 13:03:26] work_dir: /project/tmp/std_user_level/custom/202
+30628130317/6738/42149 <br>
[06/28/2023 13:03:26] 6738 xxxxx Weekly 31552 6582 42149 xxxxx Newsle
+tter 99175 custom N <br>
[06/28/2023 13:03:26] tactic_name='xxxxx_NR_381703.2' <br>
[06/28/2023 13:03:26] create_ul_target_list <br>
[06/28/2023 13:03:27] SELECT ANALYZE_STATISTICS('UL_TARGET_LIST') <br>
[06/28/2023 13:03:28] fill ul_cohort... <br>
[06/28/2023 13:06:49] 31342 records added <br>
[06/28/2023 13:06:49] starting fill_ul_report_detail... <br>
[06/28/2023 13:06:49] deleting ul_report_detail for REPORT_ID = 6738 a
+nd ID = 160497671... <br>
[06/28/2023 13:06:49] 0 records deleted <br>
[06/28/2023 13:06:49] inserting into ul_report_detail for 6738 and ID
+= 160497671... <br>
[06/28/2023 13:06:49] 1 records added <br>
[06/28/2023 13:06:49] USER <br>
[06/28/2023 13:06:52] ACTION <br>
[06/28/2023 13:06:54] Validating reports... <br>
[06/28/2023 13:06:54] USER: /project/tmp/std_user_level/custom/202
+30628130317/6738/42149/6738_xxxxx_USER_DATA_20230628130317.txt size=6
+083513 bytes <br>
[06/28/2023 13:06:54] ACTION: /project/tmp/std_user_level/custom/202
+30628130317/6738/42149/6738_xxxxx_USER_ACTION_DATA_20230628130317.txt
+ size=10097634 bytes <br>
[06/28/2023 13:06:54] file size validation - passed <br>
[06/28/2023 13:06:55] unique user counts in USER and ACTION - passed <
+br>
[06/28/2023 13:06:55] QUESTION is not applicable to this product <br>
[06/28/2023 13:06:55] 21050 records in /project/tmp/std_user_level/cus
+tom/20230628130317/6738/42149/6738_xxxxx_USER_DATA_20230628130317.txt
+ <br>
[06/28/2023 13:06:55] 31342 records in /project/tmp/std_user_level/cus
+tom/20230628130317/6738/42149/6738_xxxxx_USER_ACTION_DATA_20230628130
+317.txt <br>
[06/28/2023 13:06:55] update ul_run_status... <br>
[06/28/2023 13:06:55] user_level_l_topic.pl ended <br>
<br>
[06/28/2023 13:07:03] Connected to v_xxxxx_node0010 <br>
[06/28/2023 13:07:03] FILE_STATUS_ID=410508751 <br>
[06/28/2023 13:07:03] Load Format Data... <br>
[06/28/2023 13:07:03] Extract report data... <br>
[06/28/2023 13:07:07] Generate data <br>
<br>
[06/28/2023 13:07:07] generate_data: Processing format detail 1 <br>
[06/28/2023 13:07:07] Metrics=4 <br>
[06/28/2023 13:07:08] generate_data: done with format detail 1:User Ac
+tion Media Data <br>
<br>
[06/28/2023 13:07:08] generate_data: Processing format detail 2 <br>
[06/28/2023 13:07:08] Metrics=40 <br>
[06/28/2023 13:07:22] generate_data: done with format detail 2:User Ac
+tion Data <br>
<br>
[06/28/2023 13:07:22] Generate files <br>
<br>
[06/28/2023 13:07:22] generate_file: Processing format detail 1 <br>
[06/28/2023 13:07:22] generate_file: done with format detail 1:User Ac
+tion Media Data <br>
<br>
[06/28/2023 13:07:22] generate_file: Processing format detail 2 <br>
[06/28/2023 13:07:24] New file: /mnt/xxxxx/PromoUserLevelReporting/xxx
+xx/xxxxx/custom/xxxxx/6738_xxxxx_USER_LEVEL_20230628130701.txt <br>
[06/28/2023 13:07:27] New file: /mnt/xxxxx/PromoUserLevelReporting/xxx
+xx/xxxxx/custom/xxxxx/6738_xxxxx_CTL_20230628130701.ctl <br>
[06/28/2023 13:07:27] generate_file: done with format detail 2:User Ac
+tion Data <br>
<br>
[06/28/2023 13:07:27] Moving 2 report files to target dir <br>
[06/28/2023 13:07:27] mv /project/tmp/generate_report_files/2023062813
+0701/6738/104/* '/mnt/xxxxx/PromoUserLevelReporting/xxxxx/xxxxx/custo
+m/xxxxx' 2>>/dev/null <br>
[06/28/2023 13:07:27] generate_report_files.pl ended <br>
NEW HOST LOG EXTRACT
[06/28/2023 12:56:45] loading ul_config ... <br>
[06/28/2023 12:56:45] user_level_l_topic.pl started: custom, 202306281
+25643, 6738, 42149 <br>
[06/28/2023 12:56:45] work_dir: /project/tmp/std_user_level/custom/202
+30628125643/6738/42149 <br>
[06/28/2023 12:56:45] 6738 xxxxx Weekly 31552 6582 42149 xxxxx Newsle
+tter 99175 custom N <br>
[06/28/2023 12:56:45] tactic_name='xxxxx_NR_381703.2' <br>
[06/28/2023 12:56:45] create_ul_target_list <br>
[06/28/2023 12:56:45] SELECT ANALYZE_STATISTICS('UL_TARGET_LIST') <br>
[06/28/2023 12:56:46] fill ul_cohort... <br>
[06/28/2023 12:59:43] 31342 records added <br>
[06/28/2023 12:59:43] starting fill_ul_report_detail... <br>
[06/28/2023 12:59:43] deleting ul_report_detail for REPORT_ID = 6738 a
+nd ID = 160493471... <br>
[06/28/2023 12:59:43] 0 records deleted <br>
[06/28/2023 12:59:43] inserting into ul_report_detail for 6738 and ID
+= 160493471... <br>
[06/28/2023 12:59:43] 1 records added <br>
[06/28/2023 12:59:43] USER <br>
Wide character in print at UL_VERTICA.pm line 951. <br>
Wide character in print at UL_VERTICA.pm line 951. <br>
Wide character in print at UL_VERTICA.pm line 951. <br>
[06/28/2023 12:59:46] ACTION <br>
[06/28/2023 12:59:49] Validating reports... <br>
[06/28/2023 12:59:49] USER: /project/tmp/std_user_level/custom/202
+30628125643/6738/42149/6738_xxxxx_USER_DATA_20230628125643.txt size=6
+083486 bytes <br>
[06/28/2023 12:59:49] ACTION: /project/tmp/std_user_level/custom/202
+30628125643/6738/42149/6738_xxxxx_USER_ACTION_DATA_20230628125643.txt
+ size=9990561 bytes <br>
[06/28/2023 12:59:49] file size validation - passed <br>
[06/28/2023 12:59:50] unique user counts in USER and ACTION - passed <
+br>
[06/28/2023 12:59:50] QUESTION is not applicable to this product <br>
[06/28/2023 12:59:50] 21050 records in /project/tmp/std_user_level/cus
+tom/20230628125643/6738/42149/6738_xxxxx_USER_DATA_20230628125643.txt
+ <br>
[06/28/2023 12:59:50] 31342 records in /project/tmp/std_user_level/cus
+tom/20230628125643/6738/42149/6738_xxxxx_USER_ACTION_DATA_20230628125
+643.txt <br>
[06/28/2023 12:59:50] update ul_run_status... <br>
[06/28/2023 12:59:50] user_level_l_topic.pl ended <br>
<br>
[06/28/2023 13:00:00] Connected to v_xxxxx_node0005 <br>
[06/28/2023 13:00:00] FILE_STATUS_ID=410504550 <br>
[06/28/2023 13:00:00] Load Format Data... <br>
[06/28/2023 13:00:00] Extract report data... <br>
[06/28/2023 13:00:07] Generate data <br>
<br>
[06/28/2023 13:00:07] generate_data: Processing format detail 1 <br>
[06/28/2023 13:00:07] Metrics=4 <br>
[06/28/2023 13:00:12] generate_data: done with format detail 1:User Ac
+tion Media Data <br>
<br>
[06/28/2023 13:00:12] generate_data: Processing format detail 2 <br>
[06/28/2023 13:00:12] Metrics=40 <br>
[06/28/2023 13:00:36] generate_data: done with format detail 2:User Ac
+tion Data <br>
<br>
[06/28/2023 13:00:36] Generate files <br>
<br>
[06/28/2023 13:00:36] generate_file: Processing format detail 1 <br>
[06/28/2023 13:00:37] generate_file: done with format detail 1:User Ac
+tion Media Data <br>
<br>
[06/28/2023 13:00:37] generate_file: Processing format detail 2 <br>
[06/28/2023 13:00:38] Error: [Vertica][Support] (50310) Unrecognized I
+CU conversion error. (SQL-HY000) <br>
[06/28/2023 13:00:38] generate_report_files.pl ended <br>
| [reply] [d/l] [select] |
G'day ewcarroll,
Welcome to the Monastery.
++ for your post but did you notice that all of your timestamps have become links?
Links are autogenerated for any plain text in square brackets.
It's better to wrap code, data, exception messages, and other program output in <code>...</code> tags.
This will not create links and also handles characters that are special to HTML (e.g. &, <, and so on).
See "Writeup Formatting Tips" for more details about this.
| [reply] [d/l] [select] |
Wide character in print at UL_VERTICA.pm line 951.
As I wrote in Re^5: Unrecognized ICU conversion error, this looks like a Unicode/UTF-8 problem.
Basically, Perl internally uses Unicode code points for characters, i.e. the "number" of a character can be greater than 255. Example:
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;
use Encode;
# Let's use the "Medium shade" block, Unicode point 0x2592
# https://www.unicode.org/charts/beta/nameslist/n_2580.html
my $unicodechar = "\N{MEDIUM SHADE}";
print "Character code: ", ord($unicodechar), "\n";
print "Character: ", $unicodechar, "\n"; # "Wide character in print at unicode_perlmonks.pl line 15."
my $utf8 = encode('UTF-8', $unicodechar, Encode::FB_CROAK);
print "Character as UTF8: ", $utf8, "\n";
On line 15 of the original script (the print of the raw $unicodechar), you get the warning because you are printing Perl's internal character string directly. Without an encoding layer, STDOUT expects bytes, that is, values from 0 to 255, but this character's code point (0x2592) does not fit into a single byte.
With proper encoding, in this case UTF-8, you can turn the single character into a byte stream that represents it as multiple valid bytes. This isn't just splitting up the internal representation; it is a proper encoding designed to avoid several problems. For example, it never produces a zero byte for any character other than NUL itself, so it doesn't break zero-terminated string handling in C-like languages.
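Instead of calling encode() before every print, you can put an encoding layer on the output handle. The sketch below uses in-memory filehandles so the effect can be observed without touching STDOUT. The variable names are made up, and the test string is just the sample value SLAPŘ from the thread. It is not a patch for UL_VERTICA.pm, whose code isn't shown here.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# "Ř" (U+0158) is the last character of the sample value SLAPŘ.
my $text = "SLAP\x{158}";

# Collect warnings instead of printing them, so they can be counted.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

# Handle without an encoding layer: Perl warns about the wide character.
open my $raw, '>', \my $raw_buf or die "Cannot open in-memory handle: $!";
print {$raw} $text;
close $raw;

# Handle with a UTF-8 layer: characters are encoded to bytes on output.
open my $enc, '>:encoding(UTF-8)', \my $enc_buf
    or die "Cannot open in-memory handle: $!";
print {$enc} $text;
close $enc;

print "Warnings without a layer: ", scalar(@warnings), "\n";
print "Bytes written with layer: ",
    join(' ', map { sprintf('%02X', ord($_)) } split(//, $enc_buf)), "\n";
```

For an existing handle such as STDOUT, the equivalent would be binmode(STDOUT, ':encoding(UTF-8)') once at startup. Which handle lines 137 and 951 actually print to is an assumption you'd have to check in your module.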
Tom Scott has a nice video on this if you are interested in how this actually works: Characters, Symbols and the Unicode Miracle - Computerphile
| [reply] [d/l] |
It's fine to update your post; however, it's important to indicate that you've done so
— especially when your update invalidates an existing response.
See "How do I change/delete my post?" for more about that.
I also note that all lines of your log extracts end with " <br>".
I suspect these don't reflect the original and were probably added initially
to format the log data as paragraph text.
I am aware that this was your first post here.
My comments are intended to be informational; not any kind of rebuke. :-)
| [reply] [d/l] |
The 'Unrecognized ICU conversion error' disappeared when the Vertica client was upgraded to 23.3.0; however, the 'Wide character in print at XXXX line XX' warning now appears at line 137 in addition to line 951.
This does indeed look like a Unicode/UTF-8 issue. I would appreciate any help with a solution.
| [reply] |
Re: Unrecognized ICU conversion error
by Anonymous Monk on Aug 09, 2023 at 19:54 UTC