in reply to Re^2: Perl encoding problem
in thread Perl encoding problem

Thank you very much for your additional comments.

The scripts have not changed, but the Perl version and the locale of the server have. These are the variables I suspected to be the cause of my problems.

The migration to a MySQL UTF-8 DB is one big goal, but there are a number of things I will have to modify to achieve it.
The migration of the server is a first step.
The difficulties I am having now help me to prepare the next steps - hopefully.

I have a couple of files with different encodings that all end up in the database,
which is latin1_swedish_ci at the moment and will be some utf8 variant at some point.
All migrated scripts and files have worked fine so far; the first problem occurred with the regex containing foreign characters in this script.

I now added:
foreach my $key (keys %$add) { $add->{$key} = encode('iso-8859-1', $add->{$key}, 1); }

which is what I would have expected to be necessary from the start.
The result of the encoding is that lines with foreign characters no longer make it into the database.
I get "Incorrect string value: '\xE4lter'" for example.
Without the encoding all lines are added in the database and all of them look alright.
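
For reference, a minimal sketch of what the encode call does to one such value. It assumes the value is the Perl character string "älter" (a hypothetical sample chosen to match the error message) and only prints the resulting bytes; it does not touch the database. LEAVE_SRC is added so the sample string can be encoded twice.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical sample value: "älter" as a Perl character string.
my $chars = "\x{E4}lter";

# Encode to both candidate encodings; croak on unmappable characters
# and leave the source string untouched so it can be reused.
my $check  = Encode::FB_CROAK() | Encode::LEAVE_SRC();
my $latin1 = encode('iso-8859-1', $chars, $check);
my $utf8   = encode('UTF-8',      $chars, $check);

# Show the raw bytes of each result as hex.
printf "latin1 bytes: %s\n", join ' ', map { sprintf '%02X', ord } split //, $latin1;
printf "utf8 bytes:   %s\n", join ' ', map { sprintf '%02X', ord } split //, $utf8;
# prints:
# latin1 bytes: E4 6C 74 65 72
# utf8 bytes:   C3 A4 6C 74 65 72
```

The single byte E4 from the ISO-8859-1 result is the same byte that shows up in the error message.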

So, as far as I understand you, my first assumption about the encoding of the strings in $add is wrong, and I should not just insert them into a latin1_swedish_ci DB.
On the other hand, encoding them produces errors during the import into a database with latin1_swedish_ci collation.
So I am having another problem that I don't understand yet.

Re^4: Perl encoding problem
by NERDVANA (Priest) on Dec 15, 2021 at 14:17 UTC
    Can you find out whether that error is coming from the MySQL server or from the DBI driver? If you set the environment variable DBI_TRACE=1, the trace should clarify whether the query was sent to the server and rejected, or whether it failed before sending.
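
    If setting an environment variable is awkward on the server, the same trace level can be requested from inside the script with DBI's trace method. A minimal sketch, where the log file name is only a placeholder:

```perl
use strict;
use warnings;
use DBI;

# Same effect as DBI_TRACE=1, but set from within the script.
# The second argument sends the trace to a file instead of STDERR;
# 'dbi_trace.log' is a hypothetical name.
DBI->trace(1, 'dbi_trace.log');

# ... connect and run the failing INSERT as usual, then read the log ...
```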

    If it failed before sending, then what I think is most likely is that the DBI driver and/or MySQL client library (which is presumably a new version as part of your new Perl version) has gotten "smarter" and is trying to do the encoding for you, and expects you to provide a logical "unicode string" for it to encode. If this is really the case, that is good news: you don't have to encode things manually, and your code will (probably) continue to work without further changes when you switch to a utf8 database.
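
    If that turns out to be the case, the manual step moves from encoding on output to decoding on input. A rough sketch, assuming you know which encoding each value came from (the hash keys and the %source_encoding table are made up for illustration):

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical record: byte strings as read from files in two encodings.
my $add = {
    from_latin1_file => "\xE4lter",       # "älter" in ISO-8859-1 (5 bytes)
    from_utf8_file   => "\xC3\xA4lter",   # "älter" in UTF-8 (6 bytes)
};

# Hypothetical lookup of the encoding each value arrived in.
my %source_encoding = (
    from_latin1_file => 'iso-8859-1',
    from_utf8_file   => 'UTF-8',
);

# Turn every byte string into a character string; croak on bad input.
for my $key (keys %$add) {
    $add->{$key} = decode($source_encoding{$key}, $add->{$key}, Encode::FB_CROAK());
}

# Both values are now the same 5-character string.
print length($add->{from_latin1_file}), "\n";
print $add->{from_latin1_file} eq $add->{from_utf8_file} ? "same\n" : "different\n";
```

    After this loop the values no longer depend on which file they came from, so whatever layer does the final encoding sees consistent input.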