Re: UTF-8 and PSGI/Starman vs. CGI

Your setup is probably broken all the way through, and it only acts correctly at the CGI/Apache level because it's broken in the same ways in the same places. For example, maybe you're not reading or writing UTF-8 in your DB but stuffing the bytes into Latin-1 or something; since they are read the same way they are written, the breakage is transparent and seems correct.
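To make the "broken the same way in both directions" idea concrete, here is a minimal sketch using only the core Encode module. It does not touch a real database; the variable names are made up for illustration, and the Latin-1 step merely stands in for whatever a misconfigured storage layer might do.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# "e with acute accent" as it arrives from a UTF-8 form: two bytes.
my $utf8_bytes = "\xC3\xA9";

# Broken write: the bytes are treated as Latin-1, so the storage
# layer sees two characters (U+00C3 and U+00A9) instead of one.
my $stored = decode('ISO-8859-1', $utf8_bytes);

# Broken read: the same wrong assumption in reverse hands back the
# original bytes, so the page still renders correctly.
my $round_trip = encode('ISO-8859-1', $stored);

# A layer that decodes properly sees a single character (U+00E9).
my $correct = decode('UTF-8', $utf8_bytes);

printf "stored as %d characters, should be %d\n",
    length $stored, length $correct;
printf "round trip identical: %s\n",
    $round_trip eq $utf8_bytes ? 'yes' : 'no';
```

The round trip hides the problem until one layer starts doing the right thing and no longer matches the others.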

Every step has to be right for UTF-8 to be robust/correct. That means the HTTP headers, the webserver's handling of the code's output, the forms and the decoding of their input, the DB, and the code that reads and writes the DB all have to be properly declared/configured, and they all have to encode and decode in agreement.

I would start with the DB because, in my experience, that's usually the root of the problem. Google for "check table charset {DB type}" or something like that to make sure the tables are using UTF-8. Then work backwards: first the DBI + your DBD calls with proper UTF-8 setup, then decoding form input and matching output encoding in your code, then the app server (FCGI/nginx), then the webserver. Making errors fatal at all levels is helpful too.
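For the "decode input, encode output, make errors fatal" part, a rough sketch might look like the following. The helper names decode_input and encode_output are hypothetical, not from any framework, and the app is a bare PSGI-style coderef rather than anything Koha-specific. It uses only the core Encode module.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: turn incoming bytes into characters and die on
# malformed UTF-8 instead of silently substituting U+FFFD.
sub decode_input {
    my ($bytes) = @_;
    return decode('UTF-8', $bytes,
        Encode::FB_CROAK() | Encode::LEAVE_SRC());
}

# Hypothetical helper: turn characters back into bytes at the boundary.
sub encode_output {
    my ($chars) = @_;
    return encode('UTF-8', $chars,
        Encode::FB_CROAK() | Encode::LEAVE_SRC());
}

# Bare PSGI-style app: the declared charset matches what is emitted.
my $greeting = "Gr\x{FC}\x{DF}e, \x{4E16}\x{754C}";    # character string
my $app = sub {
    my ($env) = @_;
    return [
        200,
        [ 'Content-Type' => 'text/plain; charset=UTF-8' ],
        [ encode_output($greeting) ],
    ];
};

# Malformed input is rejected loudly rather than stored quietly.
my $accepted = eval { decode_input("\xFF\xFE"); 1 };
print $accepted ? "accepted\n" : "rejected malformed input\n";
```

On the DBI side the equivalent switch is driver-specific (for example `mysql_enable_utf8mb4` in DBD::mysql or `pg_enable_utf8` in DBD::Pg), so check the documentation for the DBD version you actually run.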

Tangent: uWSGI is a better choice than Starman.

Re^2: UTF-8 and PSGI/Starman vs. CGI by stevieb (Canon) on Mar 21, 2018 at 16:49 UTC
"Tangent: uWSGI is a better choice than Starman." May I request your reasoning/opinion on this? I'm curious, as I use Starman for one of my larger projects and haven't looked at any other options since day one, because it just worked.
Re^3: UTF-8 and PSGI/Starman vs. CGI by Your Mother (Archbishop) on Mar 21, 2018 at 17:14 UTC
The controls and options are deeper and it is much more robust. Starman starts dropping requests and such under load. I suspect my problems were largely an edge case caused by legacy code and EOL'd Linux, but I had nothing but straight-up segfaults and mysterious socket failures pointing to ancient unconfirmed tickets while trying to get Starman working at work. Here is one of many benchmarks out there. I really wanted to like Starman better. I'm gung-ho for Perl even when it's not the best option, but in this case, for me at least, there was nothing at all to recommend the Perl side.
Re^4: UTF-8 and PSGI/Starman vs. CGI by karlgoethebier (Abbot) on Mar 22, 2018 at 11:39 UTC
"...Starman starts dropping requests and such under load..." Maybe; I don't know, but I can imagine it. I didn't eat the wisdom with spoons, but for this reason it is probably a good idea to set up nginx as a reverse proxy. It should handle the requests much better than Starman. It's a common setup. Just some thoughts. Best regards, Karl «The Crux of the Biscuit is the Apostrophe» `perl -MCrypt::CBC -E 'say Crypt::CBC->new(-key=>'kgb',-cipher=>"Blowfish")->decrypt_hex($ENV{KARL});'`
Re^5: UTF-8 and PSGI/Starman vs. CGI by dsheroh (Monsignor) on Mar 22, 2018 at 12:34 UTC
Re^4: UTF-8 and PSGI/Starman vs. CGI by stevieb (Canon) on Mar 21, 2018 at 17:33 UTC
Thank you for the feedback, Your Mother!
Re^2: UTF-8 and PSGI/Starman vs. CGI by dsheroh (Monsignor) on Mar 22, 2018 at 08:06 UTC
"Probably your setup is broken all the way through and it only acts correctly in the CGI/Apache level because it's broken in the same ways in the same places" Ugh. That's a possibility I really don't want to think about, because the webapp in question isn't a small, in-house project; it's a very large open source project (Koha, to be specific), and we really don't have the time or manpower to do a thorough audit of how it handles character encodings. Still, double-checking the database settings and taking a look at uWSGI are low-hanging fruit which can easily fit into the schedule, so I'll at least cross my fingers and try those before doing anything drastic. Thanks!
Re^3: UTF-8 and PSGI/Starman vs. CGI by Your Mother (Archbishop) on Mar 22, 2018 at 14:42 UTC
Yeah. No fun at all if so. It was so for us and took a lot of work to fix. I don't know if this is the right place, but in case you haven't seen it -> Charsets/Encoding in Koha. I still recommend uWSGI but I don't think it will help with encoding problems, just performance and stability.
Re^4: UTF-8 and PSGI/Starman vs. CGI by dsheroh (Monsignor) on Mar 23, 2018 at 11:07 UTC
I have seen that page before, but it can't hurt to go through and double-check all of those settings for correctness. We already discovered that the db had a default latin1 encoding and are in the process of rebuilding it as utf8mb4, but there might be something else there that's also been missed.