Re: UTF-8 and PSGI/Starman vs. CGI

by Your Mother (Archbishop)
on Mar 21, 2018 at 16:39 UTC


in reply to UTF-8 and PSGI/Starman vs. CGI

Probably your setup is broken all the way through, and it only acts correctly at the CGI/Apache level because it's broken in the same ways in the same places. For example, maybe you're not reading or writing UTF-8 in your DB but stuffing the raw bytes into Latin-1 columns or something; because they are read the same way they are written, the breakage is transparent and seems correct.
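To make that concrete, here is a minimal sketch of how symmetric breakage can look fine until one layer starts behaving correctly. It uses only the core Encode module. The variable names and the hex_of helper are made up for illustration, and the plain scalar standing in for the database column is an assumption, not a claim about what your stack actually does.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: show the bytes of a string as hex for inspection.
sub hex_of { join ' ', map { sprintf '%02X', ord } split //, $_[0] }

# "e with acute" as it arrives from a UTF-8 form: two raw bytes, never decoded.
my $raw_input = "\xC3\xA9";

# Broken-but-symmetric path: bytes go in undecoded and come out undecoded.
# A browser told "charset=utf-8" renders them correctly, so all seems well.
my $stored  = $raw_input;    # stands in for a latin1 column holding raw bytes
my $cgi_out = $stored;

# Now one layer does the right thing and encodes its output as UTF-8.
# Perl sees two Latin-1 characters and encodes each one: double encoding.
my $double = encode('UTF-8', $stored);

# Correct path: decode once on the way in, encode once on the way out.
my $chars = decode('UTF-8', $raw_input);
my $fixed = encode('UTF-8', $chars);

print "symmetric: ", hex_of($cgi_out), "\n";
print "double:    ", hex_of($double),  "\n";
print "fixed:     ", hex_of($fixed),   "\n";
```

The first and last lines print the same two bytes, which is why the broken path is so hard to notice. Only the middle case, where a single layer is "fixed" in isolation, produces the four-byte mojibake.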

Every step has to be right for UTF-8 to be robust/correct. That means the HTTP headers, the webserver's handling of the code's output, the forms and the decoding of their input, the DB, and the code that reads and writes the DB all have to be properly declared/configured, and they all have to encode and decode in agreement.

I would start with the DB because that's usually the root of the problem in my experience. Google for "check table charset {DB type}" or something like that to make sure the tables are using UTF-8. Then work backwards: first the DBI + your DBD calls with proper UTF-8 setup, then decoding form input and matching output-level encoding in your code, then the app server (FCGI/nginx), then the webserver. Making errors fatal at all levels is helpful too.
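For the "decode input, encode output, make errors fatal" part, a rough sketch with the core Encode module might look like this. The decode_param and encode_body helpers are hypothetical names for illustration, not part of any framework, and where you would hook them in depends on your app.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: decode raw request bytes, dying on malformed UTF-8.
sub decode_param {
    my ($bytes) = @_;
    # FB_CROAK makes bad input fatal; LEAVE_SRC leaves $bytes unmodified.
    return decode('UTF-8', $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());
}

# Hypothetical helper: encode a character string for the response body.
sub encode_body {
    my ($chars) = @_;
    return encode('UTF-8', $chars, Encode::FB_CROAK() | Encode::LEAVE_SRC());
}

# Valid UTF-8 bytes for a three-character name decode cleanly.
my $name = decode_param("Zo\xC3\xAB");
my $body = encode_body("Hello, $name");

# The Latin-1 bytes for the same name are not valid UTF-8, so this dies
# instead of silently passing garbage further down the stack.
my $accepted = eval { decode_param("Zo\xEB"); 1 };
print "rejected bad input: $@" unless $accepted;
```

Dying early like this is noisy, but it tells you which layer is handing over the wrong bytes instead of letting them reach the DB.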

Tangent: uWSGI is a better choice than Starman.

Replies are listed 'Best First'.
Re^2: UTF-8 and PSGI/Starman vs. CGI
by stevieb (Canon) on Mar 21, 2018 at 16:49 UTC
    Tangent: uWSGI is a better choice than Starman.

    May I request your reasoning/opinion on this? Curious as I use Starman for one of my larger projects, and haven't looked at any other options since day one as it just worked.

      The controls and options are deeper and it is much more robust. Starman starts dropping requests and such under load. I suspect my problems were largely an edge case caused by legacy code and EOL'd Linux but I had nothing but straight up segfaults and mysterious socket failures pointing to ancient unconfirmed tickets trying to get Starman working at work. Here is one of many benchmarks out there. I really wanted to like Starman better. I'm gung-ho for Perl even when it's not the best option but in this case, for me at least, there was nothing at all to recommend the Perl side.

        "...Starman starts dropping requests and such under load..."

        Maybe. I don't know, but I can imagine this. I didn't eat wisdom with spoons, but for this reason it might be a good idea to set up nginx as a reverse proxy in front of Starman. It should handle the incoming requests much better than Starman on its own. It's a common setup.

        Just some thoughts.

        Best regards, Karl

        «The Crux of the Biscuit is the Apostrophe»

        perl -MCrypt::CBC -E 'say Crypt::CBC->new(-key=>'kgb',-cipher=>"Blowfish")->decrypt_hex($ENV{KARL});'

Re^2: UTF-8 and PSGI/Starman vs. CGI
by dsheroh (Monsignor) on Mar 22, 2018 at 08:06 UTC
    Probably your setup is broken all the way through and it only acts correctly in the CGI/Apache level because it's broken in the same ways in the same places
    Ugh. That's a possibility I really don't want to think about, because the webapp in question isn't a small, in-house project; it's a very large open source project (Koha, to be specific), and we really don't have the time or manpower to do a thorough audit of how it handles character encodings.

    Still, double-checking the database settings and taking a look at uWSGI are low-hanging fruit which can easily fit into the schedule, so I'll at least cross my fingers and try those before doing anything drastic. Thanks!

      Yeah. No fun at all if so. It was so for us and took a lot of work to fix. I don't know if this is the right place, but in case you haven't seen it -> Charsets/Encoding in Koha. I still recommend uWSGI but I don't think it will help with encoding problems, just performance and stability.

        I have seen that page before, but it can't hurt to go through and double-check all of those settings for correctness. We already discovered that the db had a default latin1 encoding and are in the process of rebuilding it as utf8mb4, but there might be something else there that's also been missed.
