I have a search function written by me to search a MYSQL database from a website form which calls a php script

website is LAMP Centos 7 with WHM

the database is MYSQL 5.7.36 fields are utf8_general_ci

Database Collation latin1_swedish_ci

Database Connection Collation utf8mb4_general_ci

perl 5 (revision 5 version 16 subversion 3)

the database connection is done like this:

sub connect { my $dsn = "DBI:mysql:host=$host_name"; my %conn_attrs = (PrintError => 0, RaiseError => 1, AutoCommit => 1); $dsn .= ";database=$db_name" if defined $db_name; $dsn .= ";mysql_socket=$socket_file" if defined $socket_file; $dsn .= ";port=$port_num" if defined $port_num; return (DBI->connect ($dsn, $user_name, $password, \%conn_attrs )); }

I have inserted a LOG_FILE into the script to check what is happening.

the code is intended to search for a surname (which is always uppercase in the database) in four fields of a table named 'g2_Item' checking that the public has access to this file (g2_accessMap.g_permission).

The function works fine as long as no accented characters are involved but as soon as an accented character is included (èéêëöüçÈÉËÊÖÜ) the function returns no rows from the database table = 'nothing found'

There seem to be 2 problems:

1. The accented characters are not being upcased - but even if the accented character is inclued already upcased, no rows are found - equally the MySQL database is case insensitive.

2. the term submitted for searching in the mysql select is not being found

the search term surname appears as it should in the log_file, eg: 'HÖRR' (for which there is an entry in the database).

If I copy and use that term from the log_file and use it in an sql query on PHPmysqlAdmin I get around 100 rows including all HORR and HÖRR entries in the database.

But when that term is passed by the perl code to mysql the result is 0 rows found.

Also I copied a php form and function (https://www.elithecomputerguy.com/2019/12/mysql-search-form-with-html-and-php/) and adapted it for my database - it works fine with accented characters, returning all the rows containing 'HÖRR' or 'HORR'.

this is the perl code:

use Cookbook_grave; # the connection as above use CGI; use CGI::Carp qw(fatalsToBrowser); my $OK_CHARS='-a-zA-Z0-9\'\- èéêëöüçÈÉËÊÖÜ'; # A restrictive list, whi +ch # should be modified to match # as appropriate $match_surname = $q->param("what_surname"); $match_surname = uc($match_surname); #$match_surname =~ s/[^$OK_CHARS]/_/go; print LOG_FILE "surname: $match_surname\n"; # fetch all rows that match the surname # where title LIKE %$surname% $search_phrase = "%$match_surname%"; $dbh = $dbh_name; $link_ref = $name_ref; my $sth = $dbh->prepare ("SELECT title, id, description, keywords, + summary FROM Item, AccessSubscriberMap, AccessMap WHERE (Item.title LIKE ? OR Item.description LIKE ? OR Item.keywords LIKE ? OR Item.summary LIKE ? ) AND AccessSubscriberMap.itemId = Item.id AND AccessMap.accessListId = AccessSubscriberMap.accessListId AND AccessMap.userOrGroupId =4 AND AccessMap.permission != 4096 ORDER BY title"); $sth->execute ("$search_phrase","$search_phrase","$search_phrase", +"$search_phrase"); print LOG_FILE "search phrase: $search_phrase\n"; my $lol_ref = $sth->fetchall_arrayref(); my $nrows = (defined ($lol_ref) ? @{$lol_ref} : 0); print LOG_FILE "rows found: $nrows\n"; print LOG_FILE "search term 1: $search_term1\n"; print LOG_FILE "rows found: $nrows\n\n";
--

The log file records the $searchphrase as '%HöRR%'

the number of rows found is invariably 0.


In reply to perl dbi mysql and accented characters by combraxis

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.