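It looks like MySQL's utf8 (an alias for utf8mb3) uses up to 3 bytes per character, while utf8mb4 uses up to 4 bytes. It might be an interesting exercise to figure out which characters don't fit into 3 bytes. utf8mb3 only covers code points from 0 to 65535 (U+0000 to U+FFFF), so the characters that need utf8mb4 are the ones above that, such as most emoji. I guess you could look for characters whose code point is > 65535. Note that PHP's ord($char) won't work for this: it returns only the value of the first byte (0 to 255), so it can never exceed 65535. mb_ord($char, 'UTF-8') (PHP 7.2+) returns the actual code point.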
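A minimal sketch of that check, assuming the input string is valid UTF-8. Instead of looping with ord() or mb_ord(), it uses a PCRE character class with the /u modifier to match every code point from U+10000 to U+10FFFF in one pass. The helper names charsOutsideBmp and fitsUtf8mb3 are hypothetical, chosen for illustration only.

```php
<?php
// Hypothetical helper: return every character in $text whose code point is
// above U+FFFF. These take 4 bytes in UTF-8, so they fit in utf8mb4 but not
// in utf8mb3 (MySQL "utf8"). Assumes $text is valid UTF-8.
function charsOutsideBmp(string $text): array
{
    // With the /u modifier, preg_match_all returns false on invalid UTF-8.
    if (preg_match_all('/[\x{10000}-\x{10FFFF}]/u', $text, $matches) === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $matches[0];
}

// Hypothetical helper: true if $text contains no characters above U+FFFF,
// i.e. it could be stored in a utf8mb3 column without those characters being lost.
function fitsUtf8mb3(string $text): bool
{
    return charsOutsideBmp($text) === [];
}

$sample = "caf\u{E9} \u{1F600} \u{1D11E}";
foreach (charsOutsideBmp($sample) as $char) {
    // strlen() counts bytes, so every match here reports 4.
    printf("%s is %d bytes (0x%s)\n", $char, strlen($char), bin2hex($char));
}
```

The "é" in the sample is U+00E9, which takes 2 bytes and fits in utf8mb3, so only the emoji and the musical symbol are reported.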