Fixing Japanese Character Garbling in MySQL with UTF8MB4


1. Introduction

Can’t Handle Japanese Well in MySQL? A Thorough Explanation of the Causes and Solutions

Have you ever seen garbled characters or “???” when handling Japanese in MySQL, the database behind many web applications and WordPress sites? The problem is especially common for beginners and in local development environments (such as XAMPP or MAMP) or virtual environments like Docker. In most cases, the cause is that MySQL’s character encoding settings are not appropriate.

This article explains how to configure MySQL to handle Japanese correctly, along with common problems and their solutions. It also covers practical know-how such as settings for Docker environments, my.cnf configuration, and methods for fixing existing databases, so that readers from beginners to working engineers can put it into practice.

The next section explains the root cause: why does Japanese text become garbled?

2. Main Causes of Japanese Text Garbling

Why Doesn’t Japanese Display Correctly in MySQL?

If Japanese text in MySQL is displayed as “???” or as incomprehensible symbols, the cause is almost certainly a character encoding misconfiguration. MySQL is very flexible, but if the character encoding (character set) and collation settings do not match along the path the data travels, it cannot store or retrieve data correctly. The three most common causes are summarized below.

Cause 1: Default Character Encoding Remains latin1

In older versions of MySQL (5.7 and earlier), or in environments that still carry over old settings, the default character encoding is latin1 (intended for Western European languages). latin1 has no Japanese characters, so the text is corrupted at the moment it is inserted: what is saved in the database is already garbled.
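
The effect can be reproduced outside MySQL. The following Python sketch is only an illustration: it assumes that Python’s latin-1 codec is a reasonable stand-in for MySQL’s latin1 (the two are similar but not identical). It shows that characters with no latin1 equivalent are replaced by “?” and cannot be recovered afterwards.

```python
# Illustration only: Python's "latin-1" codec stands in for MySQL's latin1.
text = "日本語"

# Characters with no latin1 equivalent are replaced by "?" (lossy conversion).
stored = text.encode("latin-1", errors="replace")
print(stored)  # b'???'

# Decoding again cannot bring the original characters back.
print(stored.decode("latin-1") == text)  # False
```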

Cause 2: Mismatch in Character Encoding Between Client and Server

In MySQL, character encoding comes into play at the following three points.
  • When the client sends a statement (character_set_client)
  • When the server converts the received statement for processing (character_set_connection)
  • When the result is returned to the client (character_set_results)
For example, even if the client sends utf8mb4, the characters are corrupted midway if the server treats the data as latin1. This mismatch is the most common pitfall.
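
A mismatch of this kind can be sketched in a few lines of Python. This is a simplified model, not MySQL’s actual conversion code: the Python codecs are assumed to stand in for the client and server sides. It shows UTF-8 bytes being misread as latin1, and that the text can be restored only as long as no byte was lost along the way.

```python
# Illustration only: Python codecs stand in for MySQL's conversion steps.
original = "日本語"

# The client sends UTF-8 bytes ...
sent_bytes = original.encode("utf-8")

# ... but the receiving side interprets them as latin1: each Japanese
# character (3 bytes) turns into three unrelated characters.
misread = sent_bytes.decode("latin-1")
print(len(original), len(misread))  # 3 9

# As long as no byte was lost, reversing the wrong step restores the text.
restored = misread.encode("latin-1").decode("utf-8")
print(restored == original)  # True
```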

Cause 3: Inconsistent Settings for Database, Tables, and Columns

When you create a table without explicitly specifying the character encoding, MySQL’s defaults are applied as is. As a result, the settings can end up inconsistent, for example:
  • the database is utf8mb4,
  • the table is utf8,
  • a column is latin1.
In such a state, garbling occurs when data is saved or displayed.

Summary: Most Causes Are Due to “Character Encoding Mismatches”

Most causes of Japanese text garbling in MySQL are due to “the set character encodings not matching.” In the next section, we will explain in detail how to check MySQL’s current character encoding settings. By performing appropriate checks, you can identify the cause of the garbling and fix it quickly.

3. How to Check MySQL’s Character Encoding Settings

To Pinpoint the Cause of Issues, “Checking Current Settings” Is the First Step

When Japanese cannot be handled correctly in MySQL, the first thing to check is the current settings of character encoding (character set) and collation. In MySQL, multiple character encodings are exchanged between the client and server, and they need to match. Here, we explain how to check the settings using the command line or SQL queries.

SHOW VARIABLES Command to Check Character Encoding

While connected to MySQL, you can check the current character encoding settings by executing the following SQL.
SHOW VARIABLES LIKE 'character_set%';
Executing this command will produce output like the following:
+--------------------------+---------+
| Variable_name            | Value   |
+--------------------------+---------+
| character_set_client     | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database   | utf8mb4 |
| character_set_results    | utf8mb4 |
| character_set_server     | utf8mb4 |
| character_set_system     | utf8    |
+--------------------------+---------+

Meaning of Each Setting Item

| Item Name | Meaning and Role |
|---|---|
| character_set_client | The encoding of strings sent from the client |
| character_set_connection | The character encoding used during communication between client and server |
| character_set_results | The character encoding when query results are returned to the client |
| character_set_database | The default character encoding of the currently selected database |
| character_set_server | The default character encoding when creating new databases or tables |
| character_set_system | The character encoding used internally by the server (usually no need to change) |

In particular, the three variables character_set_client, character_set_connection, and character_set_results must match. If they do not, strings arrive at the server garbled or are returned to the client garbled.

Checkpoints to Prevent Garbled Text

  • Check that every item except character_set_system is set to utf8mb4 (character_set_system is always utf8 and cannot be changed)
  • If different character encodings are mixed, apply the setting changes introduced later
  • Note that character encodings can also be specified separately for each table or column
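
The first checkpoint can be automated. The sketch below is a hypothetical helper, not part of MySQL or any driver: it assumes the rows returned by SHOW VARIABLES LIKE 'character_set%' have already been loaded into a Python dictionary, and it reports every variable that is not utf8mb4 while ignoring the variables that are not expected to be.

```python
# Hypothetical helper: `variables` is assumed to hold the rows returned by
# SHOW VARIABLES LIKE 'character_set%' as a {name: value} dictionary.

# Variables that are not expected to be utf8mb4 and can be ignored.
IGNORED = {"character_set_system", "character_set_filesystem", "character_sets_dir"}


def find_mismatches(variables, expected="utf8mb4"):
    """Return the variables whose value differs from the expected charset."""
    return {
        name: value
        for name, value in variables.items()
        if name not in IGNORED and value != expected
    }


sample = {
    "character_set_client": "utf8mb4",
    "character_set_connection": "utf8mb4",
    "character_set_database": "latin1",
    "character_set_results": "utf8mb4",
    "character_set_server": "latin1",
    "character_set_system": "utf8",
}
print(find_mismatches(sample))
# {'character_set_database': 'latin1', 'character_set_server': 'latin1'}
```

An empty result means the connection-level and default settings are consistent; table and column definitions still need to be checked separately.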

Supplement: Also Check the Collation

Collation affects the sorting order and comparison methods of strings. You can check it with the following command:
SHOW VARIABLES LIKE 'collation%';
Collation is unlikely to be the direct cause of garbled text, but it affects sorting and search accuracy for Japanese, so it is worth confirming that a utf8mb4 collation such as utf8mb4_general_ci or utf8mb4_unicode_ci (or utf8mb4_0900_ai_ci, the default in MySQL 8.0) is in use. The next section explains how to actually change these settings so that MySQL handles Japanese correctly.

4. How to Configure Settings to Handle Japanese Properly

Say Goodbye to Garbled Text with Proper Settings

To handle Japanese correctly in MySQL, unifying all character encoding settings is important. In particular, utf8mb4 is a recommended setting that supports not only Japanese but also emojis and special symbols. In this section, we will explain in detail how to configure settings on the client side, server side, tables, and columns.

4.1 Client-Side Settings: Specify Explicitly at Connection

By executing the following command immediately after connecting to MySQL, you can set the character encoding used for the current connection to utf8mb4.
SET NAMES 'utf8mb4';
This sets the following three variables at once:
  • character_set_client
  • character_set_connection
  • character_set_results

✅Note:

  • When connecting from PHP, call mysqli_set_charset($conn, 'utf8mb4'); right after connecting.
  • When using the mysql command-line client, you can also specify --default-character-set=utf8mb4.

4.2 Server-Side Settings: Persistent Configuration with my.cnf

By adding the following descriptions to the server’s configuration file my.cnf or my.ini, you can change the default character encoding for the entire MySQL to utf8mb4.
[client]
default-character-set = utf8mb4

[mysql]
default-character-set = utf8mb4

[mysqld]
character-set-server = utf8mb4
collation-server = utf8mb4_general_ci
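
Before restarting the server, the file can be sanity-checked with a small script. The following is a rough sketch using Python’s standard configparser, under the assumption that the file is simple enough for it to parse (real my.cnf files may use syntax it does not understand). The function name and the EXPECTED table are hypothetical and only mirror the example above.

```python
import configparser

# Options this sketch expects, mirroring the my.cnf example in this section.
EXPECTED = {
    ("client", "default-character-set"): "utf8mb4",
    ("mysql", "default-character-set"): "utf8mb4",
    ("mysqld", "character-set-server"): "utf8mb4",
}


def check_my_cnf(text):
    """Return (section, option, actual_value) for each option that differs."""
    # Skip "!include"-style directives, which configparser cannot parse.
    lines = [ln for ln in text.splitlines() if not ln.lstrip().startswith("!")]
    parser = configparser.ConfigParser(
        allow_no_value=True, interpolation=None, strict=False
    )
    # MySQL accepts both "-" and "_" in option names, so normalize to "-".
    parser.optionxform = lambda option: option.lower().replace("_", "-")
    parser.read_string("\n".join(lines))
    problems = []
    for (section, option), wanted in EXPECTED.items():
        actual = parser.get(section, option, fallback=None)
        if actual != wanted:
            problems.append((section, option, actual))
    return problems


sample = """\
[client]
default-character-set = utf8mb4

[mysql]
default-character-set = utf8mb4

[mysqld]
character_set_server = latin1
collation-server = utf8mb4_general_ci
"""
print(check_my_cnf(sample))  # [('mysqld', 'character-set-server', 'latin1')]
```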

✅Caution:

  • After changing the settings, MySQL must be restarted.
  • Example: sudo systemctl restart mysql (Linux; on some distributions the service name is mysqld)
  • The file location varies by environment. On Linux, /etc/mysql/my.cnf or /etc/my.cnf is commonly used.

4.3 Specifying Character Encoding for Databases and Tables

When creating a new database or table, be sure to explicitly specify the character encoding.
Example of Database Creation:
CREATE DATABASE mydb CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
Example of Table Creation:
CREATE TABLE users (
  id INT AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(100)
) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
Changing an Existing Table:
ALTER TABLE users CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
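
When many existing tables need the same conversion, the statements can be generated rather than typed by hand. The sketch below is one possible approach, not an official tool: the function names are hypothetical, and it only builds the SQL text. Reviewing the output and taking a backup before running it is assumed.

```python
def quote_identifier(name):
    """Quote a MySQL identifier with backticks, doubling embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def build_convert_statements(
    tables, charset="utf8mb4", collation="utf8mb4_general_ci"
):
    """Build one ALTER TABLE ... CONVERT TO statement per table name."""
    return [
        f"ALTER TABLE {quote_identifier(table)} "
        f"CONVERT TO CHARACTER SET {charset} COLLATE {collation};"
        for table in tables
    ]


# "users" comes from the example above; "orders" is a made-up table name.
for statement in build_convert_statements(["users", "orders"]):
    print(statement)
```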

4.4 Recommended Character Encoding: Why utf8mb4?

MySQL also has a character encoding named utf8, but it is an alias for utf8mb3 and supports only UTF-8 sequences of up to 3 bytes per character. As a result, 4-byte characters such as emojis and some kanji (for example, certain variant forms) cannot be saved. utf8mb4, on the other hand, supports up to 4 bytes per character and covers the full UTF-8 range, so it is now the mainstream choice.

The next chapter explains Japanese-related settings and precautions when using MySQL in a Docker environment, so that garbled text can be avoided in virtual environments as well.

5. Handling Japanese in Docker Environments

To Handle Japanese Correctly Even in Container Environments

In recent years, Docker has become increasingly common as a development environment, and a frequent complaint is that “Japanese characters are garbled in MySQL on Docker.” This is usually caused by inappropriate locale settings in the container or by MySQL’s initial settings. This section introduces specific countermeasures for handling Japanese correctly when using MySQL in a Docker environment.

5.1 Setting the Locale (Language Environment) to Japanese-Compatible in Dockerfile

Not only for MySQL containers but also when handling Japanese on the application server side, locale settings are necessary. The following is an example of a Debian-based Dockerfile:
RUN apt-get update && apt-get install -y locales \
  && locale-gen ja_JP.UTF-8 \
  && update-locale LANG=ja_JP.UTF-8
ENV LANG=ja_JP.UTF-8
ENV LC_ALL=ja_JP.UTF-8

✅Points:

  • Prevents encoding errors when reading and writing Japanese files on the application side.
  • Affects not only MySQL but also execution environments like PHP and Python.

5.2 How to Specify Character Encoding for MySQL in docker-compose

When starting a MySQL container using docker-compose.yml, you can specify the character encoding with environment variables as follows.
services:
  db:
    image: mysql:8.0
    container_name: mysql-ja
    environment:
      MYSQL_ROOT_PASSWORD: rootpass
      MYSQL_DATABASE: mydb
      MYSQL_USER: user
      MYSQL_PASSWORD: password
      TZ: Asia/Tokyo
      LANG: ja_JP.UTF-8
      LC_ALL: ja_JP.UTF-8
    command:
      --character-set-server=utf8mb4 --collation-server=utf8mb4_general_ci
    ports:
      - "3306:3306"
    volumes:
      - ./mysql-data:/var/lib/mysql

✅Note:

  • You can set MySQL startup parameters in the command: section.
  • TZ and LANG are also effective for setting up a Japanese environment.

5.3 Verifying Japanese Operation Inside the MySQL Container

To verify if MySQL is correctly set to utf8mb4, enter the MySQL container and execute the command as follows:
docker exec -it mysql-ja mysql -u root -p
After logging in, check the settings with the following command:
SHOW VARIABLES LIKE 'character_set%';
If everything is set to utf8mb4, problems with saving and displaying Japanese are less likely to occur.

Summary: In Docker Environments, “Startup Settings” and “Locale” Are Key

To handle Japanese safely with MySQL in Docker environments, the following settings should be made in advance:
  • Explicitly specify utf8mb4 when starting the MySQL container
  • Set the locale of the application-side container to ja_JP.UTF-8
The next chapter introduces common issues that can remain even after these settings are in place, along with their solutions.

6. Common Issues and Their Solutions

Still Getting Garbled Text After Setup…? There Might Be Remaining Causes

Even after changing the MySQL settings to utf8mb4, cases where Japanese does not display correctly or cannot be saved are not uncommon. This section introduces commonly reported issues and their specific solutions.

Trouble 1: Settings Changes Not Reflected

Cause: After a configuration file (my.cnf or docker-compose.yml) is changed, the changes are often not reflected because MySQL has not been restarted.
Solution:
  • In a server environment, restart with sudo systemctl restart mysql
  • For Docker, run docker-compose down followed by docker-compose up -d

Trouble 2: Japanese Characters Garbled in Terminal or Command Line

Cause: The garbling is caused not by MySQL itself but by the terminal’s display character encoding. For example, UTF-8 text does not display correctly in the Windows Command Prompt under its default code page.
Solution:
  • For Windows: Switch to UTF-8 with the chcp 65001 command
  • For macOS/Linux: Set the terminal’s encoding to UTF-8 (most are compatible by default)
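
To tell whether the terminal is the culprit, it helps to check which encoding a program actually uses when printing. The following Python sketch is one way to do that. The helper name is hypothetical, and what it reports is the encoding Python chose for standard output, which is assumed to reflect the terminal’s setting.

```python
import codecs
import sys


def is_utf8(encoding_name):
    """Return True if the given encoding name is an alias of UTF-8."""
    if not encoding_name:
        return False
    try:
        return codecs.lookup(encoding_name).name == "utf-8"
    except LookupError:
        return False


# Encoding Python will use when printing to this terminal.
stdout_encoding = sys.stdout.encoding
print(stdout_encoding, "(UTF-8)" if is_utf8(stdout_encoding) else "(not UTF-8)")
```

If the result is not UTF-8, Japanese text may look garbled on screen even though the data stored in MySQL is correct.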

Trouble 3: Existing Database or Tables Created with latin1

Cause: If a database or table that is already in operation was created with latin1, the Japanese data in it may already be corrupted.
Solution:
  1. Check the table structure:
   SHOW CREATE TABLE your_table_name;
  2. Convert the table:
   ALTER TABLE your_table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
Note: Data that is already corrupted cannot be repaired by this operation. Take a backup or dump in advance, and consider manual fixes as well.

Trouble 4: Character Encoding Mismatch on the Application Side (PHP, Python, etc.)

Cause: Even if MySQL is set to utf8mb4, garbled characters occur if the application sends strings in a different encoding or connects with a different character set.
Solution:
  • PHP: mysqli_set_charset($conn, "utf8mb4");
  • Python (MySQL Connector): Specify charset='utf8mb4' at connection time
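
For the Python case, the connection settings might be organized as follows. This is a minimal sketch that assumes the mysql-connector-python package. The helper names are hypothetical, and the host, user, password, and database values are placeholders borrowed from the docker-compose example earlier in this article.

```python
def build_connection_args(host, user, password, database):
    """Build keyword arguments for mysql.connector.connect() with utf8mb4."""
    return {
        "host": host,
        "user": user,
        "password": password,
        "database": database,
        "charset": "utf8mb4",
        "collation": "utf8mb4_general_ci",
    }


def connect(**overrides):
    """Open a connection using the placeholder settings plus any overrides."""
    # Imported here so the helper above can be used without the driver installed.
    import mysql.connector  # assumes the mysql-connector-python package

    args = build_connection_args("localhost", "user", "password", "mydb")
    args.update(overrides)
    return mysql.connector.connect(**args)
```

Keeping the charset in one helper makes it harder for an individual script to connect with a different encoding by accident.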

Trouble 5: Garbled Characters When Integrating with CSV or Excel

Cause: CSV files exchanged with Excel may be encoded in Shift_JIS or in UTF-8 with BOM, so care is needed to keep them compatible with MySQL’s utf8mb4.
Solution:
  • Convert the CSV to UTF-8 before importing it
  • When exporting, explicitly run SET NAMES 'utf8mb4'; first
  • For files to be opened in Excel, save them as “UTF-8 (with BOM)”
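
The conversions in this list can be done with a few lines of Python instead of a dedicated tool. The sketch below makes two assumptions: a file without a BOM is taken to be in cp932 (the Windows variant of Shift_JIS), and the function names are hypothetical.

```python
def csv_bytes_to_utf8(raw, source_encoding="cp932"):
    """Convert CSV bytes to UTF-8 without BOM.

    A UTF-8 BOM is detected first; otherwise the bytes are assumed to be in
    source_encoding (cp932 is the Windows variant of Shift_JIS).
    """
    if raw.startswith(b"\xef\xbb\xbf"):
        text = raw.decode("utf-8-sig")  # drops the BOM
    else:
        text = raw.decode(source_encoding)
    return text.encode("utf-8")


def utf8_with_bom(text):
    """Encode text as UTF-8 with BOM, for CSV files to be opened in Excel."""
    return text.encode("utf-8-sig")


sample = "名前,年齢\n山田,30\n"
print(csv_bytes_to_utf8(sample.encode("cp932")) == sample.encode("utf-8"))  # True
```

The first function prepares a file for import into MySQL; the second produces output that Excel recognizes as UTF-8.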

Comprehensive Checklist for Resolving Issues

| Checklist Item | Status |
|---|---|
| All character_set_* variables (except character_set_system) are utf8mb4 | |
| collation_server is utf8mb4_general_ci | |
| Character encoding explicitly set for database, tables, and columns | |
| Application’s sent character encoding is utf8mb4 | |
| Encoding in usage environment (terminal, editor, etc.) is UTF-8 | |

The next section concisely summarizes the points so far and provides tips for handling Japanese safely in MySQL in the future.

7. Summary

Reviewing the Necessary Settings and Mindset for Handling Japanese in MySQL

To properly handle Japanese in MySQL, rather than relying on the misconception that “just setting it to utf8 for now will be fine,” consistency in settings and understanding the overall flow are important.

Review of the Main Points Explained in This Article:

  • The main causes of Japanese text garbling are the use of inappropriate character encodings like latin1, or inconsistencies in settings between the client and server.
  • MySQL’s character encoding settings can be checked with the SHOW VARIABLES command.
  • The recommended character encoding is utf8mb4. It covers the full UTF-8 range, including emojis and variant kanji.
  • It is best to configure settings at three levels: client, server, and database/table.
  • In Docker environments, specify the MySQL startup options with command: and set LANG, so that both the character encoding and the locale are adjusted.
  • When trouble occurs, isolate and address the causes step by step. Check not only MySQL itself but also the terminal, the application, and exchanges with external data.

Points to Keep in Mind for Future Operations

  • When building a new MySQL environment, design it assuming utf8mb4 from the initial stage.
  • When developing in teams or multiple environments, document and share configuration files and connection parameters.
  • In Docker or CI/CD environments, automation of settings (environment variables and configuration file management) is key.
  • When importing and exporting data, consider using character encoding conversion tools (such as iconv or nkf).

Finally

Once you properly set up an environment for handling Japanese in MySQL, subsequent operations and development will be very smooth. If you understand “why garbling occurs” and “where and how to set it up,” you can prevent troubles in advance and achieve stable data processing. I hope this article helps make your development environment more comfortable and secure.

8. Frequently Asked Questions (FAQ)

Resolving Common Questions About MySQL and Japanese Characters

Q1. Japanese characters are displayed as “???” in MySQL. What is the cause?

A. The main cause of “???” is a character encoding mismatch. For example, if the client sends Japanese in utf8mb4 and the server receives it as latin1, the characters are garbled. In many cases, executing SET NAMES 'utf8mb4'; right after connecting resolves the issue.

Q2. Even after setting utf8mb4 in my.cnf, it is not reflected.

A. Simply editing my.cnf does not apply the changes; you need to restart the MySQL server. On Linux, use sudo systemctl restart mysql; in a Docker environment, run docker-compose down followed by docker-compose up -d.

Q3. Japanese characters are garbled in an existing table. Can it be fixed?

A. Complete repair can be difficult, but the following steps may help.
  1. Check the table structure (SHOW CREATE TABLE)
  2. Convert the character encoding
   ALTER TABLE your_table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
However, if the saved data is already corrupted, you may need to restore it from a backup or correct it manually.

Q4. I’m using MySQL in Docker, but Japanese input causes garbled characters.

A. In addition to the MySQL settings, you need to add locale settings (such as LANG=ja_JP.UTF-8) in the Dockerfile or docker-compose.yml. Also, explicitly specify --character-set-server=utf8mb4 in the startup command for the MySQL container.

Q5. What is the difference between utf8 and utf8mb4? Which one should I use?

A. MySQL’s utf8 actually handles only UTF-8 characters of up to 3 bytes. utf8mb4, on the other hand, supports up to 4 bytes and can properly handle emojis and some kanji that utf8 cannot store. utf8mb4 is currently recommended from the perspectives of compatibility and future-proofing.
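
The 3-byte limit is easy to see by counting bytes. The short Python sketch below (the helper name is hypothetical) prints the UTF-8 length of a few characters and lists the characters in a string that would not fit in MySQL’s 3-byte utf8.

```python
def needs_utf8mb4(text):
    """Return the characters that take 4 bytes in UTF-8 (above U+FFFF)."""
    return [ch for ch in text if ord(ch) > 0xFFFF]


# UTF-8 byte length per character: 3, 4, and 4 bytes respectively.
for ch in ["日", "𠮷", "😀"]:
    print(ch, len(ch.encode("utf-8")))

print(needs_utf8mb4("日本語"))  # []
print(needs_utf8mb4("𠮷野家😀"))  # ['𠮷', '😀']
```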

Q6. CSV files exported from Excel are garbled. What should I do?

A. Excel may save CSV files in Shift_JIS or in UTF-8 with BOM, which can cause a mismatch with MySQL’s character encoding. Save the CSV file in UTF-8 and, when importing, execute SET NAMES 'utf8mb4'; so that the encoding on the MySQL side matches the file.

If this FAQ does not resolve your issue, one approach is to review the settings from the beginning or to rebuild the development environment step by step. Patiently working through such technical problems leads to proper handling of Japanese data.