Switch MySQL to utf8mb4: Migration Guide & Troubleshooting

1. Introduction

Why Changing MySQL’s Character Set Is Necessary

The character set of a database is a crucial setting that determines how the characters of stored data are encoded and processed. In MySQL 5.7 and earlier, the default character set is latin1 (MySQL 8.0 changed the default to utf8mb4), which causes problems when handling data that includes Japanese, emojis, or other non-Latin characters. Choosing an appropriate character set is especially important during data migration or system consolidation.

Common Issues and Their Causes

Typical problems related to MySQL character sets include the following.
  1. Character garbling
    • utf8 and latin1 are mixed
    • Client and server character set settings differ
  2. Search-related issues
    • Differences in collation prevent obtaining the intended search results
    • Sorting order differs from expectations
  3. Data migration problems
    • Because utf8mb4 is not used, emojis and special symbols cannot be stored
    • Character set conversion is not performed correctly during data export/import

Purpose and Structure of This Article

This article provides a comprehensive guide on MySQL character set changes, covering basic knowledge, how to change it, and troubleshooting.

Article Flow

  1. Basic knowledge of MySQL character sets
  2. How to check the current character set
  3. How to change MySQL character set
  4. Post-change troubleshooting
  5. Impact of character set changes on performance
  6. Recommended settings (best practices)
  7. FAQ (frequently asked questions)
By reading this guide, you will deepen your knowledge of MySQL character sets and be able to choose appropriate settings and avoid issues.

2. MySQL Character Sets: Basic Understanding

What Is a Character Set?

A character set is the set of rules used to store and process characters as digital data. For example, when storing the Japanese character “あ”, UTF-8 represents it as the byte sequence E3 81 82, whereas Shift_JIS uses 82 A0. MySQL allows you to specify a different character set for each database, table, or column, and choosing an appropriate one helps prevent garbled text and makes internationalization smoother.
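
The byte sequences quoted above can be checked with a few lines of Python. This is only an illustrative sketch that uses Python's built-in codecs rather than MySQL itself, and the helper name hex_bytes is made up for this example.

```python
def hex_bytes(text: str, encoding: str) -> str:
    """Return the encoded bytes of `text` as space-separated uppercase hex."""
    return text.encode(encoding).hex(" ").upper()


print(hex_bytes("あ", "utf-8"))      # E3 81 82
print(hex_bytes("あ", "shift_jis"))  # 82 A0
```

The same character produces different bytes under each encoding, which is why reading stored bytes with the wrong character set yields garbled text.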

Common character sets

| Character Set | Features | Use Cases |
|---|---|---|
| utf8 | UTF-8 up to 3 bytes | Does not support some special characters (e.g., emojis) |
| utf8mb4 | 4-byte UTF-8 | Supports emojis and special characters (recommended) |
| latin1 | ASCII compatible | Used in legacy systems |

What Is Collation?

Collation defines the rules for comparing and sorting data using a character set. For example, it determines whether “A” and “a” are considered the same and how characters are ordered.

Common collations

| Collation | Description |
|---|---|
| utf8_general_ci | Case‑insensitive, suitable for general use |
| utf8_unicode_ci | Collation based on the Unicode standard (recommended) |
| utf8mb4_bin | Binary comparison (used when exact matches are required) |
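
As a rough analogy, the difference between a case-insensitive (_ci) collation and a binary (_bin) collation can be sketched in Python. This is not MySQL's actual comparison algorithm; the function names equal_ci and equal_bin are hypothetical and only illustrate the idea.

```python
def equal_ci(a: str, b: str) -> bool:
    """Case-insensitive comparison, loosely analogous to a *_ci collation."""
    return a.casefold() == b.casefold()


def equal_bin(a: str, b: str) -> bool:
    """Byte-for-byte comparison, loosely analogous to a *_bin collation."""
    return a.encode("utf-8") == b.encode("utf-8")


print(equal_ci("A", "a"))   # True
print(equal_bin("A", "a"))  # False
```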

utf8 vs utf8mb4 Differences

MySQL’s utf8 can actually store only characters up to 3 bytes, so it cannot handle some special characters (such as emojis or extended Chinese characters). In contrast, utf8mb4 can use up to 4 bytes, and its use is recommended for modern applications.
| Character Set | Maximum Bytes | Emoji Support | Recommendation |
|---|---|---|---|
| utf8 | 3 bytes | ❌ Not supported | ❌ Not recommended |
| utf8mb4 | 4 bytes | ✅ Supported | ✅ Recommended |
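
To see which characters fall outside the 3-byte range, the following Python sketch measures the UTF-8 width of each character. It assumes only that a character needs utf8mb4 when its code point is above U+FFFF; the function name needs_utf8mb4 is made up for this illustration.

```python
def needs_utf8mb4(text: str) -> bool:
    """True if any character takes 4 bytes in UTF-8 (code point above U+FFFF)."""
    return any(ord(ch) > 0xFFFF for ch in text)


for sample in ["abc", "あ", "😀"]:
    widths = [len(ch.encode("utf-8")) for ch in sample]
    print(sample, widths, needs_utf8mb4(sample))
```

Japanese kana fit in 3 bytes, so they work in utf8 as well, while the emoji needs 4 bytes and can only be stored in a utf8mb4 column.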

Reasons to switch from utf8 to utf8mb4

  1. Future compatibility: In modern systems, utf8mb4 is becoming the standard.
  2. Storing special characters and emojis: Using utf8mb4 ensures you can safely handle data in social media posts and messaging apps.
  3. Internationalization support: Reduces the risk of garbled text when building multilingual systems.

Summary

  • The character set determines how data is stored and processed.
  • Collation defines the rules for comparing characters.
  • MySQL’s utf8 actually supports only up to 3 bytes, so using utf8mb4 is recommended.
  • utf8mb4_unicode_ci is the recommended collation for general use.

3. How to Check the Current Character Set

Before changing MySQL’s character set, it is important to verify the current settings. Since you can set different character sets for each database, table, and column, understand at which level a change is needed.

How to Check the Current Character Set

Check the MySQL Server’s Overall Character Set

First, check the default character set configuration for the entire MySQL server.
SHOW VARIABLES LIKE 'character_set%';
SHOW VARIABLES LIKE 'collation%';
Example output:
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8mb4                    |
| character_set_connection | utf8mb4                    |
| character_set_database   | utf8mb4                    |
| character_set_filesystem | binary                     |
| character_set_results    | utf8mb4                    |
| character_set_server     | utf8mb4                    |
| character_set_system     | utf8                       |
+--------------------------+----------------------------+

Check the Character Set per Database

To check a specific database’s character set, use the following command.
SELECT SCHEMA_NAME, DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME
FROM information_schema.SCHEMATA
WHERE SCHEMA_NAME = 'database_name';
Example output
+-------------+----------------------------+------------------------+
| SCHEMA_NAME | DEFAULT_CHARACTER_SET_NAME | DEFAULT_COLLATION_NAME |
+-------------+----------------------------+------------------------+
| my_database | utf8mb4                    | utf8mb4_unicode_ci     |
+-------------+----------------------------+------------------------+

Check the Table’s Character Set

To check a specific table’s character set, run SHOW CREATE TABLE.
SHOW CREATE TABLE table_name;
Example output
CREATE TABLE `users` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(255) DEFAULT NULL,
  `email` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_swedish_ci;
Checklist
  • DEFAULT CHARSET=latin1 → not utf8mb4, so it needs to be changed
  • COLLATE=latin1_swedish_ci → changing to utf8mb4_unicode_ci is more appropriate

Check the Column’s Character Set

To investigate a specific column’s character set, run the following SQL.
SELECT COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME 
FROM information_schema.COLUMNS 
WHERE TABLE_SCHEMA = 'database_name' 
AND TABLE_NAME = 'table_name';
Example output
+-------------+--------------------+----------------------+
| COLUMN_NAME | CHARACTER_SET_NAME | COLLATION_NAME       |
+-------------+--------------------+----------------------+
| name        | latin1             | latin1_swedish_ci    |
| email       | utf8mb4            | utf8mb4_unicode_ci   |
+-------------+--------------------+----------------------+
In this case, the name column is using latin1, so changing it to utf8mb4 is advisable.
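
When a table has many columns, the same check can be applied to the query result programmatically. The sketch below works on rows shaped like the output above (column name, character set, collation); the rows are hard-coded sample data and the function name columns_to_convert is hypothetical.

```python
# Rows shaped like the information_schema.COLUMNS result above:
# (column_name, character_set_name, collation_name)
rows = [
    ("name", "latin1", "latin1_swedish_ci"),
    ("email", "utf8mb4", "utf8mb4_unicode_ci"),
    ("id", None, None),  # non-text columns report NULL for the character set
]


def columns_to_convert(rows, target="utf8mb4"):
    """Return names of text columns whose character set differs from `target`."""
    return [
        name
        for name, charset, _collation in rows
        if charset is not None and charset != target
    ]


print(columns_to_convert(rows))  # ['name']
```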

Summary

  • MySQL’s character set is configured at multiple levels (server, database, table, column)
  • By checking the character set at each level, you can make appropriate changes
  • Use commands like SHOW VARIABLES and SHOW CREATE TABLE to thoroughly understand the current configuration

4. How to Change MySQL Character Set

By properly changing MySQL’s character set, you can prevent garbled text and handle multilingual support smoothly. In this section, we will explain how to change the server-wide, database, table, and column character sets in order.

Change the Server-Wide Default Character Set

To change the server-wide default character set, you need to edit MySQL’s configuration file (my.cnf or my.ini).

Steps

  1. Open the configuration file
    • On Linux (the path varies by distribution, for example /etc/my.cnf): sudo nano /etc/mysql/my.cnf
    • On Windows: open C:\ProgramData\MySQL\MySQL Server X.X\my.ini
  2. Add or modify the following character set settings in the [mysqld] section
   [mysqld]
   character-set-server=utf8mb4
   collation-server=utf8mb4_unicode_ci
  3. Restart MySQL
    • On Linux (the service may be named mysqld on some distributions):
   sudo systemctl restart mysql
    • On Windows (the service name depends on the installation, for example MySQL80):
   net stop MySQL && net start MySQL
  4. Verify after changes
   SHOW VARIABLES LIKE 'character_set_server';
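
Before restarting the server, it can be useful to confirm that the [mysqld] settings shown above are actually present in the file. The following sketch parses a sample configuration string with Python's standard configparser module; it is an assumption that your file is simple enough for this parser (real my.cnf files with !include directives may need extra handling), and the function name server_charset_settings is made up for this example.

```python
import configparser

SAMPLE_MY_CNF = """
[mysqld]
character-set-server=utf8mb4
collation-server=utf8mb4_unicode_ci
"""


def server_charset_settings(cnf_text: str) -> dict:
    """Extract the character set options from the [mysqld] section."""
    parser = configparser.ConfigParser(
        allow_no_value=True,  # my.cnf allows flag-style options without a value
        strict=False,         # tolerate repeated options
        interpolation=None,   # treat '%' in values literally
    )
    parser.read_string(cnf_text)
    section = parser["mysqld"] if parser.has_section("mysqld") else {}
    return {
        "character-set-server": section.get("character-set-server"),
        "collation-server": section.get("collation-server"),
    }


print(server_charset_settings(SAMPLE_MY_CNF))
```

A value of None for either key means the option is missing from the [mysqld] section.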

Change Character Set at the Database Level

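ALTER DATABASE changes only the default used for tables created afterwards; existing tables and columns keep their current character set until you convert them at the table level.
ALTER DATABASE mydatabase CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;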

Verification after change

SELECT SCHEMA_NAME, DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME 
FROM information_schema.SCHEMATA 
WHERE SCHEMA_NAME = 'mydatabase';

Change Character Set at the Table Level

ALTER TABLE users CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

Verification after change

SHOW CREATE TABLE users;
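
A database usually contains more than one table, so the conversion statement has to be repeated for each of them. The sketch below generates those statements from a list of table names; the table list is sample data, the function name convert_statements is hypothetical, and the output is meant to be reviewed before it is run against a real database.

```python
def convert_statements(tables, charset="utf8mb4", collation="utf8mb4_unicode_ci"):
    """Build one ALTER TABLE ... CONVERT TO statement per table name."""
    statements = []
    for table in tables:
        # Quote the identifier and escape any backticks inside it.
        quoted = "`" + table.replace("`", "``") + "`"
        statements.append(
            f"ALTER TABLE {quoted} CONVERT TO CHARACTER SET {charset} COLLATE {collation};"
        )
    return statements


for stmt in convert_statements(["users", "posts"]):
    print(stmt)
```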

Change Character Set at the Column Level

ALTER TABLE users MODIFY COLUMN name VARCHAR(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

Verification after change

SELECT COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME 
FROM information_schema.COLUMNS 
WHERE TABLE_SCHEMA = 'mydatabase' 
AND TABLE_NAME = 'users';

Verification After Changes and the Importance of Backups

Take a backup before changing the character set, then verify the settings and the data afterwards to confirm that data integrity has been preserved.

Data Backup

mysqldump -u root -p --default-character-set=utf8mb4 mydatabase > backup.sql

Reconfirm Settings

SHOW VARIABLES LIKE 'character_set%';
SHOW VARIABLES LIKE 'collation%';
SHOW CREATE TABLE users;

Add and Display Test Data

-- Include multibyte characters and an emoji so the test actually exercises utf8mb4
INSERT INTO users (name, email) VALUES ('テストユーザー 😀', 'test@example.com');
SELECT * FROM users;

Summary

  • Server-wide character set change: Edit my.cnf and set character-set-server=utf8mb4
  • Database character set change: ALTER DATABASE mydatabase CHARACTER SET utf8mb4
  • Table character set change: ALTER TABLE users CONVERT TO CHARACTER SET utf8mb4
  • Column character set change: ALTER TABLE users MODIFY COLUMN name VARCHAR(255) CHARACTER SET utf8mb4
  • After changes, always verify the settings and test the data

5. Troubleshooting After Changing Character Encoding

After changing MySQL’s character set, there are cases where it doesn’t work properly or data becomes garbled. In this section, we will explain in detail the common problems and their solutions.

Causes of Garbled Text and How to Address Them

If garbled text occurs after changing the character set, the following causes are possible.
| Cause | Verification Method | Solution |
|---|---|---|
| Client character set setting differs | SHOW VARIABLES LIKE 'character_set_client'; | Execute SET NAMES utf8mb4; |
| Data before the change was stored in a different encoding | SELECT HEX(column_name) FROM table_name; | CONVERT() or re-export the data |
| Encoding at connection time is not appropriate | Connect with mysql --default-character-set=utf8mb4 | Change the client-side character set setting |
| Application-side settings for PHP, Python, etc. are incorrect | mysqli_set_charset($conn, 'utf8mb4'); | Standardize the application’s character set configuration |
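
The HEX() check in the table is easier to interpret with a reference value. The following Python sketch simulates what correctly stored and double-encoded data look like as hex. It uses Python's latin-1 codec as a stand-in for MySQL's latin1 (which behaves more like cp1252 for some bytes in the 0x80 to 0x9F range), so treat it as an approximation; the function name mysql_style_hex is made up.

```python
def mysql_style_hex(text: str) -> str:
    """Uppercase hex of the UTF-8 bytes, similar in shape to HEX() on a utf8mb4 column."""
    return text.encode("utf-8").hex().upper()


correct = "é"
# Simulate double encoding: UTF-8 bytes misread as latin1, then stored as UTF-8 again.
double_encoded = correct.encode("utf-8").decode("latin-1")

print(mysql_style_hex(correct))         # C3A9
print(mysql_style_hex(double_encoded))  # C383C2A9
```

If HEX() on a column returns a pattern like C383C2A9 where C3A9 is expected, the data was most likely double encoded.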

Solution 1: Properly Set the Client Character Set

SET NAMES utf8mb4;

Solution 2: Correctly Convert Pre-Change Data

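This statement repairs double-encoded text, that is, UTF-8 bytes that were written through a latin1 connection and now appear garbled. Test it on a copy of the table first, because running it on rows that are already correct can damage them.
UPDATE users SET name = CONVERT(CAST(CONVERT(name USING latin1) AS BINARY) USING utf8mb4);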
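
The logic of the UPDATE statement above can be mirrored in Python to make each step visible. This is a sketch of the idea only: it uses Python's latin-1 codec, which differs from MySQL's latin1 for a few bytes, and the function name repair_double_encoded is hypothetical.

```python
def repair_double_encoded(text: str) -> str:
    """Undo one round of UTF-8-read-as-latin1 double encoding.

    Mirrors CONVERT(CAST(CONVERT(col USING latin1) AS BINARY) USING utf8mb4):
    turn the characters back into single bytes, then decode those bytes as UTF-8.
    Raises UnicodeError if the text was not double encoded in this way.
    """
    return text.encode("latin-1").decode("utf-8")


garbled = "Ã©"  # what "é" looks like after double encoding
print(repair_double_encoded(garbled))  # é
```

The exception raised for text that is not double encoded is the reason to test on a copy first: data that is already correct does not survive this round trip.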

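Considerations When Converting from latin1 to utf8mb4

Safe Procedure
  1. Back up the current data
   mysqldump -u root -p --default-character-set=latin1 mydatabase > backup.sql
  2. Change the database character set
   ALTER DATABASE mydatabase CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
  3. Change the table character set
   ALTER TABLE users CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
  4. Re-import the data only if the in-place conversion left garbled text. A default mysqldump file contains DROP TABLE and CREATE TABLE statements with the original latin1 definitions, so change those definitions to utf8mb4 in the file (or create a data-only dump with --no-create-info) before importing.
   mysql -u root -p --default-character-set=utf8mb4 mydatabase < backup.sql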

Data Cannot Be Searched Correctly After the Change

Case 1: LIKE Search Not Working

SELECT * FROM users WHERE name COLLATE utf8mb4_unicode_ci LIKE '%Tanaka%';

Case 2: Sort Order Changed

SELECT * FROM users ORDER BY BINARY name;
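
Why the sort order changes can be illustrated outside MySQL. The Python sketch below contrasts a byte-wise sort, comparable to ORDER BY BINARY name, with a case-insensitive sort, comparable in spirit to a _ci collation. It is an analogy and does not reproduce MySQL's collation weights.

```python
names = ["b", "A", "a", "B"]

# Byte-wise order: all uppercase ASCII letters sort before lowercase ones.
binary_order = sorted(names, key=lambda s: s.encode("utf-8"))

# Case-insensitive order: "A" and "a" compare as equal and keep their input order.
ci_order = sorted(names, key=str.casefold)

print(binary_order)  # ['A', 'B', 'a', 'b']
print(ci_order)      # ['A', 'a', 'b', 'B']
```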

Measures on the Application Side

For PHP

mysqli_set_charset($conn, 'utf8mb4');

For Python (MySQL Connector)

import mysql.connector

conn = mysql.connector.connect(
    host="localhost",
    user="root",
    password="password",
    database="mydatabase",
    charset="utf8mb4"
)

For Node.js (MySQL2)

const mysql = require('mysql2');

const connection = mysql.createConnection({
  host: 'localhost',
  user: 'root',
  password: 'password',
  database: 'mydatabase',
  charset: 'utf8mb4'
});

Summary

  • Issues that arise after changing the character set can be categorized into three areas: client settings, data conversion, and application settings.
  • To prevent garbled text, unify the client-side character set with SET NAMES utf8mb4.
  • Be aware of changes in LIKE searches and sort order, and specify COLLATE as needed.
  • Setting utf8mb4 on the application side also prevents encoding mismatches.

6. Impact of Character Set Changes on Performance

When changing MySQL’s character set to utf8mb4, there are several performance considerations such as increased storage usage and index impact. This section explains the effects of character set changes and optimal measures.

Increase in Storage Usage Due to Character Set Change

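utf8mb4 uses up to 4 bytes per character, compared with a maximum of 3 bytes for the traditional utf8. Characters that already fit in 3 bytes take the same space as before, so actual growth depends on the content, but the worst-case sizes that MySQL uses for column and index limits increase.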

Bytes per character for each character set

| Character Set | Maximum Bytes per Character |
|---|---|
| latin1 | 1 byte |
| utf8 | 3 bytes |
| utf8mb4 | 4 bytes |
For example, in utf8 a VARCHAR(255) can be up to 765 bytes (255×3), whereas in utf8mb4 it can be up to 1020 bytes (255×4).
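
The arithmetic above generalizes to any column length. A minimal sketch, assuming the per-character maximums listed in the table and ignoring the 1 to 2 byte length prefix that VARCHAR also stores; the function name max_varchar_bytes is made up.

```python
MAX_BYTES_PER_CHAR = {"latin1": 1, "utf8": 3, "utf8mb4": 4}


def max_varchar_bytes(length: int, charset: str) -> int:
    """Worst-case bytes of character data for VARCHAR(length), excluding the length prefix."""
    return length * MAX_BYTES_PER_CHAR[charset]


for charset in MAX_BYTES_PER_CHAR:
    print(charset, max_varchar_bytes(255, charset))
```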

Solution

ALTER TABLE posts MODIFY COLUMN title VARCHAR(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

Increase in Index Size

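MySQL limits the maximum size of an index key. With the older InnoDB row formats (COMPACT or REDUNDANT, used by default in MySQL 5.6 and earlier), the limit is 767 bytes per indexed column, so converting an indexed VARCHAR(255) column to utf8mb4 (up to 1020 bytes) can make the conversion fail. With the DYNAMIC row format, which is the default in MySQL 8.0, the limit is 3072 bytes.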

Check Index Impact

SHOW INDEX FROM users;
Example Error
ERROR 1071 (42000): Specified key was too long; max key length is 767 bytes

Solution

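VARCHAR(191) keeps the worst-case key size at 764 bytes (191×4), which fits within the 767-byte limit.
ALTER TABLE users MODIFY COLUMN email VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;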
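
The number 191 follows directly from integer division of the key limit by the maximum bytes per character. The sketch below shows the calculation; the 3072-byte figure is the limit for InnoDB's DYNAMIC row format with the default page size, and the function name max_indexable_chars is hypothetical.

```python
def max_indexable_chars(key_limit_bytes: int, bytes_per_char: int) -> int:
    """Largest character count whose worst-case size fits in the key limit."""
    return key_limit_bytes // bytes_per_char


print(max_indexable_chars(767, 3))   # 255 (utf8)
print(max_indexable_chars(767, 4))   # 191 (utf8mb4)
print(max_indexable_chars(3072, 4))  # 768 (utf8mb4, 3072-byte limit)
```

This is why VARCHAR(255) could be indexed in full under utf8 but has to shrink to VARCHAR(191) under utf8mb4 when the 767-byte limit applies.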

Impact on Query Performance

Changing the character set to utf8mb4 may affect query execution speed.

Operations That May Be Affected

  • LIKE searches containing large amounts of data
  • ORDER BY processing
  • JOIN query performance

Solution

CREATE INDEX idx_name ON users(name(100));

Memory Usage and Buffer Size Tuning

Changing to utf8mb4 may increase memory consumption.

Recommended Settings

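[mysqld]
# Example value; size the buffer pool to the memory available on your server
innodb_buffer_pool_size = 1G
# MySQL 5.7 and earlier only: the query cache was removed in MySQL 8.0,
# where this line prevents the server from starting
query_cache_size = 128M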

Summary

  • Changing to utf8mb4 increases storage usage
  • Index size grows and may exceed limits
  • Query performance can be affected
  • Memory usage rises, requiring buffer size adjustments

7. Recommended Settings (Best Practices)

By setting MySQL character sets appropriately, you can maintain data integrity while optimizing performance. In this section, we will specifically introduce the recommended character set configurations for MySQL and explain the key points for optimal setup.

Recommended MySQL Character Set Settings

| Item | Recommended Setting | Reason |
|---|---|---|
| Character Set | utf8mb4 | Can handle all Unicode characters, including emojis and special symbols |
| Collation | utf8mb4_unicode_ci | Case‑insensitive and suitable for multilingual support |
| Storage Engine | InnoDB | Offers a good balance of performance and integrity |
| Index String Length | VARCHAR(191) | Stays within MySQL’s index length limits |

my.cnf Recommended Settings

1. MySQL Server Character Set Settings

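[mysqld]
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci
# init-connect is not executed for users with the SUPER privilege
init-connect='SET NAMES utf8mb4'
# Ignores the character set requested by clients; deprecated in recent MySQL 8.0 releases
skip-character-set-client-handshake
innodb_file_per_table = 1
# Example value; size the buffer pool to the memory available on your server
innodb_buffer_pool_size = 1G
# MySQL 5.6/5.7 only: the three settings below were removed in MySQL 8.0
# and prevent the server from starting there
innodb_large_prefix = ON
innodb_file_format = Barracuda
query_cache_size = 128M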

2. Client‑Side Character Set Settings

[client]
default-character-set = utf8mb4

Recommended Database Settings

CREATE DATABASE mydatabase DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
To change the character set of an existing database:
ALTER DATABASE mydatabase CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

Recommended Table Settings

CREATE TABLE users (
  id INT AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci,
  email VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

Changing Character Set of Existing Tables

ALTER TABLE users CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

Difference Between utf8mb4_general_ci and utf8mb4_unicode_ci

| Collation | Features | Use Cases |
|---|---|---|
| utf8mb4_general_ci | Comparison is fast but less accurate | Performance‑focused systems |
| utf8mb4_unicode_ci | Conforms to the Unicode standard, enabling more accurate comparisons | General use (recommended) |
If multilingual support or precise sorting is needed, select utf8mb4_unicode_ci.

Index Optimization

CREATE FULLTEXT INDEX idx_fulltext ON articles(content);

Summary

  • The combination of utf8mb4 + utf8mb4_unicode_ci is recommended
  • Standardize server settings (my.cnf) and unify the character set on connection
  • Explicitly specify utf8mb4 at the database, table, and column levels
  • Using VARCHAR(191) avoids index length limitations
  • Using utf8mb4_unicode_ci enables accurate comparisons

8. FAQ (Frequently Asked Questions)

This section collects common questions about changing MySQL character sets in real-world operation, along with how to handle errors and choose suitable settings.

What is the difference between utf8 and utf8mb4?

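MySQL’s utf8 stores at most 3 bytes per character and cannot hold emojis or other 4-byte characters, while utf8mb4 stores up to 4 bytes and covers all of Unicode. To see which one the server currently uses:
SHOW VARIABLES LIKE 'character_set_server';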

Will data be lost when changing MySQL character set?

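Converting correctly encoded data to utf8mb4 normally preserves it, but a mismatch between the declared and the actual encoding can produce garbled text, so always take a backup first:
mysqldump -u root -p --default-character-set=utf8mb4 mydatabase > backup.sql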

How to resolve garbled characters?

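First make sure the connection uses utf8mb4 (SET NAMES utf8mb4). If existing rows were double encoded through a latin1 connection, they can be repaired as follows (test on a copy first):
UPDATE users SET name = CONVERT(CAST(CONVERT(name USING latin1) AS BINARY) USING utf8mb4);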

What are the risks of converting from latin1 to utf8mb4?

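The main risks are garbled text when the stored bytes do not match the declared character set, and key length errors on indexed columns. Take a backup, change the database default, and then convert each table:
ALTER DATABASE mydatabase CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;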

Does switching to utf8mb4 affect performance?

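Storage and index sizes can grow, and long indexed columns may exceed the key length limit. Shortening indexed columns avoids the limit:
ALTER TABLE users MODIFY COLUMN email VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;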

Which should be used: utf8mb4_general_ci or utf8mb4_unicode_ci?

| Collation | Features | Use Cases |
|---|---|---|
| utf8mb4_general_ci | Comparison is fast but lacks accuracy | Performance‑focused systems |
| utf8mb4_unicode_ci | Accurate comparison based on the Unicode standard | General use (recommended) |

Will queries become slower after switching to utf8mb4?

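LIKE searches, ORDER BY, and JOINs on large text columns may be affected. Appropriate indexes, such as a full-text index for text search, help keep queries fast:
CREATE FULLTEXT INDEX idx_fulltext ON articles(content);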

Summary

  • Use utf8mb4; utf8 is not recommended because of its limitations.
  • Before changing the character set, always verify the settings with SHOW VARIABLES.
  • Use data export/import to prevent garbled characters.
  • Consider the index impact; VARCHAR(191) is recommended for indexed columns.
  • Take performance into account and set appropriate indexes.

Conclusion

Changing MySQL character sets is not just a configuration tweak; it is a critical task that affects data integrity and performance. By following proper settings and procedures, you can migrate to utf8mb4 safely and effectively. Follow the steps in this article to apply the proper character set configuration.