The character set of a database is a crucial setting that determines how stored characters are encoded and processed. In MySQL 5.7 and earlier, the default character set is latin1, which can cause problems with data that includes Japanese or other non-Latin characters (MySQL 8.0 changed the default to utf8mb4). Choosing an appropriate character set is especially important during data migrations and system consolidation.
Common Issues and Their Causes
Typical problems related to MySQL character sets include the following.
Character garbling
utf8 and latin1 are mixed
Client and server character set settings differ
Search-related issues
Collation differences keep searches from returning the expected results
Sort order differs from what is expected
Data migration problems
Because utf8mb4 is not used, emojis and special symbols cannot be stored
Character set conversion is not performed correctly during data export/import
Purpose and Structure of This Article
This article provides a comprehensive guide on MySQL character set changes, covering basic knowledge, how to change it, and troubleshooting.
Article Flow
Basic knowledge of MySQL character sets
How to check the current character set
How to change MySQL character set
Post-change troubleshooting
Impact of character set changes on performance
Recommended settings (best practices)
FAQ (frequently asked questions)
After reading this guide, you should understand MySQL character sets well enough to choose appropriate settings and avoid common problems.
2. MySQL Character Sets: Basic Understanding
What is a character set?
A character set is the set of rules used to encode characters as bytes when storing and processing them as digital data. For example, the Japanese character “あ” is represented as the byte sequence E3 81 82 in UTF-8 but as 82 A0 in Shift_JIS.
MySQL lets you specify the character set separately for each database, table, and even column, and choosing an appropriate one helps prevent garbled text and makes internationalizing a system smoother.
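To see the byte sequences for “あ” for yourself, the following Python sketch encodes the same character with two codecs from Python's standard library. Python is used only for illustration; MySQL performs the equivalent encoding internally, and the helper name `encoded_hex` is hypothetical.

```python
def encoded_hex(text: str, encoding: str) -> str:
    """Return the bytes of `text` in `encoding` as space-separated uppercase hex."""
    return text.encode(encoding).hex(" ").upper()


for enc in ("utf-8", "shift_jis"):
    print(enc, encoded_hex("あ", enc))
# utf-8 E3 81 82
# shift_jis 82 A0
```

The same character yields different bytes under each encoding, which is why data written in one character set and read in another comes out garbled.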
Common character sets
| Character Set | Features | Notes |
|---|---|---|
| utf8 | UTF-8 with up to 3 bytes per character | Cannot store some special characters (e.g., emojis) |
| utf8mb4 | UTF-8 with up to 4 bytes per character | Supports emojis and special characters (recommended) |
| latin1 | Single-byte, ASCII compatible | Used in legacy systems |
What is a collation?
A collation defines the rules for comparing and sorting strings in a given character set. For example, it determines whether “A” and “a” are considered equal and in what order characters are sorted.
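As a rough analogy only (this is not MySQL's collation algorithm), the Python sketch below contrasts a binary comparison, which compares code points the way utf8mb4_bin does, with a case-insensitive comparison in the spirit of a _ci collation. Here `str.casefold()` is an assumed stand-in for the case folding a _ci collation performs.

```python
words = ["b", "A", "a", "B"]

# Binary-style comparison: "A" and "a" differ, and uppercase letters sort first
binary_equal = "A" == "a"
binary_order = sorted(words)

# Case-insensitive comparison: "A" and "a" are treated as equal
ci_equal = "A".casefold() == "a".casefold()
ci_order = sorted(words, key=str.casefold)

print(binary_equal, binary_order)  # False ['A', 'B', 'a', 'b']
print(ci_equal, ci_order)          # True ['A', 'a', 'b', 'B']
```

Under a _ci collation, the relative order of strings that compare equal (such as “A” and “a”) is not something to rely on; Python's stable sort simply keeps their input order.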
Common collations
| Collation | Description |
|---|---|
| utf8_general_ci | Case-insensitive, suitable for general use |
| utf8_unicode_ci | Collation based on the Unicode standard (recommended) |
| utf8mb4_bin | Binary comparison (used when exact matches are required) |
Differences Between utf8 and utf8mb4
MySQL’s utf8 (an alias for utf8mb3) stores at most 3 bytes per character, so it cannot handle characters outside Unicode's Basic Multilingual Plane, such as emojis and some rarely used CJK characters. utf8mb4 uses up to 4 bytes per character, covers all of Unicode, and is recommended for modern applications.
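The Python sketch below makes the 3-byte limit concrete: it measures the UTF-8 length of each character and reports which ones would not fit in MySQL's 3-byte utf8. The helper name `chars_needing_utf8mb4` is hypothetical, and the check assumes the input is valid Unicode text.

```python
from typing import List


def chars_needing_utf8mb4(text: str) -> List[str]:
    """Return the characters whose UTF-8 encoding takes 4 bytes.

    These are the characters that 3-byte utf8 (utf8mb3) cannot store.
    """
    return [ch for ch in text if len(ch.encode("utf-8")) > 3]


samples = ["abc", "日本語", "こんにちは😀", "𠮷野家"]
for s in samples:
    byte_lengths = [len(ch.encode("utf-8")) for ch in s]
    print(repr(s), byte_lengths, chars_needing_utf8mb4(s))
```

ASCII letters take 1 byte and common Japanese characters take 3, so both fit in utf8; the emoji and the rarer kanji “𠮷” take 4 bytes and require utf8mb4.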
| Character Set | Maximum Bytes | Emoji Support | Recommendation |
|---|---|---|---|
| utf8 | 3 bytes | ❌ Not supported | ❌ Not recommended |
| utf8mb4 | 4 bytes | ✅ Supported | ✅ Recommended |
Reasons to switch from utf8 to utf8mb4
Future compatibility: utf8mb4 is the default character set in MySQL 8.0, where the 3-byte utf8 (utf8mb3) is deprecated.
Storing special characters and emojis: utf8mb4 lets you safely store content such as social media posts and chat messages, which often contain emojis.
Internationalization support: Reduces the risk of garbled text when building multilingual systems.
Summary
A character set determines how data is encoded, stored, and processed.
Collation defines the rules for comparing characters.
MySQL’s utf8 actually supports only up to 3 bytes, so using utf8mb4 is recommended.
utf8mb4_unicode_ci is the recommended collation for general use.
3. How to Check the Current Character Set
Before changing MySQL’s character set, it is important to verify the current settings.
Because character sets can be set separately for the server, each database, each table, and each column, first identify the level at which a change is needed.
Check the MySQL Server’s Overall Character Set
First, check the default character set configuration for the entire MySQL server.
SHOW VARIABLES LIKE 'character_set%';
SHOW VARIABLES LIKE 'collation%';
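If you read these results from an application, a small check can flag the variables that are not yet utf8mb4. The Python sketch below works on (Variable_name, Value) pairs, the shape SHOW VARIABLES returns; the function name and the sample rows are hypothetical. It skips character_set_filesystem (normally binary), character_set_system (fixed to utf8mb3, shown as utf8 in older versions), and character_sets_dir (a directory path), since none of these is expected to be utf8mb4.

```python
from typing import Iterable, List, Tuple

# Variables that are not expected to be utf8mb4 and are therefore skipped
IGNORED = {"character_set_filesystem", "character_set_system", "character_sets_dir"}


def non_utf8mb4(rows: Iterable[Tuple[str, str]]) -> List[str]:
    """Return names of character_set_* / collation_* variables not using utf8mb4."""
    flagged = []
    for name, value in rows:
        if name in IGNORED:
            continue
        if name.startswith("character_set") and value != "utf8mb4":
            flagged.append(name)
        elif name.startswith("collation") and not value.startswith("utf8mb4_"):
            flagged.append(name)
    return flagged


# Hypothetical output of the two SHOW VARIABLES queries above
rows = [
    ("character_set_client", "utf8mb4"),
    ("character_set_connection", "utf8mb4"),
    ("character_set_database", "latin1"),
    ("character_set_filesystem", "binary"),
    ("character_set_results", "utf8mb4"),
    ("character_set_server", "latin1"),
    ("character_set_system", "utf8mb3"),
    ("character_sets_dir", "/usr/share/mysql/charsets/"),
    ("collation_connection", "utf8mb4_unicode_ci"),
    ("collation_database", "latin1_swedish_ci"),
    ("collation_server", "latin1_swedish_ci"),
]
print(non_utf8mb4(rows))
# ['character_set_database', 'character_set_server', 'collation_database', 'collation_server']
```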
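Check the Table’s Character Set
To see a table’s default character set and collation, run:
SHOW CREATE TABLE table_name;
If the output contains something like the following, the table should be converted:
DEFAULT CHARSET=latin1 → not utf8mb4, so it needs to be changed
COLLATE=latin1_swedish_ci → utf8mb4_unicode_ci is a more appropriate collation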
Check the Column’s Character Set
To investigate a specific column’s character set, run the following SQL.
SELECT COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'database_name'
AND TABLE_NAME = 'table_name';
If the output shows, for example, that the name column uses latin1, it is advisable to change that column to utf8mb4.
Summary
MySQL’s character set is configured at multiple levels (server, database, table, column)
By checking the character set at each level, you can make appropriate changes
Use commands like SHOW VARIABLES and SHOW CREATE TABLE to thoroughly understand the current configuration
4. How to Change MySQL Character Set
By properly changing MySQL’s character set, you can prevent garbled text and handle multilingual support smoothly.
In this section, we will explain how to change the server-wide, database, table, and column character sets in order.
Change the Server-Wide Default Character Set
To change the server-wide default character set, you need to edit MySQL’s configuration file (my.cnf or my.ini).
Steps
Open the configuration file
On Linux: sudo nano /etc/mysql/my.cnf (on some distributions the file is /etc/my.cnf)
On Windows:
Open C:\ProgramData\MySQL\MySQL Server X.X\my.ini
Add or modify the character set settings
Add or modify the following in the [mysqld] section:
[mysqld]
character-set-server=utf8mb4
collation-server=utf8mb4_unicode_ci
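After editing the configuration file, you can sanity-check the [mysqld] settings before restarting the server. This Python sketch uses the standard library's configparser, which handles simple my.cnf files; it assumes the settings live directly in the file, since `!include`/`!includedir` directives and included files are not followed. The function name and the file content are hypothetical. Because MySQL treats dashes and underscores in option names as equivalent, the sketch normalizes both spellings.

```python
import configparser


def _normalize(name: str) -> str:
    # MySQL accepts character_set_server and character-set-server alike
    return name.strip().lower().replace("_", "-")


def server_charset_settings(cnf_text: str) -> dict:
    """Return character-set-server and collation-server from the [mysqld] section."""
    # allow_no_value: my.cnf often contains bare flags such as skip-name-resolve
    # strict=False: tolerate repeated keys (the last value wins)
    parser = configparser.ConfigParser(allow_no_value=True, strict=False)
    parser.optionxform = _normalize
    parser.read_string(cnf_text)
    if not parser.has_section("mysqld"):
        return {}
    section = parser["mysqld"]
    return {
        key: section.get(key)
        for key in ("character-set-server", "collation-server")
        if key in section
    }


# Hypothetical my.cnf content
cnf = """
[mysqld]
character-set-server=utf8mb4
collation-server=utf8mb4_unicode_ci
skip-name-resolve

[client]
default-character-set=utf8mb4
"""
print(server_charset_settings(cnf))
# {'character-set-server': 'utf8mb4', 'collation-server': 'utf8mb4_unicode_ci'}
```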
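Restart MySQL
Restart the server so that the new settings take effect (for example, sudo systemctl restart mysql on Linux; the service name may be mysqld on some distributions).
Change the Database Character Set
ALTER DATABASE mydatabase CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
This changes only the default for tables created afterward; existing tables keep their character set until they are converted.
Change the Table Character Set
ALTER TABLE users CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
Change the Column Character Set
ALTER TABLE users MODIFY COLUMN name VARCHAR(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
Verify the Settings
SHOW VARIABLES LIKE 'character_set%';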
SHOW VARIABLES LIKE 'collation%';
SHOW CREATE TABLE users;
Add and Display Test Data
INSERT INTO users (name, email) VALUES ('Test User', 'test@example.com');
SELECT * FROM users;
Summary
Server-wide character set change: Edit my.cnf and set character-set-server=utf8mb4
Database character set change: ALTER DATABASE mydatabase CHARACTER SET utf8mb4
Table character set change: ALTER TABLE users CONVERT TO CHARACTER SET utf8mb4
Column character set change: ALTER TABLE users MODIFY COLUMN name VARCHAR(255) CHARACTER SET utf8mb4
After changes, always verify the settings and test the data
5. Troubleshooting After Changing Character Encoding
After changing MySQL’s character set, there are cases where it doesn’t work properly or data becomes garbled.
In this section, we will explain in detail the common problems and their solutions.
Causes of Garbled Text and How to Address Them
If garbled text occurs after changing the character set, the following causes are possible.
| Cause | Verification Method | Solution |
|---|---|---|
| Client character set setting differs | SHOW VARIABLES LIKE 'character_set_client'; | Execute SET NAMES utf8mb4; |
| Data before the change was stored in a different encoding | SELECT HEX(column_name) FROM table_name; | Convert with CONVERT() or re-export the data |
| Encoding at connection time is not appropriate | Connect with mysql --default-character-set=utf8mb4 | Change the client-side character set setting |
| Application-side settings for PHP, Python, etc. are incorrect | mysqli_set_charset($conn, 'utf8mb4'); | Standardize the application’s character set configuration |
Solution 1: Properly Set the Client Character Set
SET NAMES utf8mb4;
Solution 2: Correctly Convert Pre-Change Data
UPDATE users SET name = CONVERT(CAST(CONVERT(name USING latin1) AS BINARY) USING utf8mb4);
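The nested CONVERT/CAST above repairs text that was double-encoded: UTF-8 bytes that were stored in a latin1 column and later converted to utf8mb4 as if they were latin1. The Python sketch below reproduces the same round trip. It uses Python's latin-1 codec, which is an assumption: MySQL's latin1 is actually closer to Windows-1252 and differs for bytes 0x80 to 0x9F, so treat this as an illustration. Running the UPDATE on text that is not double-encoded can corrupt it, so back up the table and preview the result with a SELECT first; the `repair` function here adds a guard that the SQL statement does not have.

```python
def double_encode(text: str) -> str:
    """Simulate the bug: UTF-8 bytes misread as latin1 and stored as characters."""
    return text.encode("utf-8").decode("latin-1")


def repair(text: str) -> str:
    """Reverse double encoding, mirroring
    CONVERT(CAST(CONVERT(x USING latin1) AS BINARY) USING utf8mb4).

    Returns the input unchanged if it cannot be double-encoded text.
    """
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Characters outside latin1, or bytes that are not valid UTF-8:
        # the text was probably stored correctly, so leave it alone
        return text


garbled = double_encode("café")
print(garbled)          # cafÃ©
print(repair(garbled))  # café
print(repair("日本語"))  # 日本語 (unchanged: not double-encoded)
```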
Issues that arise after changing the character set can be categorized into three areas: client settings, data conversion, and application settings.
To prevent garbled text, unify the client-side character set with SET NAMES utf8mb4.
Be aware of changes in LIKE searches and sort order, and specify COLLATE as needed.
Setting utf8mb4 on the application side also prevents encoding mismatches.
6. Impact of Character Set Changes on Performance
When changing MySQL’s character set to utf8mb4, there are several performance considerations such as increased storage usage and index impact.
This section explains the effects of character set changes and optimal measures.
Increase in Storage Usage Due to Character Set Change
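utf8mb4 allows up to 4 bytes per character, compared with 3 for utf8. Characters that utf8 can store take the same number of bytes in utf8mb4, so actual data grows only when 4-byte characters such as emojis are stored; converting from latin1, however, makes every non-ASCII character take at least 2 bytes. The maximum size MySQL must allow for each string column also grows, which affects row size and index length limits.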
Bytes per character for each character set
| Character Set | Maximum Bytes per Character |
|---|---|
| latin1 | 1 byte |
| utf8 | 3 bytes |
| utf8mb4 | 4 bytes |
For example, in utf8 a VARCHAR(255) can be up to 765 bytes (255×3), whereas in utf8mb4 it can be up to 1020 bytes (255×4).
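The arithmetic above gives the worst case; the actual size depends on the text. The Python sketch below compares the two, assuming that UTF-8 byte counts are a fair proxy for utf8/utf8mb4 storage (both encode the characters they share identically). The helper names are hypothetical, and the VARCHAR length prefix of 1 or 2 bytes is ignored.

```python
MAX_BYTES_PER_CHAR = {"latin1": 1, "utf8": 3, "utf8mb4": 4}


def varchar_max_bytes(length: int, charset: str) -> int:
    """Worst-case data size of VARCHAR(length), excluding the length prefix."""
    return length * MAX_BYTES_PER_CHAR[charset]


def utf8_bytes(text: str) -> int:
    """Bytes the text actually occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


print(varchar_max_bytes(255, "utf8"), varchar_max_bytes(255, "utf8mb4"))  # 765 1020
for text in ["hello", "café", "日本語", "👍👍"]:
    print(repr(text), utf8_bytes(text))
```

ASCII text such as "hello" takes 5 bytes in any of these character sets, "café" takes one byte more than in latin1, and only the emojis need 4 bytes per character.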
Solution
ALTER TABLE posts MODIFY COLUMN title VARCHAR(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
Increase in Index Size
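InnoDB limits the length of an index key: 767 bytes for tables using the COMPACT or REDUNDANT row format, and 3072 bytes for DYNAMIC or COMPRESSED (DYNAMIC is the default row format in MySQL 5.7.9 and later).
Because utf8mb4 counts 4 bytes per character, an index on a VARCHAR(255) column needs up to 1020 bytes, which exceeds the 767-byte limit and can make the ALTER TABLE or CREATE INDEX statement fail.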
Check Index Impact
SHOW INDEX FROM users;
Example Error
ERROR 1071 (42000): Specified key was too long; max key length is 767 bytes
Solution
ALTER TABLE users MODIFY COLUMN email VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
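The number 191 comes from dividing the 767-byte key limit by 4 bytes per character and rounding down (191 × 4 = 764 bytes). The Python sketch below generalizes that arithmetic. It assumes the InnoDB limits of 767 bytes for the COMPACT and REDUNDANT row formats and 3072 bytes for DYNAMIC and COMPRESSED; check which row format your tables actually use. The function name is hypothetical.

```python
# InnoDB index key limits in bytes, by row format (assumed, see lead-in)
KEY_LIMITS = {"COMPACT": 767, "REDUNDANT": 767, "DYNAMIC": 3072, "COMPRESSED": 3072}
MAX_BYTES_PER_CHAR = {"latin1": 1, "utf8": 3, "utf8mb4": 4}


def max_indexable_chars(charset: str, row_format: str = "COMPACT") -> int:
    """Longest VARCHAR length whose full value fits in a single index key."""
    return KEY_LIMITS[row_format] // MAX_BYTES_PER_CHAR[charset]


print(max_indexable_chars("utf8"))                # 255
print(max_indexable_chars("utf8mb4"))             # 191
print(max_indexable_chars("utf8mb4", "DYNAMIC"))  # 768
```

This is why VARCHAR(255) was safe to index under utf8 but not under utf8mb4 with the 767-byte limit, while the larger DYNAMIC limit leaves room for VARCHAR(255) in utf8mb4.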
Impact on Query Performance
Changing the character set to utf8mb4 may affect query execution speed.
Operations That May Be Affected
LIKE searches over large amounts of data
ORDER BY processing
JOIN query performance
Solution
CREATE INDEX idx_name ON users(name(100));
Memory Usage and Buffer Size Tuning
Changing to utf8mb4 may increase memory consumption.
7. Recommended Settings (Best Practices)
Setting MySQL character sets appropriately lets you maintain data integrity while keeping performance in check.
This section introduces the recommended character set configuration for MySQL and explains the key points of an optimal setup.
Recommended MySQL Character Set Settings
| Item | Recommended Setting | Reason |
|---|---|---|
| Character Set | utf8mb4 | Can handle all Unicode characters, including emojis and special symbols |
| Collation | utf8mb4_unicode_ci | Case-insensitive and suitable for multilingual support |
| Storage Engine | InnoDB | Offers a good balance of performance and integrity |
Recommended Database Settings
To create a new database with the recommended settings:
CREATE DATABASE mydatabase DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
To change the character set of an existing database:
ALTER DATABASE mydatabase CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
Recommended Table Settings
CREATE TABLE users (
id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci,
email VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
Changing Character Set of Existing Tables
ALTER TABLE users CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
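When many tables need converting, it can help to generate these statements from a list of table names, for example the result of SELECT TABLE_NAME FROM information_schema.TABLES WHERE TABLE_SCHEMA = 'mydatabase' AND TABLE_TYPE = 'BASE TABLE'. The Python sketch below only builds the SQL text; the function names are hypothetical, and you should review the output and take a backup before running it.

```python
from typing import Iterable, List


def quote_identifier(name: str) -> str:
    """Quote a MySQL identifier with backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def convert_statements(
    tables: Iterable[str],
    charset: str = "utf8mb4",
    collation: str = "utf8mb4_unicode_ci",
) -> List[str]:
    """Build one ALTER TABLE ... CONVERT TO statement per table.

    charset and collation are inserted as-is, so they must come from trusted code.
    """
    return [
        f"ALTER TABLE {quote_identifier(t)} "
        f"CONVERT TO CHARACTER SET {charset} COLLATE {collation};"
        for t in tables
    ]


for stmt in convert_statements(["users", "posts"]):
    print(stmt)
```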
Difference Between utf8mb4_general_ci and utf8mb4_unicode_ci
| Collation | Features | Use Cases |
|---|---|---|
| utf8mb4_general_ci | Comparison is fast but less accurate | Performance-focused systems |
| utf8mb4_unicode_ci | Conforms to the Unicode standard, enabling more accurate comparisons | General use (recommended) |
✅ If multilingual support or precise sorting is needed, select utf8mb4_unicode_ci.
Index Optimization
CREATE FULLTEXT INDEX idx_fulltext ON articles(content);
Summary
The combination of utf8mb4 + utf8mb4_unicode_ci is recommended
Standardize server settings (my.cnf) and unify the character set on connection
Explicitly specify utf8mb4 at the database, table, and column levels
Using VARCHAR(191) keeps indexed utf8mb4 columns within the 767-byte index key limit (191 × 4 = 764 bytes)
Using utf8mb4_unicode_ci enables accurate comparisons
8. FAQ (Frequently Asked Questions)
This section answers common questions about changing MySQL character sets in real-world operations, including how to handle errors and how to choose optimal settings.
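What is the difference between utf8 and utf8mb4?
MySQL’s utf8 stores at most 3 bytes per character, so it cannot hold emojis or other 4-byte characters; utf8mb4 uses up to 4 bytes and can handle all Unicode characters. To check the server’s current character set:
SHOW VARIABLES LIKE 'character_set_server';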
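Will data be lost when changing MySQL character set?
Correctly stored data is not lost when converted to utf8mb4, but text that was stored in a different encoding than the column declares can end up garbled, so back up the data before changing anything. Double-encoded text (UTF-8 bytes stored in a latin1 column) can be repaired with:
UPDATE users SET name = CONVERT(CAST(CONVERT(name USING latin1) AS BINARY) USING utf8mb4);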
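What are the risks of converting from latin1 to utf8mb4?
The main risks are garbled text if the existing data is not actually latin1, and index key length errors because utf8mb4 needs up to 4 bytes per character. After taking a backup, change the database default (existing tables must still be converted with ALTER TABLE ... CONVERT TO):
ALTER DATABASE mydatabase CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;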
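Does switching to utf8mb4 affect performance?
It can: the maximum size of string columns grows, and indexes on long columns may exceed the key length limit. Keeping indexed columns short avoids the limit, for example:
ALTER TABLE users MODIFY COLUMN email VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;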
Which should be used: utf8mb4_general_ci or utf8mb4_unicode_ci?
| Collation | Features | Use Cases |
|---|---|---|
| utf8mb4_general_ci | Comparison is fast but lacks accuracy | Performance-focused systems |
| utf8mb4_unicode_ci | Accurate comparison based on the Unicode standard | General use (recommended) |
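Will queries become slower after switching to utf8mb4?
LIKE searches over large tables, ORDER BY, and JOINs may slow down somewhat; appropriate indexes mitigate this. For full-text searches, for example:
CREATE FULLTEXT INDEX idx_fulltext ON articles(content);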
Summary
✅ Use utf8mb4; utf8 is not recommended because of its 3-byte limit.
✅ Before changing the character set, always check the current settings with SHOW VARIABLES.
✅ Use data export/import to prevent garbled characters.
✅ Consider the impact on indexes; VARCHAR(191) keeps indexed columns within the 767-byte limit.
✅ Take performance into account and create appropriate indexes.
Finally
Changing MySQL character sets is not just a configuration tweak; it is a critical task that affects data integrity and performance.
By following proper settings and procedures, you can migrate to utf8mb4 safely and effectively. 🔹 Follow the steps in this article to apply the proper character set configuration! 🔹