1. Introduction
MySQL is a widely used database management system that powers many kinds of applications. Among its settings, character set configuration directly affects data integrity and performance. Even so, many developers run into problems because they do not know which character set settings are appropriate or how to verify them. This article focuses on how to check MySQL character set settings, and also covers how to change them, the differences between utf8 and utf8mb4, and measures to prevent garbled text. By the end, you will have both the fundamental knowledge and the practical skills needed to work with MySQL character sets.
2. Fundamentals of MySQL Character Sets and Collations
What is a character set
A character set is an encoding scheme that lets a computer represent characters as digital data. For example, UTF-8 is a widely used encoding that covers nearly every language in the world. In MySQL, older versions default to latin1 and many installations were configured with utf8, but utf8mb4 is now recommended and is the default from MySQL 8.0 onward.
What is a collation
A collation defines the rules for comparing and sorting strings. For example, utf8_general_ci and utf8_unicode_ci are both collations for the utf8 character set, but utf8_unicode_ci follows the Unicode Collation Algorithm and therefore compares and sorts more accurately, while utf8_general_ci uses simplified rules.
Relationship between character sets and collations
A character set defines how characters are encoded, and a collation defines how characters in that encoding are compared. Every collation belongs to exactly one character set, so the two are chosen together. Choosing them appropriately helps prevent garbled text and performance degradation.
3. How to Check the Current Character Set in MySQL
In MySQL, character sets are configured at the server, database, table, and column levels. The following sections show how to check the setting at each level.
Check the Server-Wide Character Set Settings
To check the server-level character set settings, run the following command.
SHOW VARIABLES LIKE 'character_set_%';
The output of this command looks like this:
+--------------------------+---------+
| Variable_name            | Value   |
+--------------------------+---------+
| character_set_client     | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database   | utf8mb4 |
| character_set_results    | utf8mb4 |
| character_set_server     | utf8mb4 |
| character_set_system     | utf8    |
+--------------------------+---------+
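Meaning of each item:
- character_set_server: The server’s default character set, used for new databases that do not specify one.
- character_set_database: The character set of the current default database. When no database is selected, it has the same value as character_set_server.
- character_set_client, character_set_connection, character_set_results: The character sets used for statements sent by the client, for string literals in those statements, and for results returned to the client.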
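The server-level values can also be read directly from system variables, which is convenient in scripts. The following is a minimal sketch that assumes nothing beyond an open session in a client such as the mysql command-line tool.

```sql
-- Read the server-level defaults as a single row.
SELECT @@character_set_server AS server_charset,
       @@collation_server     AS server_collation;

-- List the collation variables that accompany the character_set_% variables.
SHOW VARIABLES LIKE 'collation_%';
```

If the character set and collation shown here disagree with what you expect, check the configuration file described in section 4.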
Check the Character Set for a Specific Database
To check the character set of a specific database, use the following command.
SHOW CREATE DATABASE database_name;
Example output:
CREATE DATABASE `database_name` /*!40100 DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci */
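To review every database at once instead of one at a time, the same information is available from information_schema. This is a sketch, not the only way to do it; it assumes your account is allowed to see the databases in question.

```sql
-- List the default character set and collation of every visible database.
SELECT SCHEMA_NAME,
       DEFAULT_CHARACTER_SET_NAME,
       DEFAULT_COLLATION_NAME
FROM information_schema.SCHEMATA
ORDER BY SCHEMA_NAME;
```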
Check the Character Set at the Table and Column Level
To check the character set of tables and columns, use the following commands.
Table level:
SHOW CREATE TABLE table_name;
Column level:
SHOW FULL COLUMNS FROM table_name;
Example output:
+-------------+--------------+--------------------+------+
| Field       | Type         | Collation          | Null |
+-------------+--------------+--------------------+------+
| column_name | varchar(255) | utf8mb4_unicode_ci | YES  |
+-------------+--------------+--------------------+------+
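When a schema has many tables, checking them one by one is tedious. The query below is a sketch that lists every character column not yet using utf8mb4; the schema name my_database is a placeholder to replace with your own.

```sql
-- Find character columns whose character set is not utf8mb4.
-- CHARACTER_SET_NAME is NULL for non-character columns (INT, DATE, ...),
-- so those are excluded explicitly.
SELECT TABLE_NAME,
       COLUMN_NAME,
       CHARACTER_SET_NAME,
       COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'my_database'   -- placeholder schema name
  AND CHARACTER_SET_NAME IS NOT NULL
  AND CHARACTER_SET_NAME <> 'utf8mb4'
ORDER BY TABLE_NAME, ORDINAL_POSITION;
```

An empty result means every character column in that schema already uses utf8mb4.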
By using these commands, you can verify that the character set is configured correctly at every level.
4. How to Set and Change Character Encoding
The method for changing the character set depends on the level: server, database, table, or column. Each is explained below.
Changing Server-Wide Settings
To change the default character set at the server level, edit MySQL’s configuration file (usually my.cnf or my.ini).
Steps:
- Open the configuration file.
sudo nano /etc/my.cnf
- Add or edit the following settings.
[mysqld]
character-set-server=utf8mb4
collation-server=utf8mb4_unicode_ci
- Restart the MySQL server. On some distributions, such as Debian and Ubuntu, the service is named mysql rather than mysqld.
sudo systemctl restart mysqld
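On MySQL 8.0 and later, the same server defaults can also be changed from a SQL session with SET PERSIST, which stores the value so that it survives a restart. The sketch below assumes MySQL 8.0 or later and an account that is allowed to set global system variables; it is an alternative to editing the file, not a required step.

```sql
-- Assumes MySQL 8.0+ and sufficient privileges to set global variables.
SET PERSIST character_set_server = 'utf8mb4';
SET PERSIST collation_server     = 'utf8mb4_unicode_ci';

-- Confirm the global values now in effect.
SELECT @@GLOBAL.character_set_server AS server_charset,
       @@GLOBAL.collation_server     AS server_collation;
```

Either way, the new defaults apply to databases created afterward; existing databases and tables keep their current settings.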
Changing Database-Level Settings
To change the character set of a specific database, use the following command.
Change command:
ALTER DATABASE database_name CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
Example:
ALTER DATABASE my_database CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
This command changes the database’s default character set, which applies only to tables created afterward. It does not affect existing tables or data. To convert existing tables, see the next section.
Changing Table-Level Settings
To change the character set of an existing table, use the following command.
Change command:
ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
Example:
ALTER TABLE users CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
This command changes the table’s default character set and also converts all of its character columns (CHAR, VARCHAR, TEXT), re-encoding the stored data.
Changing Column-Level Settings
To change the character set for a specific column only, do the following.
Change command:
ALTER TABLE table_name MODIFY column_name column_type CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
Example:
ALTER TABLE users MODIFY username VARCHAR(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
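When many tables need converting, one common approach is to let MySQL generate the ALTER TABLE statements for you. The following is a sketch under the assumption that every base table in the placeholder schema my_database should move to utf8mb4 with utf8mb4_unicode_ci; review the generated statements before running them.

```sql
-- Generate one CONVERT TO statement per base table that is not yet utf8mb4.
-- This query only prints the statements; it changes nothing by itself.
SELECT CONCAT(
         'ALTER TABLE `', TABLE_SCHEMA, '`.`', TABLE_NAME,
         '` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;'
       ) AS alter_statement
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'my_database'      -- placeholder schema name
  AND TABLE_TYPE = 'BASE TABLE'         -- skip views
  AND TABLE_COLLATION NOT LIKE 'utf8mb4%'
ORDER BY TABLE_NAME;
```

Copy the output into a script, take a backup, and then run the statements one table at a time so that a failure is easy to isolate.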
5. Difference between utf8 and utf8mb4
Technical differences
- utf8: MySQL’s utf8 (an alias for utf8mb3) is a subset of UTF-8 that stores at most 3 bytes per character. As a result, it cannot store emojis or other characters outside the Basic Multilingual Plane (e.g., 𠮷).
- utf8mb4: A full implementation of UTF-8 that supports characters of up to 4 bytes.
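The 3-byte versus 4-byte distinction can be seen directly by comparing character length with byte length. This is a small sketch that assumes the session uses utf8mb4 and that your terminal sends UTF-8; under other settings the numbers will differ.

```sql
-- Assumes the client and terminal both use UTF-8.
SET NAMES utf8mb4;

-- A character inside the Basic Multilingual Plane: 1 character, 3 bytes (E38182).
SELECT CHAR_LENGTH('あ') AS chars, LENGTH('あ') AS bytes, HEX('あ') AS hex_bytes;

-- A character outside it: 1 character, 4 bytes (F0A0AEB7).
-- MySQL's utf8 cannot store this one; utf8mb4 can.
SELECT CHAR_LENGTH('𠮷') AS chars, LENGTH('𠮷') AS bytes, HEX('𠮷') AS hex_bytes;
```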
Why utf8mb4 is recommended
- Compatibility: Modern web and mobile applications increasingly need to handle emojis and special characters.
- Standardization: Many CMSs (e.g., WordPress) use utf8mb4 by default.
Things to watch out for when migrating
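When migrating from utf8 to utf8mb4, keep the following points in mind:
- Database size: utf8mb4 uses up to 4 bytes per character, so the database size may increase. Characters that already fit in utf8 generally occupy the same number of bytes after conversion, so the increase comes mainly from newly stored 4-byte characters.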
- Existing data: It is strongly recommended to back up your data before making changes.
- Application settings: You also need to configure the character set used by the application (e.g., client connection character set) to utf8mb4.
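To see how much the database size actually changes, you can record table sizes before and after the migration. The query below is a sketch; my_database is a placeholder schema name, and for InnoDB the reported sizes are estimates rather than exact figures.

```sql
-- Approximate on-disk size of each table, largest first.
SELECT TABLE_NAME,
       ROUND((DATA_LENGTH + INDEX_LENGTH) / 1024 / 1024, 2) AS size_mb
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'my_database'   -- placeholder schema name
  AND TABLE_TYPE = 'BASE TABLE'
ORDER BY size_mb DESC;
```

Running it once before the conversion and once after gives a rough per-table comparison.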
6. Causes and Solutions for Garbled Text
Main Causes of Garbled Text
- Mismatched character encoding between client and server
- Example: the client uses latin1 while the server uses utf8mb4.
- Improper data migration
- When the character encoding is not set correctly during data import.
- Application configuration errors
- When the character set specified for the database connection is inappropriate.
Specific Measures to Prevent Garbled Text
- Verify and Standardize Server Settings
- Check the server’s character set configuration and maintain consistency at all levels.
SHOW VARIABLES LIKE 'character_set_%';
- Adjust Client Settings
- Explicitly specify the character set when connecting the client.
SET NAMES utf8mb4;
- Precautions During Data Migration
- Specify the correct character set during import.
mysql --default-character-set=utf8mb4 -u username -p database_name < dump.sql
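When text already looks garbled, it helps to find out whether the stored bytes are wrong or only the connection settings are. The following sketch reuses the users table and username column from the earlier examples as placeholders.

```sql
-- Make the session use utf8mb4, then confirm the connection-related settings.
SET NAMES utf8mb4;
SELECT @@character_set_client     AS client_charset,
       @@character_set_connection AS connection_charset,
       @@character_set_results    AS results_charset;

-- Inspect the raw bytes that are actually stored.
SELECT username, HEX(username) AS stored_bytes
FROM users          -- placeholder table and column
LIMIT 5;
```

In a utf8mb4 column, a correctly stored é appears as C3A9. If it appears as C383C2A9, the text was most likely encoded twice, typically because UTF-8 data was sent over a connection declared as latin1. If the bytes are correct but the display is still wrong, the problem is on the client side rather than in the stored data.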

7. FAQ Section
Frequently Asked Questions
- Does changing to utf8mb4 affect performance?
- Because utf8mb4 increases data size, very large databases may see a slight performance impact. However, it is rare for this to be an issue in normal operation.
- Are there any risks when migrating from utf8 to utf8mb4?
- Because utf8mb4 is a superset of utf8, converting existing data is normally lossless. Even so, back up beforehand to guard against data loss or application issues caused by a failed or interrupted conversion.
- What changes when you alter the collation?
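- Changing the collation changes how strings are compared and sorted, for example whether letter case and accents are treated as distinct. For multilingual applications, we recommend utf8mb4_unicode_ci, which follows the Unicode standard.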
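The effect of a collation is easiest to see by comparing the same strings under different collations. This is a sketch that assumes a utf8mb4 session; the collation names used are standard ones that ship with MySQL.

```sql
-- String literals take the connection character set, so set it first.
SET NAMES utf8mb4;

-- Case-insensitive versus binary comparison of the same two strings.
SELECT 'a' = 'A' COLLATE utf8mb4_unicode_ci AS case_insensitive,  -- 1
       'a' = 'A' COLLATE utf8mb4_bin        AS binary_compare;    -- 0

-- The same comparison under the two collations discussed in this article.
SELECT 'ß' = 'ss' COLLATE utf8mb4_general_ci AS general_ci,
       'ß' = 'ss' COLLATE utf8mb4_unicode_ci AS unicode_ci;

-- List every collation available for utf8mb4.
SHOW COLLATION WHERE Charset = 'utf8mb4';
```

According to the MySQL documentation, the unicode_ci collations treat ß as equal to ss while the general_ci collations do not, so the second query should return different values in its two columns.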