Many believe MySQL’s "UTF-8" supports all Unicode characters, but that’s not the whole story. MySQL’s utf8 character set stores at most 3 bytes per character, so most emojis and other characters outside the Basic Multilingual Plane cannot be stored in it. This article outlines the key differences between MySQL’s utf8 and utf8mb4 and explains why utf8mb4 is the better choice for modern applications.
UTF-8 vs. utf8mb4 in MySQL
When working with MySQL, you’ll encounter both utf8 and utf8mb4, but only one provides full Unicode support. Here’s how they differ:
utf8 (utf8mb3): MySQL’s original UTF-8 implementation. It supports 1 to 3 bytes per character, so it cannot store most emojis or any other 4-byte Unicode character.
utf8mb4: Supports 1 to 4 bytes per character, which covers the full Unicode range. This is essential for modern web apps that use emojis or supplementary Unicode characters.
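The byte counts behind this difference can be inspected directly in MySQL. The query below is a minimal sketch, assuming the client sends UTF-8 and the connection character set is utf8mb4; the sample characters and the alias names (samples, ch, chars, bytes) are chosen only for illustration.

```sql
-- Make sure string literals on this connection are treated as utf8mb4.
SET NAMES utf8mb4;

-- CHAR_LENGTH counts characters, LENGTH counts bytes.
SELECT
  ch,
  CHAR_LENGTH(ch) AS chars,
  LENGTH(ch)      AS bytes
FROM (
  SELECT 'A' AS ch          -- ASCII letter
  UNION ALL SELECT 'é'      -- accented Latin letter (precomposed form assumed)
  UNION ALL SELECT '€'      -- euro sign
  UNION ALL SELECT '😀'     -- emoji outside the Basic Multilingual Plane
) AS samples;
```

Under those assumptions each row should report one character, with byte counts of 1, 2, 3, and 4 respectively. Only the last row exceeds what a utf8 (utf8mb3) column can hold.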
The need for utf8mb4
MySQL’s original utf8 character set (now called utf8mb3) does not support 4-byte characters. If you try to insert one into a utf8 column while strict SQL mode is enabled, the statement fails with an error like this (the bytes shown are the UTF-8 encoding of 😀, U+1F600):
Incorrect string value: '\xF0\x9F\x98\x80' for column 'column_name_here' at row 1
utf8mb4, available since MySQL 5.5.3, resolves this issue: it stores 4-byte characters, and with them emojis and the rest of Unicode.
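To see both the problem and the fix in one place, the statements below sketch a reproduction on a throwaway table. The table name emoji_demo and the column note are hypothetical, and the sketch assumes strict SQL mode and a server that accepts the utf8mb3 name.

```sql
-- Hypothetical table, created with the legacy 3-byte character set.
-- On older servers the same character set may only be available as "utf8".
CREATE TABLE emoji_demo (
  note VARCHAR(100)
) CHARACTER SET utf8mb3;

-- Send and receive text on this connection as utf8mb4.
SET NAMES utf8mb4;

-- Expected to be rejected in strict SQL mode with "Incorrect string value".
INSERT INTO emoji_demo (note) VALUES ('hello 😀');

-- Convert the table default and its existing text columns to utf8mb4.
ALTER TABLE emoji_demo
  CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

-- The same insert should now succeed.
INSERT INTO emoji_demo (note) VALUES ('hello 😀');
```

Because the first INSERT is meant to fail, run the statements one at a time rather than as a batch script. On a real table, CONVERT TO rewrites the data, so it is worth trying on a copy first.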
Types of utf8mb4 collations
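utf8mb4_general_ci: A fast, general-purpose collation. It compares strings one character at a time and trades some sorting accuracy for speed.
utf8mb4_unicode_ci: Based on the Unicode Collation Algorithm, so sorting and comparison follow the Unicode standard more closely. Useful for international apps.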
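A small comparison makes the difference between the two collations concrete. The query below is a sketch that assumes a utf8mb4 connection; the German sharp s is used because the two collations are documented to treat it differently, and the column aliases are illustrative.

```sql
-- String literals must be utf8mb4 for these COLLATE clauses to be valid.
SET NAMES utf8mb4;

SELECT
  -- general_ci only compares characters one-to-one, so 'ß' cannot match 'ss'.
  'ß' = 'ss' COLLATE utf8mb4_general_ci AS general_ci_equal,
  -- unicode_ci supports expansions and treats 'ß' as equal to 'ss'.
  'ß' = 'ss' COLLATE utf8mb4_unicode_ci AS unicode_ci_equal;
```

The first column should come back as 0 and the second as 1. The same kind of difference affects ORDER BY and unique indexes, which is why the collation choice matters beyond raw speed.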
FAQ
What is UTF-8?
UTF-8 is a variable-width encoding that represents each Unicode character as a sequence of 1 to 4 bytes. It’s widely used in databases and modern web applications.
Why should I use utf8mb4 in MySQL?
utf8mb4 supports the full Unicode range (up to 4 bytes per character), unlike MySQL’s older utf8, which supports at most 3 bytes per character. This allows you to store emojis and other supplementary characters.
Are utf8 and utf8mb4 the same?
No. utf8mb4 supports characters of up to 4 bytes, while utf8 (utf8mb3) stops at 3 bytes per character.
How do I create a MySQL table with utf8mb4?
Run the following SQL to set utf8mb4 as the character set for a table:
CREATE TABLE your_table_name (
    column_name VARCHAR(100)
) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
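One way to confirm that the table ended up with the intended character set is to query information_schema. This is a sketch that reuses the placeholder name your_table_name from the statement above and assumes the table lives in the currently selected database.

```sql
-- List the character set and collation of every text column in the table.
-- Non-text columns (INT, DATE, ...) show NULL in both fields.
SELECT column_name, character_set_name, collation_name
FROM information_schema.columns
WHERE table_schema = DATABASE()
  AND table_name = 'your_table_name';

-- Alternative: print the full table definition, including its default charset.
SHOW CREATE TABLE your_table_name;
```

Text columns should list utf8mb4 and utf8mb4_unicode_ci. A column that still shows utf8 or utf8mb3 was declared with its own explicit character set and needs to be changed separately.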
Conclusion
If you need to store modern Unicode characters, including emojis, use utf8mb4 in MySQL. It removes the 3-byte limit of utf8 and avoids the "Incorrect string value" errors that come with it. For more details, check out the article MySQL’s UTF-8: Is It Real?