Question
What is the difference between the utf8mb4 and utf8 charsets in MySQL?
I already know about the ASCII, UTF-8, UTF-16 and UTF-32 encodings, but I'm curious what distinguishes the utf8mb4 group of encodings from the other encoding types defined in MySQL Server.
Are there any special benefits or purposes in using utf8mb4 rather than utf8?
Recommended answer
UTF-8 is a variable-length encoding: storing one code point requires one to four bytes. However, MySQL's encoding called "utf8" (an alias of "utf8mb3") stores a maximum of three bytes per code point.
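To make the one-to-four-byte range concrete, here is a minimal Python sketch that counts the UTF-8 bytes needed for a few sample characters. The helper name `utf8_len` and the sample characters are illustrative choices, not part of the original answer, and the sketch says nothing about MySQL's internals; it only shows standard UTF-8 lengths.

```python
def utf8_len(ch: str) -> int:
    """Return the number of bytes needed to encode one code point as UTF-8."""
    return len(ch.encode("utf-8"))


# Sample characters chosen for illustration: ASCII, Latin-1 range,
# a BMP symbol, and an emoji from a supplementary plane.
for ch in ["A", "é", "€", "😀"]:
    print(f"U+{ord(ch):04X} needs {utf8_len(ch)} byte(s)")
```

Only the last character needs four bytes, which is the case a three-byte-per-code-point limit cannot accommodate.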
So the character set "utf8"/"utf8mb3" cannot store all Unicode code points: it only supports the range U+0000 to U+FFFF, which is called the "Basic Multilingual Plane" (BMP). See also Comparison of Unicode encodings.
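As a rough illustration of the BMP limit, the following Python sketch checks whether every code point in a string lies at or below U+FFFF, which is the condition for the text to fit in a three-byte-per-code-point encoding. The names `BMP_MAX`, `fits_in_bmp` and `non_bmp_chars` are hypothetical helpers introduced here. The check runs on the client side only and does not query MySQL, and it ignores edge cases such as lone surrogates.

```python
BMP_MAX = 0xFFFF  # highest code point in the Basic Multilingual Plane


def fits_in_bmp(text: str) -> bool:
    """Return True if every code point in text lies within the BMP."""
    return all(ord(ch) <= BMP_MAX for ch in text)


def non_bmp_chars(text: str) -> list[str]:
    """Return the characters in text that lie outside the BMP, in order."""
    return [ch for ch in text if ord(ch) > BMP_MAX]


print(fits_in_bmp("naïve café"))   # only BMP characters
print(fits_in_bmp("thumbs up 👍"))  # contains U+1F44D, outside the BMP
print(non_bmp_chars("thumbs up 👍"))
```

A helper like this could be used to scan application data before deciding whether a column needs a four-byte-capable character set.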
This is what (a previous version of the same page of) the MySQL documentation has to say about it:
The character set named utf8[/utf8mb3] uses a maximum of three bytes per character and contains only BMP characters. As of MySQL 5.5.3, the utf8mb4 character set uses a maximum of four bytes per character [and] supports supplemental characters:
For a BMP character, utf8[/utf8mb3] and utf8mb4 have identical storage characteristics: same code values, same encoding, same length.
For a supplementary character, utf8[/utf8mb3] cannot store the character at all, while utf8mb4 requires four bytes to store it. Since utf8[/utf8mb3] cannot store the character at all, you do not have any supplementary characters in utf8[/utf8mb3] columns and you need not worry about converting characters or losing data when upgrading utf8[/utf8mb3] data from older versions of MySQL.
So if you want your column to support storing characters that lie outside the BMP, such as emoji (and you usually do), use "utf8mb4". See also What are the most common non-BMP Unicode characters in actual use?.