Question
I'm using PHPUnit to validate XML output from my PHP code, but apparently I have problems with the character encoding MySQL returns. Here is the error I get from DOMDocument:
Input is not proper UTF-8, indicate encoding!
Bytes: 0xE9 0x20 0x42 0x65
I initialize the DOMDocument so it uses the correct encoding:
$domDocument = new DOMDocument('1.0','UTF-8');
And when I check the output from saveXML() using mb_detect_encoding the result is UTF-8.
I also checked all the calls used to create the XML, using mb_detect_encoding on all createCDATASection parameters encountered and they are all either UTF-8 or ASCII (there are no plain text nodes, everything is in CDATA blocks).
I think the issue comes from the use of an 'é' character (which is 0xE9 in ISO 8859-1). The line which adds that character to my XML is:
$domDocument->createCDATASection($place->name);
and mb_detect_encoding($place->name) gives me UTF-8.
The data ($place->name) is pulled from a MySQL database. This database has the UTF-8 charset.
Here is some sample code:
$query = sprintf('SELECT name FROM place where id = 1');
$result = mysql_query($query);
$result = mysql_fetch_assoc($result);
// -- Feeding UTF-8 data directly WORKS
$domDocument = new DOMDocument('1.0','UTF-8');
$rootNode = $domDocument->createElement('Response');
$rootNode->appendChild($domDocument->createCDATASection('Café Belga'));
$domDocument->appendChild($rootNode);
$matcher = array('tag' => 'Response');
self::assertTag($matcher, $domDocument->saveXML(), '', FALSE);
// -- Feeding UTF-8 data from the resultset FAILS
$domDocument = new DOMDocument('1.0','UTF-8');
$rootNode = $domDocument->createElement('Response');
$rootNode->appendChild($domDocument->createCDATASection($result['name']));
$domDocument->appendChild($rootNode);
$matcher = array('tag' => 'Response');
self::assertTag($matcher, $domDocument->saveXML(), '', FALSE);
In my PHPStorm debugger, the string fetched from the database looks like this:
Café Belga
So I think that is the root of the problem. In MySQLWorkbench the string is correct: Café Belga.
When using utf8_encode($result['name']), however, everything works fine!
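A likely explanation for why utf8_encode() helps: it converts ISO 8859-1 bytes to UTF-8, so it only "fixes" data that was not UTF-8 to begin with. The sketch below shows the same conversion with mb_convert_encoding(), since utf8_encode() is deprecated as of PHP 8.2. The literal byte string is an assumption standing in for what the driver returned.

```php
<?php
// Assumption: the driver delivered the name as ISO 8859-1 bytes (0xE9 for "é").
$fromDb = "Caf\xE9 Belga";

// Reinterpret the bytes as ISO 8859-1 and re-encode them as UTF-8.
$utf8 = mb_convert_encoding($fromDb, 'UTF-8', 'ISO-8859-1');

// Hex dumps make the difference visible: e9 becomes the two bytes c3 a9.
var_dump(bin2hex($fromDb));
var_dump(bin2hex($utf8));
```

If the conversion changes the bytes, the input was not valid UTF-8, which points at the connection charset rather than at DOMDocument.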
One more check in the watches window:
mb_detect_encoding($result['name']) -> "UTF-8"
mb_detect_encoding(utf8_encode($result['name'])) -> "UTF-8"
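Note that mb_detect_encoding() is a heuristic: it guesses among candidate encodings and its answer depends on the detection order and the PHP version, so "UTF-8" here does not prove the bytes are valid UTF-8. A minimal sketch of a stricter check with mb_check_encoding() follows; both byte strings are assumptions standing in for the database value and its correctly encoded form.

```php
<?php
// Assumption: the bytes the driver returned, with "é" as the single byte 0xE9.
$fromDb = "Caf\xE9 Belga";
// The same name correctly encoded as UTF-8 ("é" is 0xC3 0xA9).
$utf8 = "Caf\xC3\xA9 Belga";

// mb_check_encoding() validates the byte sequence instead of guessing.
var_dump(mb_check_encoding($fromDb, 'UTF-8'));
var_dump(mb_check_encoding($utf8, 'UTF-8'));
```

The first call reports false because 0xE9 opens a multi-byte UTF-8 sequence but is followed by a space (0x20), which is exactly what the DOMDocument error complains about.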
On a side note, are there any sites where I can simply copy-paste those hex values and see what characters they are supposed to be in different character sets?
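On the side question, one way to inspect hex values without a website is to decode them locally. This is a rough sketch; decodeHex() is a hypothetical helper written for this example, not part of PHP, and the list of charsets is just a sample.

```php
<?php
// Hypothetical helper: turns "0xE9 0x20 ..." into bytes, interprets them in
// $charset, and returns the result as UTF-8 for display.
function decodeHex(string $hex, string $charset): string
{
    // Strip "0x" prefixes and whitespace so only hex digits remain.
    $digits = preg_replace('/0x|\s+/i', '', $hex);
    $bytes = hex2bin($digits);

    return mb_convert_encoding($bytes, 'UTF-8', $charset);
}

// The bytes from the DOMDocument error message.
foreach (['ISO-8859-1', 'ISO-8859-15', 'Windows-1252'] as $charset) {
    echo $charset, ': ', decodeHex('0xE9 0x20 0x42 0x65', $charset), PHP_EOL;
}
```

For these particular bytes, all three charsets decode 0xE9 to "é", matching the "é Be" in "Café Belga".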
Recommended answer
You have to define the connection to your database as UTF-8:
// Set up your connection
$connection = mysql_connect('localhost', 'user', 'pw');
mysql_select_db('yourdb', $connection);
mysql_query("SET NAMES 'utf8'", $connection);
// Now you get UTF-8 encoded stuff
$query = sprintf('SELECT name FROM place where id = 1');
$result = mysql_query($query, $connection);
$result = mysql_fetch_assoc($result);
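The mysql_* functions used above were removed in PHP 7.0. On current PHP versions the same idea, pinning the connection charset, could look roughly like the PDO sketch below. buildDsn() and fetchPlaceName() are hypothetical helpers, the host, database, and credentials are the placeholders from the answer, and utf8mb4 is chosen here as an assumption because MySQL's utf8 is limited to three-byte characters.

```php
<?php
// Hypothetical helper: builds a PDO MySQL DSN that pins the connection charset.
function buildDsn(string $host, string $db, string $charset = 'utf8mb4'): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $db, $charset);
}

// Hypothetical helper: fetches one place name over the given connection.
function fetchPlaceName(PDO $pdo, int $id): ?string
{
    $stmt = $pdo->prepare('SELECT name FROM place WHERE id = ?');
    $stmt->execute([$id]);
    $name = $stmt->fetchColumn();

    // fetchColumn() returns false when no row matches.
    return is_string($name) ? $name : null;
}

// Usage (needs a reachable MySQL server, so it is left commented out):
// $pdo = new PDO(buildDsn('localhost', 'yourdb'), 'user', 'pw');
// echo fetchPlaceName($pdo, 1);
```

Setting the charset in the DSN plays the role of SET NAMES in the answer: it tells the server which encoding to use when sending results to the client.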