Problem description
I have some json I need to decode, alter and then encode without messing up any characters.
If I have a unicode character in a JSON string it will not decode. I'm not sure why, since json.org says a string can contain any-Unicode-character-except-"-or-\-or-control-character. But it doesn't work in Python either.
{"Tag":"Odómetro"}
I can use utf8_encode which will allow the string to be decoded with json_decode, however the character gets mangled into something else. This is the result from a print_r of the result array. Two characters.
[Tag] => OdÃ³metro
When I encode the array again I get the character escaped to ASCII, which is correct according to the JSON spec:
"Tag"=>"Od\u00f3metro"
Is there some way I can un-escape this? json_encode gives no such option, utf8_encode does not seem to work either.
Edit: I see there is an unescaped_unicode option for json_encode. However, it's not working as expected. Oh damn, it's only on PHP 5.4. I will have to use some regex as I only have 5.3.
$json = json_encode($array, JSON_UNESCAPED_UNICODE);
Warning: json_encode() expects parameter 2 to be long, string ...
Recommended answer
Judging from everything you've said, it seems like the original Odómetro string you're dealing with is encoded with ISO 8859-1, not UTF-8.
Here's why I think so:
- json_encode produced parseable output after you ran the input string through utf8_encode, which converts from ISO 8859-1 to UTF-8.
- You did say that you got "mangled" output when using print_r after doing utf8_encode, but the mangled output you got is actually exactly what would happen by trying to parse UTF-8 text as ISO 8859-1 (ó is \xc3\xb3 in UTF-8, but that sequence is Ã³ in ISO 8859-1).
- Your htmlentities hackaround solution worked. htmlentities needs to know the encoding of the input string to work correctly. If you don't specify one, it assumes ISO 8859-1. (html_entity_decode, confusingly, defaults to UTF-8, so your method had the effect of converting from ISO 8859-1 to UTF-8.)
- You said you had the same problem in Python, which would seem to exclude PHP from being the issue.
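The diagnosis in the list above can be checked with a few lines of PHP. This is a minimal sketch, not the asker's actual code: it assumes the input really is ISO 8859-1, builds the bytes by hand, and uses iconv for the conversion (utf8_encode performs the same ISO 8859-1 to UTF-8 conversion but is deprecated in recent PHP versions).

```php
<?php
// The same JSON document with ó stored as the single ISO 8859-1 byte 0xF3.
$latin1Json = "{\"Tag\":\"Od\xF3metro\"}";

// json_decode() requires UTF-8, so the lone 0xF3 byte makes decoding fail.
var_dump(json_decode($latin1Json, true)); // NULL

// Convert ISO 8859-1 to UTF-8 first, then decode.
$utf8Json = iconv('ISO-8859-1', 'UTF-8', $latin1Json);
$data = json_decode($utf8Json, true);

// In UTF-8, ó is the two bytes C3 B3.
echo bin2hex($data['Tag']), "\n"; // 4f64c3b36d6574726f

// Reading those UTF-8 bytes as if they were ISO 8859-1 yields the
// two-character "mangled" form; re-encoding makes that visible.
$mangled = iconv('ISO-8859-1', 'UTF-8', $data['Tag']);
echo $mangled, "\n"; // OdÃ³metro
```

If the decoded value prints as OdÃ³metro, the data is most likely fine and the viewer is interpreting UTF-8 output as ISO 8859-1.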
PHP will use the \uXXXX escaping, but as you noted, this is valid JSON.
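As a small illustration of why the escaping is harmless, the sketch below encodes a UTF-8 string, decodes it again, and compares the result with the original. The array contents are made up for the example; JSON_UNESCAPED_UNICODE is only available from PHP 5.4 on, as noted in the question.

```php
<?php
// UTF-8 bytes for "Odómetro", written as escapes so the file encoding does not matter.
$data = array('Tag' => "Od\xC3\xB3metro");

// Default behaviour: non-ASCII characters are written as \uXXXX escapes.
$escaped = json_encode($data);
echo $escaped, "\n"; // {"Tag":"Od\u00f3metro"}

// Any conforming JSON parser turns the escape back into the character.
$roundTrip = json_decode($escaped, true);
var_dump($roundTrip['Tag'] === $data['Tag']); // bool(true)

// PHP 5.4+ only: keep the character literal instead of escaping it.
$literal = json_encode($data, JSON_UNESCAPED_UNICODE);
echo $literal, "\n"; // {"Tag":"Odómetro"}
```

Both outputs describe the same JSON value, so un-escaping is only needed if something downstream cannot handle \uXXXX sequences.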
So, it seems like you need to configure your connection to Postgres so that it will give you UTF-8 strings. The PHP manual indicates you'd do this by appending options='--client_encoding=UTF8' to the connection string. There's also the possibility that the data currently stored in the database is in the wrong encoding. (You could simply use utf8_encode, but this will only support characters that are part of ISO 8859-1.)
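A connection set up this way might look like the sketch below. The host, database, user, table and column names are all hypothetical placeholders, and it needs the pgsql extension plus a reachable server, so treat it as an outline rather than something to paste in unchanged.

```php
<?php
// Hypothetical connection details; replace them with your own.
$conn = pg_connect("host=localhost dbname=mydb user=me options='--client_encoding=UTF8'");
if ($conn === false) {
    die("Could not connect\n");
}

// Confirm which encoding the client side of the connection is using.
echo pg_client_encoding($conn), "\n";

// Hypothetical table and column; the strings should now arrive as UTF-8,
// so json_encode() can handle them without utf8_encode().
$result = pg_query($conn, 'SELECT tag FROM items');
$rows = pg_fetch_all($result);
echo json_encode($rows), "\n";
```

pg_set_client_encoding($conn, 'UTF8') is an alternative if the connection string cannot be changed.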
Finally, as another answer noted, you do need to make sure that you're declaring the proper charset, with an HTTP header or otherwise (of course, this particular issue might have just been an artifact of the environment where you did your print_r testing).
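For a quick print_r test served over HTTP, declaring the charset could be as simple as the following sketch. It assumes the script emits UTF-8 and that nothing has been output before the header call.

```php
<?php
// Tell the client that the bytes that follow are UTF-8, before sending any output.
header('Content-Type: text/plain; charset=utf-8');

// UTF-8 JSON input, written with byte escapes so the file encoding does not matter.
print_r(json_decode("{\"Tag\":\"Od\xC3\xB3metro\"}", true));
```

Without the header, a browser may fall back to another encoding and show OdÃ³metro even though the data is correct.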