Problem description
I am able to use UTF-8 characters just fine in my scripts.
In fact, variable and function names can contain Unicode characters.
There is also the mbstring extension, which deals with multi-byte strings, yet countless articles criticize PHP for its lack of Unicode support.
I don't get it: why is PHP said not to support Unicode?
Recommended answer
When PHP was started several years ago, UTF-8 was not really supported. We are talking about a time when non-Unicode operating systems like Windows 98/Me were still current and when other big languages like Delphi were also non-Unicode. Not all languages were designed with Unicode in mind from day one, and completely changing a language to Unicode without breaking a lot of things is hard. Delphi, for example, only became Unicode compatible a year or two ago, while other languages like Java or C# were designed around Unicode from day one.
So when PHP grew and became PHP 3, PHP 4 and now PHP 5, simply no one decided to add Unicode. Why? Presumably to stay compatible with existing scripts, or because utf8_decode/utf8_encode and mbstring already existed and worked. I do not know for sure, but I strongly believe it has something to do with organic growth. Features do not simply exist by default; they have to be written by someone, and that has not happened for PHP yet.
OK, I misread the question. The real question is: how are strings stored internally? If I type "Währung" or "écriture", which encoding is used to create the bytes? In PHP's case, it is ASCII with a code page. That means: if I encode the string using ISO-8859-15 and you decode it with some Chinese code page, you will get weird results. The alternative is found in languages like C# or Java, where everything is stored as Unicode, which means there is no code page anymore and, in theory, you cannot mess up. I recommend Joel's article about Unicode and character sets, but essentially it boils down to how strings are stored internally. The answer for PHP is "not in Unicode", which means you have to be very careful and explicit when processing strings so that they stay in the proper encoding during input, storage (database) and output, which is very error-prone.
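To make the code page point concrete, here is a minimal sketch. It is written in Python rather than PHP, purely because Python keeps text and raw bytes as separate types, which makes the mismatch easy to see. The helper name `reinterpret` is made up for this illustration, and GBK is just one arbitrary choice standing in for "some Chinese code page".

```python
# Illustration in Python (not PHP): text vs. bytes, and what a wrong code page does.

def reinterpret(text: str, written_as: str, read_as: str) -> str:
    """Hypothetical helper: encode text with one encoding, decode with another."""
    raw = text.encode(written_as)
    # errors="replace" substitutes U+FFFD for bytes that are invalid in read_as
    return raw.decode(read_as, errors="replace")

word = "Währung"

# The same seven characters occupy a different number of bytes per encoding.
chars = len(word)                             # 7 characters
latin9_len = len(word.encode("iso-8859-15"))  # 7 bytes
utf8_len = len(word.encode("utf-8"))          # 8 bytes ("ä" needs two)

# Matching encodings round-trip cleanly.
same = reinterpret(word, "iso-8859-15", "iso-8859-15")

# Mismatched encodings silently produce different text, without any error.
as_gbk = reinterpret(word, "iso-8859-15", "gbk")
as_latin9 = reinterpret(word, "utf-8", "iso-8859-15")

print(chars, latin9_len, utf8_len)                      # 7 7 8
print(same == word, as_gbk == word, as_latin9 == word)  # True False False
```

A language whose strings are plain bytes has no way to notice the mismatch on its own, which is why the encoding has to be tracked by hand at every input, storage and output step.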