Problem description
I'm running a system on IIS7. The page META tag declares the encoding as UTF-8, and according to the Chrome menu the actual encoding is the same.
When I upload a file with a "long hyphen" (an en dash, "–") in its name, it gets converted to junk characters ("â€“").
The junk characters are saved in MySQL, and the file on the server also has the junk characters in its name. However, when I pull the file name from the database and display it with PHP, it displays with the correct hyphen.
Is there any way to have the file name stored as UTF-8? When I try this code I get an error:
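The junk sequence is what appears when the three UTF-8 bytes of the en dash are read as three separate Windows-1252 characters. Here is a minimal sketch of that effect, assuming the mbstring extension is available and that the code page involved is Windows-1252 (the question does not state it). The variable names are illustrative only.

```php
<?php
// UTF-8 encoding of U+2013 EN DASH: three bytes.
$enDash = "\xE2\x80\x93";

// Assumption: something in the chain treats those bytes as Windows-1252.
// Each byte then becomes its own character: 0xE2 = â, 0x80 = €, 0x93 = “.
$mojibake = mb_convert_encoding($enDash, 'UTF-8', 'Windows-1252');

// Reversing the same step recovers the original three bytes.
$restored = mb_convert_encoding($mojibake, 'Windows-1252', 'UTF-8');

echo bin2hex($mojibake), "\n";
echo $restored === $enDash ? "round trip ok" : "round trip failed", "\n";
```

Because the mangling is reversible, a name can look broken in one tool and still display correctly in another that interprets the bytes differently.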
$fn = iconv("CP-1252", "UTF-8", $file['name']);
debug($fn);
Notice (8): iconv(): Wrong charset, conversion from `CP-1252' to `UTF-8' is not allowed
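The notice is about the charset name rather than the data. As far as I know, iconv implementations spell this code page CP1252 or WINDOWS-1252, without a hyphen after "CP". Here is a sketch of the call with a recognized name, assuming the iconv extension is loaded. toUtf8FromCp1252 is a hypothetical helper, and this only avoids the notice; it does not by itself fix the upload problem.

```php
<?php
// Hypothetical helper: convert Windows-1252 bytes to UTF-8 using a
// charset name that iconv recognizes ("CP1252", not "CP-1252").
function toUtf8FromCp1252(string $bytes): string
{
    $converted = iconv('CP1252', 'UTF-8', $bytes);
    if ($converted === false) {
        throw new RuntimeException('iconv could not convert the input');
    }
    return $converted;
}

// 0x96 is the en dash in Windows-1252; in UTF-8 it becomes E2 80 93.
echo bin2hex(toUtf8FromCp1252("a\x96b")), "\n"; // prints 61e2809362
```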
--
Update several months later! This problem is related to a PHP bug on Windows: http://bugs.php.net/bug.php?id=47096
Unicode characters get mangled by PHP in move_uploaded_file. I have also seen the issue with rename and ZipArchive, so I think it's a general issue with PHP on Windows.
I have adapted a workaround from WordPress. I have to store the file with the mangled file name and then sanitize it on download/email/display.
Here are the adapted methods I'm using, in case they are of use to someone in future. This still isn't much use if you're trying to zip files before downloading/emailing, or if you need to write the files to a network share.
// Both methods are meant to live inside a class (hence self::).
// $utf8 = true converts a code-page name to UTF-8; false converts UTF-8 to the code page.
public static function sanitizeFilename($filename, $utf8 = true)
{
    // Nothing to do if the name is already in the requested encoding.
    if ( self::seems_utf8($filename) == $utf8 )
        return $filename;

    // On Windows platforms, PHP will mangle non-ASCII characters, see http://bugs.php.net/bug.php?id=47096
    if ( 'WIN' == substr( PHP_OS, 0, 3 ) ) {
        // Passing '0' only queries the current locale without changing it.
        $locale = setlocale( LC_CTYPE, '0' );
        if ( $locale == 'C' ) { // Locale has not been set and the default is being used, according to answer by Colin Morelli at http://stackoverflow.com/questions/13788415/how-to-retrieve-the-current-windows-codepage-in-php
            // thus, we force the locale to be explicitly set to the default system locale
            $locale = setlocale( LC_CTYPE, '' );
        }
        // "English_United States.1252" becomes "Windows-1252".
        $codepage = 'Windows-' . trim( (string) strstr( (string) $locale, '.' ), '.' );
        if ( 'Windows-' == $codepage )
            return $filename; // No code page in the locale string, so leave the name alone.

        $charset = 'UTF-8';
        if ( function_exists( 'iconv' ) ) {
            if ( false == $utf8 ) {
                // //IGNORE drops characters the code page cannot represent.
                $filename = iconv( $charset, $codepage . '//IGNORE', $filename );
            } else {
                $filename = iconv( $codepage, $charset, $filename );
            }
        } elseif ( function_exists( 'mb_convert_encoding' ) ) {
            if ( false == $utf8 )
                $filename = mb_convert_encoding( $filename, $codepage, $charset );
            else
                $filename = mb_convert_encoding( $filename, $charset, $codepage );
        }
    }
    return $filename;
}

// Heuristic check: does every byte sequence follow the UTF-8 lead/continuation pattern?
public static function seems_utf8($str) {
    $length = strlen($str);
    for ($i=0; $i < $length; $i++) {
        $c = ord($str[$i]);
        if ($c < 0x80) $n = 0; # 0bbbbbbb
        elseif (($c & 0xE0) == 0xC0) $n=1; # 110bbbbb
        elseif (($c & 0xF0) == 0xE0) $n=2; # 1110bbbb
        elseif (($c & 0xF8) == 0xF0) $n=3; # 11110bbb
        elseif (($c & 0xFC) == 0xF8) $n=4; # 111110bb
        elseif (($c & 0xFE) == 0xFC) $n=5; # 1111110b
        else return false; # Does not match any model
        for ($j=0; $j<$n; $j++) { # n bytes matching 10bbbbbb follow ?
            if ((++$i == $length) || ((ord($str[$i]) & 0xC0) != 0x80))
                return false;
        }
    }
    return true;
}
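To make the store-then-sanitize flow concrete, here is a minimal sketch of both directions, assuming the iconv extension is loaded. The code page is passed in explicitly so the sketch runs on any OS. codepageFromLocale, toFilesystemName and toDisplayName are hypothetical helpers for illustration, not part of the methods above.

```php
<?php
// Extract "Windows-1252" from a Windows locale string such as
// "English_United States.1252"; returns null when no code page is present.
function codepageFromLocale(string $locale): ?string
{
    return preg_match('/\.(\d+)$/', $locale, $m) === 1 ? 'Windows-' . $m[1] : null;
}

// UTF-8 name -> code-page bytes, for use when writing to the file system.
function toFilesystemName(string $utf8Name, string $codepage): string
{
    $converted = iconv('UTF-8', $codepage . '//IGNORE', $utf8Name);
    return $converted === false ? $utf8Name : $converted;
}

// Code-page bytes -> UTF-8 name, for download, email or display.
function toDisplayName(string $storedName, string $codepage): string
{
    $converted = iconv($codepage, 'UTF-8', $storedName);
    return $converted === false ? $storedName : $converted;
}

// Assumed example locale string; fall back to Windows-1252 if parsing fails.
$codepage = codepageFromLocale('English_United States.1252') ?? 'Windows-1252';
$original = "report \xE2\x80\x93 final.txt"; // en dash in UTF-8
$onDisk   = toFilesystemName($original, $codepage);
$shown    = toDisplayName($onDisk, $codepage);

echo bin2hex($onDisk), "\n";
echo $shown === $original ? "round trip ok" : "round trip failed", "\n";
```

The round trip only holds for characters the code page can represent; anything else is dropped by //IGNORE on the way in and cannot be recovered.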
Recommended answer
UPDATE
Indeed, this is a PHP bug on Windows. There are workarounds like the one below, but the best solution I have seen is to use the WFIO extension. This extension provides a new protocol, wfio://, for file streams and allows PHP to properly handle UTF-8 characters on the Windows file system. wfio:// supports a number of PHP functions, including fopen, scandir, mkdir, copy and rename.
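As a rough illustration of how the protocol might be wired in, the sketch below prefixes a path with wfio:// only when that stream wrapper is actually registered. withWfio is a hypothetical helper, and the exact path form the extension accepts is an assumption that should be checked against its documentation.

```php
<?php
// Prefix a path with wfio:// only when that stream wrapper is registered,
// i.e. when the WFIO extension is loaded; otherwise return the path unchanged.
function withWfio(string $path): string
{
    return in_array('wfio', stream_get_wrappers(), true) ? 'wfio://' . $path : $path;
}

// Example target containing an en dash (UTF-8 bytes E2 80 93).
$target = withWfio("uploads/report \xE2\x80\x93 final.txt");
echo $target, "\n";
```

The resulting path can then be passed to the functions the answer lists, such as fopen, copy or rename. Without the extension the code falls back to the plain path, so the original mangling still applies.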
Original solution
The original workaround is the one given in the update to the question above: store the file under the mangled name, then convert the name with sanitizeFilename() on download, email or display.