                  Replacing invalid UTF-8 characters with question marks: mbstring.substitute_character seems to be ignored

                  Question

                  I would like to replace invalid UTF-8 characters with question marks (PHP 5.3.5).

                  So far I have this solution, but invalid characters are removed instead of being replaced by '?'.

                  function replace_invalid_utf8($str)
                  {
                    return mb_convert_encoding($str, 'UTF-8', 'UTF-8');
                  }
                  
                  echo mb_substitute_character()."\n";
                  
                  echo replace_invalid_utf8('éééaaaàààee??')."\n";
                  echo replace_invalid_utf8('eeeaaaaaaee??')."\n";
                  

                  It should output:

                  63 // ASCII code for '?' character
                  ???aaa???eé // or ??aa??eé
                  eeeaaaaaaeeé
                  

                  But it currently outputs:

                  63
                  aaaee // removed invalid characters
                  eeeaaaaaaeeé
                  

                  Any suggestions?

                  Would you do it another way (using preg_replace(), for example)?

                  Thanks.

                  Recommended answer

                  You can use mb_convert_encoding(), or htmlspecialchars() with the ENT_SUBSTITUTE option (available since PHP 5.4). Of course, you can use preg_match() too. If you use the intl extension, you can use UConverter (available since PHP 5.5).

                  The recommended substitute character for an invalid byte sequence is U+FFFD. See "3.1.2 Substituting for Ill-Formed Subsequences" in UTR #36: Unicode Security Considerations for details.
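
To make the byte form of U+FFFD concrete, here is a minimal sketch that encodes a code point in the three-byte range by hand. The helper name `encode_three_byte_code_point` is hypothetical and deliberately covers only U+0800 to U+FFFF; it is an illustration of the bit layout, not a general encoder.

```php
<?php
// Hypothetical helper (not part of the original answer): UTF-8 encode a code
// point in U+0800..U+FFFF using the three-byte form 1110xxxx 10xxxxxx 10xxxxxx.
// Surrogate code points (U+D800..U+DFFF) are rejected.
function encode_three_byte_code_point(int $cp): string
{
    if ($cp < 0x0800 || $cp > 0xFFFF || ($cp >= 0xD800 && $cp <= 0xDFFF)) {
        throw new InvalidArgumentException('code point outside the three-byte range');
    }

    return chr(0xE0 | ($cp >> 12))            // top 4 bits
         . chr(0x80 | (($cp >> 6) & 0x3F))    // middle 6 bits
         . chr(0x80 | ($cp & 0x3F));          // low 6 bits
}

$replacement = encode_three_byte_code_point(0xFFFD);
echo strtoupper(bin2hex($replacement)), PHP_EOL; // EFBFBD
```

This is why the byte-level examples in this answer spell the replacement character as the three bytes EF BF BD.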

                  When using mb_convert_encoding(), you can specify the substitute character by passing a Unicode code point to mb_substitute_character() or by setting the mbstring.substitute_character directive. The default substitute character is ? (QUESTION MARK, U+003F).

                  // REPLACEMENT CHARACTER (U+FFFD)
                  mb_substitute_character(0xFFFD);
                  
                  function replace_invalid_byte_sequence($str)
                  {
                      return mb_convert_encoding($str, 'UTF-8', 'UTF-8');
                  }
                  
                  function replace_invalid_byte_sequence2($str)
                  {
                      return htmlspecialchars_decode(htmlspecialchars($str, ENT_SUBSTITUTE, 'UTF-8'));
                  }
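
Because mb_substitute_character() changes a setting shared by all later mbstring calls, it may be worth restoring the previous value after the conversion. The following is a minimal sketch under that assumption; the function name `replace_invalid_with` is hypothetical, and the exact number of substitute characters emitted for a given malformed sequence can differ between PHP versions.

```php
<?php
// Hypothetical wrapper (not from the original answer): temporarily set the
// mbstring substitute character, convert, then restore the previous setting.
function replace_invalid_with(string $str, int $codePoint = 0xFFFD): string
{
    $previous = mb_substitute_character();   // current setting (int or mode string)
    mb_substitute_character($codePoint);

    try {
        return mb_convert_encoding($str, 'UTF-8', 'UTF-8');
    } finally {
        mb_substitute_character($previous);  // always restore the shared setting
    }
}

// "\x80" is a lone trail byte, which is never valid on its own.
echo bin2hex(replace_invalid_with("a\x80b")), PHP_EOL;        // U+FFFD as substitute
echo bin2hex(replace_invalid_with("a\x80b", 0x3F)), PHP_EOL;  // '?' as substitute
```

Passing 0x3F reproduces the question-mark behaviour asked about in the question, while the default follows the U+FFFD recommendation.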
                  

                  UConverter offers both a procedural and an object-oriented API.

                  function replace_invalid_byte_sequence3($str)
                  {
                      return UConverter::transcode($str, 'UTF-8', 'UTF-8');
                  }
                  
                  function replace_invalid_byte_sequence4($str)
                  {
                      return (new UConverter('UTF-8', 'UTF-8'))->convert($str);
                  }
                  

                  When using preg_match(), you need to pay attention to the byte ranges to avoid the UTF-8 non-shortest-form vulnerability. The range of trail bytes changes depending on the range of lead bytes.

                  lead byte: 0x00 - 0x7F, 0xC2 - 0xF4
                  trail byte: 0x80(or 0x90 or 0xA0) - 0xBF(or 0x8F)
                  

                  You can refer to the following resources to check the byte ranges.

                  1. "Syntax of UTF-8 Byte Sequences" in RFC 3629
                  2. "Table 3-7. Well-Formed UTF-8 Byte Sequences" in the Unicode Standard 6.1
                  3. "Multilingual form encoding" in W3C Internationalization

                  The byte range table is as follows.

                        Code Points    First Byte Second Byte Third Byte Fourth Byte
                    U+0000 -   U+007F   00 - 7F
                    U+0080 -   U+07FF   C2 - DF    80 - BF
                    U+0800 -   U+0FFF   E0         A0 - BF     80 - BF
                    U+1000 -   U+CFFF   E1 - EC    80 - BF     80 - BF
                    U+D000 -   U+D7FF   ED         80 - 9F     80 - BF
                    U+E000 -   U+FFFF   EE - EF    80 - BF     80 - BF
                   U+10000 -  U+3FFFF   F0         90 - BF     80 - BF    80 - BF
                   U+40000 -  U+FFFFF   F1 - F3    80 - BF     80 - BF    80 - BF
                  U+100000 - U+10FFFF   F4         80 - 8F     80 - BF    80 - BF
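
The table above can be turned into a validity check almost row by row. The sketch below is one way to do that with preg_match(); the function name `is_well_formed_utf8` is hypothetical, and the pattern is run in byte mode (no `u` modifier) so that each `\xNN` refers to a single byte.

```php
<?php
// Hypothetical helper (not from the original answer): true when every byte of
// $str belongs to a well-formed UTF-8 sequence, one alternative per table row.
function is_well_formed_utf8(string $str): bool
{
    $regex = '/\A(?:
          [\x00-\x7F]                        #   U+0000 -   U+007F
        | [\xC2-\xDF][\x80-\xBF]             #   U+0080 -   U+07FF
        | \xE0[\xA0-\xBF][\x80-\xBF]         #   U+0800 -   U+0FFF
        | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  #   U+1000 -   U+CFFF, U+E000 - U+FFFF
        | \xED[\x80-\x9F][\x80-\xBF]         #   U+D000 -   U+D7FF
        | \xF0[\x90-\xBF][\x80-\xBF]{2}      #  U+10000 -  U+3FFFF
        | [\xF1-\xF3][\x80-\xBF]{3}          #  U+40000 -  U+FFFFF
        | \xF4[\x80-\x8F][\x80-\xBF]{2}      # U+100000 - U+10FFFF
    )*+\z/x';

    return preg_match($regex, $str) === 1;
}

var_dump(is_well_formed_utf8("\xE2\x82\xAC")); // bool(true):  U+20AC
var_dump(is_well_formed_utf8("\xC0\x80"));     // bool(false): non-shortest form of U+0000
var_dump(is_well_formed_utf8("\xED\xA0\x80")); // bool(false): surrogate U+D800
```

The restricted second-byte ranges after E0, ED, F0, and F4 are what reject non-shortest forms, surrogates, and code points above U+10FFFF.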
                  

                  How to replace an invalid byte sequence without breaking valid characters is described in "3.1.1 Ill-Formed Subsequences" in UTR #36: Unicode Security Considerations and in "Table 3-8. Use of U+FFFD in UTF-8 Conversion" in the Unicode Standard.

                  The Unicode Standard shows an example:

                  before: <61    F1 80 80  E1 80  C2    62    80    63    80    BF    64  >
                  after:  <0061  FFFD      FFFD   FFFD  0062  FFFD  0063  FFFD  FFFD  0064>
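
To compare the output of a replacement function against the "after" row, it can help to print a string as a list of code points. The following sketch assumes PHP 7.4 or later with mbstring (for mb_str_split() and mb_ord()); the helper name `code_points_hex` is hypothetical, and it expects input that is already well-formed UTF-8.

```php
<?php
// Hypothetical helper (not from the original answer): render a well-formed
// UTF-8 string as space-separated, zero-padded hexadecimal code points.
function code_points_hex(string $utf8): string
{
    $out = [];
    foreach (mb_str_split($utf8, 1, 'UTF-8') as $char) {
        $out[] = sprintf('%04X', mb_ord($char, 'UTF-8'));
    }

    return implode(' ', $out);
}

// The expected result from the "after" row, written out explicitly.
$after = "a\u{FFFD}\u{FFFD}\u{FFFD}b\u{FFFD}c\u{FFFD}\u{FFFD}d";
echo code_points_hex($after), PHP_EOL;
// 0061 FFFD FFFD FFFD 0062 FFFD 0063 FFFD FFFD 0064
```

Feeding the result of any of the replacement functions in this answer to the helper gives a line that can be compared directly with the table.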
                  

                  Here is an implementation using preg_replace_callback() that follows the above rule.

                  function replace_invalid_byte_sequence5($str)
                  {
                      // REPLACEMENT CHARACTER (U+FFFD)
                      $substitute = "\xEF\xBF\xBD";
                      $regex = '/
                        ([\x00-\x7F]                           #   U+0000 -   U+007F
                        |[\xC2-\xDF][\x80-\xBF]                #   U+0080 -   U+07FF
                        | \xE0[\xA0-\xBF][\x80-\xBF]           #   U+0800 -   U+0FFF
                        |[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}     #   U+1000 -   U+CFFF
                        | \xED[\x80-\x9F][\x80-\xBF]           #   U+D000 -   U+D7FF
                        | \xF0[\x90-\xBF][\x80-\xBF]{2}        #  U+10000 -  U+3FFFF
                        |[\xF1-\xF3][\x80-\xBF]{3}             #  U+40000 -  U+FFFFF
                        | \xF4[\x80-\x8F][\x80-\xBF]{2})       # U+100000 - U+10FFFF
                        |(\xE0[\xA0-\xBF]                      #   U+0800 -   U+0FFF (invalid)
                        |[\xE1-\xEC\xEE\xEF][\x80-\xBF]        #   U+1000 -   U+CFFF (invalid)
                        | \xED[\x80-\x9F]                      #   U+D000 -   U+D7FF (invalid)
                        | \xF0[\x90-\xBF][\x80-\xBF]?          #  U+10000 -  U+3FFFF (invalid)
                        |[\xF1-\xF3][\x80-\xBF]{1,2}           #  U+40000 -  U+FFFFF (invalid)
                        | \xF4[\x80-\x8F][\x80-\xBF]?)         # U+100000 - U+10FFFF (invalid)
                        |(.)                                   # invalid 1-byte
                      /xs';
                  
                      // $matches[1]: valid character
                      // $matches[2]: invalid 3-byte or 4-byte character
                      // $matches[3]: invalid 1-byte
                  
                      $ret = preg_replace_callback($regex, function($matches) use($substitute) {
                  
                          if (isset($matches[2]) || isset($matches[3])) {
                  
                              return $substitute;
                  
                          }
                      
                          return $matches[1];
                  
                      }, $str);
                  
                      return $ret;
                  }
                  

                  Alternatively, you can compare bytes directly, which avoids preg_match's restriction on byte size.

                  function replace_invalid_byte_sequence6($str) {
                  
                      $size = strlen($str);
                      // REPLACEMENT CHARACTER (U+FFFD)
                      $substitute = "\xEF\xBF\xBD";
                      $ret = '';
                  
                      // Filled in by reference on each call to utf8_get_next_char()
                      $pos = 0;
                      $char = '';
                      $char_size = 0;
                      $valid = false;
                  
                      while (utf8_get_next_char($str, $size, $pos, $char, $char_size, $valid)) {
                          $ret .= $valid ? $char : $substitute;
                      }
                  
                      return $ret;
                  }
                  
                  function utf8_get_next_char($str, $str_size, &$pos, &$char, &$char_size, &$valid)
                  {
                      $valid = false;
                  
                      if ($str_size <= $pos) {
                          return false;
                      }
                  
                      if ($str[$pos] < "\x80") {
                          // ASCII
                          $valid = true;
                          $char_size = 1;
                  
                      } else if ($str[$pos] < "\xC2") {
                          // Lone trail byte, or the overlong lead bytes C0 and C1
                          $char_size = 1;
                  
                      } else if ($str[$pos] < "\xE0") {
                          // 2-byte sequence
                          if (!isset($str[$pos+1]) || $str[$pos+1] < "\x80" || "\xBF" < $str[$pos+1]) {
                              $char_size = 1;
                          } else {
                              $valid = true;
                              $char_size = 2;
                          }
                  
                      } else if ($str[$pos] < "\xF0") {
                          // 3-byte sequence; E0 and ED restrict the second byte
                          $left = "\xE0" === $str[$pos] ? "\xA0" : "\x80";
                          $right = "\xED" === $str[$pos] ? "\x9F" : "\xBF";
                  
                          if (!isset($str[$pos+1]) || $str[$pos+1] < $left || $right < $str[$pos+1]) {
                              $char_size = 1;
                          } else if (!isset($str[$pos+2]) || $str[$pos+2] < "\x80" || "\xBF" < $str[$pos+2]) {
                              $char_size = 2;
                          } else {
                              $valid = true;
                              $char_size = 3;
                          }
                  
                      } else if ($str[$pos] < "\xF5") {
                          // 4-byte sequence; F0 and F4 restrict the second byte
                          $left = "\xF0" === $str[$pos] ? "\x90" : "\x80";
                          $right = "\xF4" === $str[$pos] ? "\x8F" : "\xBF";
                  
                          if (!isset($str[$pos+1]) || $str[$pos+1] < $left || $right < $str[$pos+1]) {
                              $char_size = 1;
                          } else if (!isset($str[$pos+2]) || $str[$pos+2] < "\x80" || "\xBF" < $str[$pos+2]) {
                              $char_size = 2;
                          } else if (!isset($str[$pos+3]) || $str[$pos+3] < "\x80" || "\xBF" < $str[$pos+3]) {
                              $char_size = 3;
                          } else {
                              $valid = true;
                              $char_size = 4;
                          }
                  
                      } else {
                          // F5 - FF never appear in UTF-8
                          $char_size = 1;
                      }
                  
                      $char = substr($str, $pos, $char_size);
                      $pos += $char_size;
                  
                      return true;
                  }
                  

                  Test cases are here.

                  function run(array $callables, array $arguments)
                  {
                      return array_map(function($callable) use($arguments) {
                           return array_map($callable, $arguments);
                      }, $callables);
                  }
                      
                  $data = [
                      // Table 3-8. Use of U+FFFD in UTF-8 Conversion
                      // (http://www.unicode.org/versions/Unicode6.1.0/ch03.pdf)
                      "\x61"."\xF1\x80\x80"."\xE1\x80"."\xC2"."\x62"."\x80"."\x63"
                      ."\x80"."\xBF"."\x64",
                  
                      // 'FULL MOON SYMBOL' (U+1F315) and invalid byte sequence
                      "\xF0\x9F\x8C\x95"."\xF0\x9F\x8C"."\xF0\x9F\x8C"
                  ];
                  
                  var_dump(run([
                      'replace_invalid_byte_sequence', 
                      'replace_invalid_byte_sequence2',
                      'replace_invalid_byte_sequence3',
                      'replace_invalid_byte_sequence4',
                      'replace_invalid_byte_sequence5',
                      'replace_invalid_byte_sequence6'
                  ], $data));
                  

                  Note that mb_convert_encoding() has a bug: it breaks a valid character that immediately follows an invalid byte sequence, or it removes an invalid byte sequence that follows valid characters without adding U+FFFD.

                  $data = [
                      // U+20AC
                      "\xE2\x82\xAC"."\xE2\x82\xAC"."\xE2\x82\xAC",
                      "\xE2\x82"    ."\xE2\x82\xAC"."\xE2\x82\xAC",
                  
                      // U+24B62
                      "\xF0\xA4\xAD\xA2"."\xF0\xA4\xAD\xA2"."\xF0\xA4\xAD\xA2",
                      "\xF0\xA4\xAD"    ."\xF0\xA4\xAD\xA2"."\xF0\xA4\xAD\xA2",
                      "\xA4\xAD\xA2"."\xF0\xA4\xAD\xA2"."\xF0\xA4\xAD\xA2",
                  
                      // 'FULL MOON SYMBOL' (U+1F315)
                      "\xF0\x9F\x8C\x95" . "\xF0\x9F\x8C",
                      "\xF0\x9F\x8C\x95" . "\xF0\x9F\x8C" . "\xF0\x9F\x8C"
                  ];
                  

                  Although preg_match() can be used instead of preg_replace_callback(), it has a limitation on byte size. See bug report #36463 for details. You can confirm it with the following test case.

                  str_repeat('a', 10000)
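
When a PCRE limit is hit, the preg_* functions report failure through their return value rather than an exception, so a long input can fail silently. The sketch below shows one way to surface that; the wrapper name `checked_preg_replace_callback` is hypothetical, and whether a given input actually hits a limit depends on the PHP and PCRE versions and on settings such as pcre.backtrack_limit and pcre.jit.

```php
<?php
// Hypothetical wrapper (not from the original answer): turn a silent PCRE
// failure (null return value) into an exception carrying the error code.
function checked_preg_replace_callback(string $regex, callable $callback, string $subject): string
{
    $result = preg_replace_callback($regex, $callback, $subject);

    if ($result === null) {
        throw new RuntimeException('PCRE error code: ' . preg_last_error());
    }

    return $result;
}

// A long ASCII run followed by one lone trail byte.
$long = str_repeat('a', 10000) . "\x80";

// Simplified pattern for illustration: replace every non-ASCII byte with '?'.
$clean = checked_preg_replace_callback('/[\x80-\xFF]/', function (array $m): string {
    return '?';
}, $long);

echo strlen($clean), PHP_EOL; // 10001
```

The same wrapper can be applied to the byte-range pattern shown earlier to check how it behaves on long inputs in a particular environment.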
                  

                  Finally, my benchmark results (elapsed seconds for 10,000 iterations over the test data) are as follows.

                  mb_convert_encoding()
                  0.19628190994263
                  htmlspecialchars()
                  0.082863092422485
                  UConverter::transcode()
                  0.15999984741211
                  UConverter::convert()
                  0.29843020439148
                  preg_replace_callback()
                  0.63967490196228
                  direct comparison
                  0.71933102607727
                  

                  The benchmark code is here.

                  function timer(array $callables, array $arguments, $repeat = 10000)
                  {
                  
                      $ret = [];
                      $save = $repeat;
                  
                      foreach ($callables as $key => $callable) {
                  
                          $start = microtime(true);
                  
                          do {
                      
                              array_map($callable, $arguments);
                  
                          } while($repeat -= 1);
                  
                          $stop = microtime(true);
                          $ret[$key] = $stop - $start;
                          $repeat = $save;
                  
                      }
                  
                      return $ret;
                  }
                  
                  $functions = [
                      'mb_convert_encoding()' => 'replace_invalid_byte_sequence',
                      'htmlspecialchars()' => 'replace_invalid_byte_sequence2',
                      'UConverter::transcode()' => 'replace_invalid_byte_sequence3',
                      'UConverter::convert()' => 'replace_invalid_byte_sequence4',
                      'preg_replace_callback()' => 'replace_invalid_byte_sequence5',
                      'direct comparison' => 'replace_invalid_byte_sequence6'
                  ];
                  
                  foreach (timer($functions, $data) as $description => $time) {
                  
                      echo $description, PHP_EOL,
                           $time, PHP_EOL;
                  
                  }
                  
