Problem description
I am having trouble getting a diacritic-insensitive search to work with Arabic text.
I have tested multiple setups for the table in question: the utf8 and utf16 encodings, and the utf8_general_ci, utf16_general_ci and utf16_unicode_ci collations.
The search works for accented Latin characters. I.e.:
select * from test where text like '%a%'
This would return rows where text is a or an accented variant of a. But it won't work with the Arabic diacritics: if the text is بِسْمِ and I search for بسم, I don't get any hits.
Any ideas how to get past this?
The real usage will later be in PHP (a search function), but I'm working directly in the MySQL database for now, just to test before I port it over to PHP.
(From the comments)
CREATE TABLE test (
  id int(11) unsigned NOT NULL AUTO_INCREMENT,
  text text COLLATE utf8_unicode_ci,
  PRIMARY KEY (id)
) ENGINE=InnoDB AUTO_INCREMENT=7 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
Recommended answer
(This is not an "answer", but a "resolution".)
It seems that LIKE does not work with your Arabic string. I don't know how much more it fails on. I recommend you write a bug report at http://bugs.mysql.com . Here is a test case that shows that neither LIKE '...' nor LIKE '%...%' finds both strings, whereas '=' works:
CREATE TABLE so28863402 (
id int(11) unsigned NOT NULL AUTO_INCREMENT,
txt text COLLATE utf8_unicode_ci, -- deliberate choice of COLLATION
PRIMARY KEY (id)
) ENGINE=InnoDB
DEFAULT CHARSET=utf8;
INSERT INTO so28863402 (txt) VALUES
(UNHEX('D8A8D990D8B3D992D985D990')), -- Using hex to avoid any copy/paste issues
(UNHEX('D8A8D8B3D985')); -- The values should compare equal
SELECT id, txt, HEX(txt) FROM so28863402;
SELECT txt, COUNT(*) FROM so28863402 GROUP BY txt; -- GROUP BY finds them equal.
SELECT * FROM so28863402
WHERE txt = 'بسم'; -- Finds both rows (correct)
SELECT * FROM so28863402
WHERE txt LIKE '%بسم%'; -- Finds one row (incorrect)
-- Further checks:
SELECT * FROM so28863402 WHERE txt = UNHEX( 'D8A8D8B3D985' );
SELECT * FROM so28863402 WHERE txt LIKE UNHEX( 'D8A8D8B3D985' );
SELECT * FROM so28863402 WHERE txt LIKE UNHEX('25D8A8D8B3D98525'); -- x25 is '%'
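To make it easier to see what the two hex values in the test case actually contain, here is a small Python sketch that decodes them and lists their code points. It also shows that dropping the nonspacing marks from the first value yields the second one. This only illustrates the data; it is not a fix for LIKE inside MySQL, and the helper names (strip_marks, describe) are made up for this example.

```python
import unicodedata

# Hex values copied from the INSERT statement in the test case above.
WITH_MARKS = bytes.fromhex("D8A8D990D8B3D992D985D990").decode("utf-8")
WITHOUT_MARKS = bytes.fromhex("D8A8D8B3D985").decode("utf-8")


def strip_marks(text):
    """Hypothetical helper: drop nonspacing combining marks (category Mn)."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


def describe(text):
    """Hypothetical helper: list each code point with its Unicode name."""
    return ["U+%04X %s" % (ord(ch), unicodedata.name(ch)) for ch in text]


if __name__ == "__main__":
    for line in describe(WITH_MARKS):
        print(line)
    # The letters that remain after stripping are the second stored value.
    print(strip_marks(WITH_MARKS) == WITHOUT_MARKS)
```

If the comparison has to happen on the application side anyway, normalizing both the stored text and the search term this way before comparing is one possible workaround, under the assumption that only nonspacing marks should be ignored.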