Problem description
I have UTF-8 texts that also contain characters with diacritics, and I would like to check whether the first letter of a text is uppercase or lowercase. How can I do this?
Recommended answer
In my opinion, a preg_match() call is the most direct, concise, and reliable approach compared with the other solutions posted here.
echo preg_match('~^\p{Lu}~u', $string) ? 'upper' : 'lower';
My pattern breakdown:
~       # starting pattern delimiter
^       # match from the start of the input string
\p{Lu}  # match exactly one uppercase letter (Unicode safe)
~       # ending pattern delimiter
u       # enable Unicode matching
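For reuse, the pattern above can be wrapped in a small function. This is a minimal sketch: the name startsWithUppercase is hypothetical and not part of the original answer, and it assumes the input is valid UTF-8 with precomposed characters.

```php
<?php
// Hypothetical helper (not from the original answer): returns true when the
// first character of $text is an uppercase letter (Unicode category Lu).
function startsWithUppercase(string $text): bool
{
    // preg_match() returns 1 on a match, 0 on no match, and false on error
    // (for example, invalid UTF-8 under the u modifier), so compare with 1.
    return preg_match('~^\p{Lu}~u', $text) === 1;
}

var_dump(startsWithUppercase('Éé'));   // bool(true)
var_dump(startsWithUppercase('éé'));   // bool(false)
var_dump(startsWithUppercase('1abc')); // bool(false)
```

Comparing strictly against 1 means that an error result (false) is treated as "not uppercase" rather than being silently cast.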
Please note that ctype_upper() and the < 'a' comparison fail in this battery of tests.
Code: (Demo)
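$tests = ['aa', 'Bbbbb', 'Éé', 'iou', 'Δδ'];
foreach ($tests as $test) {
    echo "\n{$test}:";
    // Unicode-aware regex: is the first character in category Lu?
    echo "\nPREG: ", preg_match('~^\p{Lu}~u', $test) ? 'upper' : 'lower';
    // ctype_upper() works on single bytes, so it misjudges multibyte letters
    echo "\nCTYPE: ", ctype_upper(mb_substr($test, 0, 1)) ? 'upper' : 'lower';
    // byte-wise string comparison against 'a' is only meaningful for ASCII
    echo "\n< a: ", mb_substr($test, 0, 1) < 'a' ? 'upper' : 'lower';
    // uppercase the first character and compare it with the original
    $chr = mb_substr($test, 0, 1, "UTF-8");
    echo "\nMB: ", mb_strtoupper($chr, "UTF-8") == $chr ? 'upper' : 'lower';
}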
Output:
aa:
PREG: lower
CTYPE: lower
< a: lower
MB: lower
Bbbbb:
PREG: upper
CTYPE: upper
< a: upper
MB: upper
Éé: <-- trouble
PREG: upper
CTYPE: lower <-- uh oh
< a: lower <-- uh oh
MB: upper
iou:
PREG: lower
CTYPE: lower
< a: lower
MB: lower
Δδ: <-- extended beyond question scope
PREG: upper <-- still holding up
CTYPE: lower
< a: lower
MB: upper <-- still holding up
If anyone needs to differentiate between uppercase letters, lowercase letters, and non-letters, see this post.
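A three-way check along those lines might look like the sketch below. It is not necessarily what the referenced post does: the function name classifyFirstChar and the label 'other' are assumptions made for illustration, and anything that is neither Lu nor Ll (digits, punctuation, caseless letters, an empty string) falls into 'other'.

```php
<?php
// Hypothetical helper (not from the original answer): classifies the first
// character of $text as 'upper', 'lower', or 'other'.
function classifyFirstChar(string $text): string
{
    if (preg_match('~^\p{Lu}~u', $text) === 1) {
        return 'upper'; // Unicode category Lu
    }
    if (preg_match('~^\p{Ll}~u', $text) === 1) {
        return 'lower'; // Unicode category Ll
    }
    // Digits, punctuation, caseless letters, empty or invalid input.
    return 'other';
}

foreach (['Δδ', 'éé', '9 lives'] as $sample) {
    echo $sample, ': ', classifyFirstChar($sample), "\n";
}
```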
It may be extending the scope of this question too far, but if your input characters are especially squirrelly (they might not exist in a category that Lu can handle), you may want to check whether the first character has case variants:
\p{L&} or \p{Cased_Letter}: a letter that exists in lowercase and uppercase variants (combination of Ll, Lu and Lt).
- Source: https://www.regular-expressions.info/unicode.html
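As a rough sketch of that check, the function below tests whether the first character is a cased letter before any upper/lower decision is made. The name firstCharIsCased is hypothetical, and only the \p{L&} spelling is used here.

```php
<?php
// Hypothetical helper (not from the original answer): returns true when the
// first character of $text is a cased letter (category Lu, Ll, or Lt).
function firstCharIsCased(string $text): bool
{
    return preg_match('~^\p{L&}~u', $text) === 1;
}

var_dump(firstCharIsCased('éé'));     // bool(true)
var_dump(firstCharIsCased('日本語')); // bool(false), category Lo has no case
var_dump(firstCharIsCased('Ⅳ'));      // bool(false), Roman numeral is category Nl
```

The last call shows why number letters such as Roman numerals need separate handling: they are not in any letter category.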
To include Roman numerals ("number letters") that have SMALL variants, you can add that extra range to the pattern if necessary.
https://www.fileformat.info/info/unicode/category/Nl/list.htm
Code: (Demo)
echo preg_match('~^[\p{Lu}\x{2160}-\x{216F}]~u', $test) ? 'upper' : 'not upper';