Problem description
The Hawaiian quote has some weird behavior in T-SQL when using it in conjunction with string functions. What's going on here? Am I missing something? Do other characters suffer from this same problem?
SELECT UNICODE(N'ʻ') -- Returns 699 as expected.
SELECT REPLACE(N'"ʻ', '"', '_') -- Returns "ʻ, I expected _ʻ
SELECT REPLACE(N'aʻ', 'a', '_') -- Returns aʻ, I expected _ʻ
SELECT REPLACE(N'"ʻ', N'ʻ', '_') -- Returns __, I expected "_
SELECT REPLACE(N'-', N'ʻ', '_') -- Returns -, I expected -
It is also strange when used with LIKE, for example:
DECLARE @table TABLE ([Name] NVARCHAR(MAX))
INSERT INTO @table VALUES ('John'), ('Jane')
SELECT * FROM @table
WHERE [Name] LIKE N'%ʻ%' -- This returns both records. I expected none.
Recommended answer
The Hawaiian quote has some weird behavior in T-SQL when using it in conjunction with string functions. ... Do other characters suffer from this same problem?
A few things:
- This is not a Hawaiian "quote": it's a "glottal stop" which affects pronunciation.
- It is not "weird" behavior: it's just not what you were expecting.
This behavior is not specifically a "problem", though yes, there are other characters that exhibit similar behavior. For example, the following character (U+02DA Ring Above) behaves slightly differently depending on which side of a character it is on:
SELECT REPLACE(N'a˚aa' COLLATE Latin1_General_100_CI_AS, N'˚a', N'_'); -- Returns a_a
SELECT REPLACE(N'a˚aa' COLLATE Latin1_General_100_CI_AS, N'a˚', N'_'); -- Returns _aa
Now, anyone using SQL Server 2008 or newer should be using a 100 (or newer) level collation. The 100 series added many sort weights and uppercase/lowercase mappings that are missing from the 90 series, the non-numbered series, and the mostly obsolete SQL Server collations (those with names starting with SQL_).
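To see where a given instance stands, something like the following minimal sketch may help. It only reads metadata: it reports the server and current database collations, then lists the version-100 Latin1_General collations that the instance offers. The column aliases are illustrative, and the LIKE filter is just one possible way to narrow the list.

```sql
-- Sketch: check which collations are currently in effect.
SELECT SERVERPROPERTY('Collation')                AS [ServerCollation],
       DATABASEPROPERTYEX(DB_NAME(), 'Collation') AS [DatabaseCollation];

-- List the version-100 Latin1_General collations available on this instance.
-- The underscores are bracketed so LIKE treats them literally, not as wildcards.
SELECT [name], [description]
FROM   sys.fn_helpcollations()
WHERE  [name] LIKE N'Latin1[_]General[_]100[_]%'
ORDER BY [name];
```

If the first query reports a name starting with SQL_, or one without a version number, string comparisons in that database default to the older rules unless a COLLATE clause overrides them.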
The issue here is not that it fails to equate to any other character (outside of a binary collation); in fact, it does equate to one other character (U+0312 Combining Turned Comma Above):
;WITH nums AS
(
SELECT TOP (65536) (ROW_NUMBER() OVER (ORDER BY @@MICROSOFTVERSION) - 1) AS [num]
FROM [master].sys.all_columns ac1
CROSS JOIN [master].sys.all_columns ac2
)
SELECT nums.[num] AS [INTvalue],
CONVERT(BINARY(2), nums.[num]) AS [BINvalue],
NCHAR(nums.[num]) AS [Character]
FROM nums
WHERE NCHAR(nums.[num]) = NCHAR(0x02BB) COLLATE Latin1_General_100_CI_AS;
/*
INTvalue BINvalue Character
699 0x02BB ʻ
786 0x0312 ̒
*/
The issue is that this is a "spacing modifier" character, and so it attaches to, and modifies the meaning / pronunciation of, the character before or after it, depending on which modifier character you are dealing with.
According to the Unicode Standard, Chapter 7 (Europe-I), Section 7.8 (Modifier Letters), Page 323 (of the document, not of the PDF):
7.8 Modifier Letters
Modifier letters, in the sense used in the Unicode Standard, are letters or symbols that are typically written adjacent to other letters and which modify their usage in some way. They are not formally combining marks (gc = Mn or gc = Mc) and do not graphically combine with the base letter that they modify. They are base characters in their own right. The sense in which they modify other letters is more a matter of their semantics in usage; they often tend to function as if they were diacritics, indicating a change in pronunciation of a letter, or otherwise distinguishing a letter’s use. Typically this diacritic modification applies to the character preceding the modifier letter, but modifier letters may sometimes modify a following character. Occasionally a modifier letter may simply stand alone representing its own sound.
...
Phonetic Usage. The majority of the modifier letters in this block are phonetic modifiers, including the characters required for coverage of the International Phonetic Alphabet. In many cases, modifier letters are used to indicate that the pronunciation of an adjacent letter is different in some way—hence the name "modifier." They are also used to mark stress or tone, or may simply represent their own sound.
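As a rough way to spot such characters in stored data, the sketch below searches for anything in the Spacing Modifier Letters block (U+02B0 through U+02FF). It assumes a binary collation so that the LIKE range is evaluated by code point rather than by linguistic weight. The @sample table and its rows are hypothetical, and the characters are built with NCHAR() so the script does not depend on the file encoding.

```sql
-- Hypothetical sample data: one value containing U+02BB, one without.
DECLARE @sample TABLE ([Name] NVARCHAR(50));
INSERT INTO @sample ([Name])
VALUES (N'Hawai' + NCHAR(0x02BB) + N'i'),
       (N'John');

-- Under a BIN2 collation the bracketed range is compared by code point,
-- so this is intended to match only values containing a character
-- between U+02B0 and U+02FF.
SELECT [Name]
FROM   @sample
WHERE  [Name] COLLATE Latin1_General_100_BIN2
       LIKE N'%[' + NCHAR(0x02B0) + N'-' + NCHAR(0x02FF) + N']%';
```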
The following examples should help illustrate. I am using a 100-level collation, and it needs to be accent-sensitive (i.e. the name contains _AS):
SELECT REPLACE(N'?' COLLATE Latin1_General_100_CI_AS, N'?', N'_'); -- Returns _
SELECT REPLACE(N'?a' COLLATE Latin1_General_100_CI_AS, N'?', N'_'); -- Returns _a
SELECT REPLACE(N'?aa' COLLATE Latin1_General_100_CI_AS, N'?', N'_'); -- Returns _aa
SELECT REPLACE(N'a?aa' COLLATE Latin1_General_100_CI_AS, N'?', N'_'); -- Returns __aa
SELECT REPLACE(N'?aa' COLLATE Latin1_General_100_CI_AS, N'?a', N'_'); -- Returns ?__
SELECT REPLACE(N'a?aa' COLLATE Latin1_General_100_CI_AS, N'?a', N'_'); -- Returns a?__
SELECT REPLACE(N'a?aa' COLLATE Latin1_General_100_CI_AS, N'a?', N'_'); -- Returns _aa
SELECT REPLACE(N'a?aa' COLLATE Latin1_General_100_CI_AS, N'a?a', N'_'); -- Returns _a
SELECT REPLACE(N'a?aa' COLLATE Latin1_General_100_CI_AS, N'a', N'_'); -- Returns a?__
SELECT REPLACE(N'??aa' COLLATE Latin1_General_100_CI_AS, N'?', N'_'); -- Returns ??aa
SELECT REPLACE(N'??aa' COLLATE Latin1_General_100_CI_AS, N'?', N'_'); -- Returns ??aa
SELECT REPLACE(N'?aa' COLLATE Latin1_General_100_CI_AS, N'?', N'_'); -- Returns _aa
SELECT CHARINDEX(N'a', N'a?a' COLLATE Latin1_General_100_CI_AS); -- 3
SELECT CHARINDEX(N'a', N'a?a' COLLATE Latin1_General_100_CI_AI); -- 1
SELECT 1 WHERE N'a' = N'a?' COLLATE Latin1_General_100_CI_AS; -- (0 rows returned)
SELECT 2 WHERE N'a' = N'a?' COLLATE Latin1_General_100_CI_AI; -- 2
If you need to deal with such characters in a way that ignores their intended linguistic behavior, then yes, you must use a binary collation. In such cases, please use the most recent level of collation, and BIN2 instead of BIN (assuming you are using SQL Server 2005 or newer). Meaning:
- SQL Server 2000: Latin1_General_BIN
- SQL Server 2005: Latin1_General_BIN2
- SQL Server 2008, 2008 R2, 2012, 2014, and 2016: Latin1_General_100_BIN2
- SQL Server 2017 and newer: Japanese_XJIS_140_BIN2
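Applied to the examples from the question, a binary collation could be specified inline along these lines. This is a sketch rather than a definitive fix: the @okina variable is introduced here only for illustration, and Latin1_General_100_BIN2 assumes SQL Server 2008 or newer.

```sql
-- U+02BB built with NCHAR() so the script does not depend on file encoding.
DECLARE @okina NCHAR(1) = NCHAR(0x02BB);

-- With a binary collation, REPLACE compares by code point, so only the
-- U+02BB character itself should be treated as a match.
SELECT REPLACE((N'"' + @okina) COLLATE Latin1_General_100_BIN2, @okina, N'_');

-- The same idea for LIKE: only rows that actually contain U+02BB
-- should qualify.
DECLARE @table TABLE ([Name] NVARCHAR(MAX));
INSERT INTO @table ([Name]) VALUES (N'John'), (N'Jane');

SELECT *
FROM   @table
WHERE  [Name] COLLATE Latin1_General_100_BIN2 LIKE N'%' + @okina + N'%';
```

The trade-off is that a binary collation is also case-sensitive and accent-sensitive, so it suits exact character matching better than general-purpose text searches.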
If you are curious why I make that recommendation, please see:
Differences Between the Various Binary Collations (Cultures, Versions, and BIN vs BIN2)
And, for more information on collations / Unicode / encodings / etc, please visit: Collations Info