Problem description
I have been looking into this for a while now, and I cannot find a way in SQL Server to remove duplicate strings from a string that is both comma-separated and pipe-separated.
Given the string:
test1,test2,test1|test2,test3|test4,test4|test4
Does anyone know how you would return test1,test2,test3,test4?
Recommended answer
Approach
The following approach can be used to de-duplicate a delimited list of values.
- Use the REPLACE() function to convert the different delimiters into the same delimiter.
- Use the REPLACE() function to inject XML closing and opening tags to create an XML fragment.
- Use the CAST(expr AS XML) function to convert the above fragment into the XML data type.
- Use OUTER APPLY to apply the table-valued function nodes() to split the XML fragment into its constituent XML tags. This returns each XML tag on a separate row.
- Extract just the value from each XML tag using the value() function, which returns the value as the specified data type.
- Append a comma after the above-mentioned value.
- Note that these values are returned on separate rows, so the DISTINCT keyword now removes duplicate rows (i.e. duplicate values).
- Use the FOR XML PATH('') clause to concatenate the values across multiple rows into a single row.
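For comparison only: on SQL Server 2017 or later, the split, de-duplicate and concatenate steps listed above can also be done with the built-in STRING_SPLIT() and STRING_AGG() functions instead of the XML round-trip. This is a different technique from the one this answer uses, shown as a sketch that assumes SQL Server 2017+ (for STRING_AGG) and a database compatibility level of 130 or higher (required by STRING_SPLIT); the @input variable is illustrative.

```sql
-- Sketch, assuming SQL Server 2017+ and database compatibility level 130+.
-- @input is an illustrative variable holding the mixed-delimiter string.
DECLARE @input nvarchar(max) = N'test1,test2,test1|test2,test3|test4,test4|test4';

SELECT STRING_AGG(d.value, N',') WITHIN GROUP (ORDER BY d.value) AS Deduplicated
FROM (
    -- Normalise '|' to ',' so there is only one delimiter,
    -- let STRING_SPLIT return one row per item (column "value"),
    -- then let DISTINCT keep a single row per item.
    SELECT DISTINCT s.value
    FROM STRING_SPLIT(REPLACE(@input, N'|', N','), N',') AS s
) AS d;
```

For the sample input this returns test1,test2,test3,test4. Because STRING_AGG only places the separator between values, there is no trailing comma to strip, and no XML parsing is involved.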
Query
Putting the above approach in query form:
```sql
SELECT DISTINCT PivotedTable.PivotedColumn.value('.','nvarchar(max)') + ','
FROM (
    -- This query returns the following in the DataXml column:
    -- <tag>test1</tag><tag>test2</tag><tag>test1</tag><tag>test2</tag><tag>test3</tag><tag>test4</tag><tag>test4</tag><tag>test4</tag>
    -- i.e. it has turned the original delimited data into an XML fragment
    SELECT
        DataTable.DataColumn AS DataRaw
        , CAST(
            '<tag>'
            -- First replace commas with pipes to have only a single delimiter
            -- Then replace the pipe delimiters with a closing and opening tag
            + replace(replace(DataTable.DataColumn, ',','|'), '|','</tag><tag>')
            -- Add the final closing tag
            + '</tag>'
          AS XML) AS DataXml
    FROM ( SELECT 'test1,test2,test1|test2,test3|test4,test4|test4' AS DataColumn) AS DataTable
) AS x
OUTER APPLY DataXml.nodes('tag') AS PivotedTable(PivotedColumn)
-- Running the query without the following line returns the data in separate rows
-- Running the query with the following line returns the rows concatenated, i.e. it returns:
-- test1,test2,test3,test4,
FOR XML PATH('')
```
Input & result
Given the input:
test1,test2,test1|test2,test3|test4,test4|test4
The above query will return the result:
test1,test2,test3,test4,
Notice the trailing comma at the end. I'll leave removing it as an exercise for you.
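One common way to do that exercise, sketched here on the same sample data, is to prepend the comma instead of appending it and then strip the first character with STUFF(). Adding TYPE to FOR XML PATH('') and reading the result back with .value() also keeps characters such as & from coming back entity-encoded (as &amp;), and ORDER BY 1 makes the concatenation order explicit, since DISTINCT alone does not guarantee one. This is an illustrative variant, not the answer's own code; note that the CAST to XML still fails if an item itself contains < or &.

```sql
SELECT STUFF((
    -- Prepend the delimiter so the unwanted comma ends up at the start
    SELECT DISTINCT ',' + PivotedTable.PivotedColumn.value('.', 'nvarchar(max)')
    FROM (
        SELECT CAST(
            '<tag>'
            + REPLACE(REPLACE(DataTable.DataColumn, ',', '|'), '|', '</tag><tag>')
            + '</tag>'
          AS XML) AS DataXml
        FROM (SELECT 'test1,test2,test1|test2,test3|test4,test4|test4' AS DataColumn) AS DataTable
    ) AS x
    OUTER APPLY x.DataXml.nodes('tag') AS PivotedTable(PivotedColumn)
    ORDER BY 1                 -- explicit order for the concatenated result
    FOR XML PATH(''), TYPE     -- TYPE returns xml, read back below as plain text
).value('.', 'nvarchar(max)'), 1, 1, '') AS Deduplicated;
-- STUFF(string, 1, 1, '') deletes the leading comma
```

For the sample input this returns test1,test2,test3,test4 without the trailing comma.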
Count of duplicates
The OP asked in a comment: "how do I get the count of duplicates as well? In a separate column".
The simplest way is to take the query above, remove its last line, FOR XML PATH(''), and then count both all values and the distinct values returned by the SELECT expression in that query (i.e. PivotedTable.PivotedColumn.value('.','nvarchar(max)')). The difference between the count of all values and the count of distinct values is the count of duplicate values.
```sql
SELECT
    COUNT(PivotedTable.PivotedColumn.value('.','nvarchar(max)')) AS CountOfAllValues
    , COUNT(DISTINCT PivotedTable.PivotedColumn.value('.','nvarchar(max)')) AS CountOfUniqueValues
    -- The difference of the previous two counts is the number of duplicate values
    , COUNT(PivotedTable.PivotedColumn.value('.','nvarchar(max)'))
      - COUNT(DISTINCT PivotedTable.PivotedColumn.value('.','nvarchar(max)')) AS CountOfDuplicateValues
FROM (
    -- This query returns the following in the DataXml column:
    -- <tag>test1</tag><tag>test2</tag><tag>test1</tag><tag>test2</tag><tag>test3</tag><tag>test4</tag><tag>test4</tag><tag>test4</tag>
    -- i.e. it has turned the original delimited data into an XML fragment
    SELECT
        DataTable.DataColumn AS DataRaw
        , CAST(
            '<tag>'
            -- First replace commas with pipes to have only a single delimiter
            -- Then replace the pipe delimiters with a closing and opening tag
            + replace(replace(DataTable.DataColumn, ',','|'), '|','</tag><tag>')
            -- Add the final closing tag
            + '</tag>'
          AS XML) AS DataXml
    FROM ( SELECT 'test1,test2,test1|test2,test3|test4,test4|test4' AS DataColumn) AS DataTable
) AS x
OUTER APPLY DataXml.nodes('tag') AS PivotedTable(PivotedColumn)
```
For the same input shown above, the output of this query is:
```
CountOfAllValues CountOfUniqueValues CountOfDuplicateValues
---------------- ------------------- ----------------------
8                4                   4
```
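If a count per value is wanted rather than a single total (one possible reading of the OP's "separate column" request, not something the answer above does), a sketch is to group the split items. The value() call is done in a derived table first so that GROUP BY works on a plain column; the column names Item, Occurrences and ExtraOccurrences are illustrative.

```sql
SELECT
    v.Item
    , COUNT(*) AS Occurrences
    , COUNT(*) - 1 AS ExtraOccurrences   -- 0 means the item was not duplicated
FROM (
    -- Same XML-based split as above: one row per item
    SELECT PivotedTable.PivotedColumn.value('.', 'nvarchar(max)') AS Item
    FROM (
        SELECT CAST(
            '<tag>'
            + REPLACE(REPLACE(DataTable.DataColumn, ',', '|'), '|', '</tag><tag>')
            + '</tag>'
          AS XML) AS DataXml
        FROM (SELECT 'test1,test2,test1|test2,test3|test4,test4|test4' AS DataColumn) AS DataTable
    ) AS x
    OUTER APPLY x.DataXml.nodes('tag') AS PivotedTable(PivotedColumn)
) AS v
GROUP BY v.Item
ORDER BY v.Item;
```

For the sample input the occurrences are test1 = 2, test2 = 2, test3 = 1 and test4 = 3, so the ExtraOccurrences column sums to 4, matching CountOfDuplicateValues above.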