Common substring of two strings in SQL

Question

I need to find the longest common substring (without spaces) of two strings in SQL.

Query:

select *
from tbl as a, tbl as b
where a.str <> b.str

Sample data:

str1      | str2      | max substring without spaces
----------+-----------+-----------------------------
aabcdfbas | rikcdfva  | cdf
aaab akuc | aaabir a  | aaab
ab akuc   | ab atr    | ab

Solution

I disagree with those who say that SQL is not the tool for this job. Until someone can show me a faster way than my solution in ANY programming language, I will aver that SQL (written in a set-based, side-effect-free way, using only immutable variables) is the ONLY tool for this job when dealing with varchar(8000) or nvarchar(4000). The solution below is for varchar(8000).

1. A correctly indexed tally (numbers) table.

-- (1) build and populate a persisted (numbers) tally
IF OBJECT_ID('dbo.tally') IS NOT NULL DROP TABLE dbo.tally;
CREATE TABLE dbo.tally (N int not null); -- named N to match the constraints below (matters under case-sensitive collations)

WITH DummyRows(V) AS(SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t(N))
INSERT dbo.tally
SELECT TOP (8000) ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS N
FROM DummyRows a CROSS JOIN DummyRows b CROSS JOIN DummyRows c CROSS JOIN DummyRows d
ORDER BY N; -- guarantees the kept rows are exactly 1 through 8000

-- (2) Add Required constraints (and indexes) for performance
ALTER TABLE dbo.tally 
ADD CONSTRAINT pk_tally PRIMARY KEY CLUSTERED(N) WITH FILLFACTOR = 100;

ALTER TABLE dbo.tally 
ADD CONSTRAINT uq_tally UNIQUE NONCLUSTERED(N);

Note that a tally table function (one that generates the numbers on the fly) will not perform as well.
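If you want to sanity-check the construction before relying on it, the sketch below repeats it against a table variable (@tally is a hypothetical stand-in for dbo.tally) and reports the row count and range. Four cross-joined copies of the 10-row DummyRows CTE give 10,000 rows, so TOP (8000) has enough to number; the ORDER BY on the numbered column is included so the kept rows are guaranteed to be 1 through 8000.

```sql
-- Hypothetical check: rebuild the tally rows in a table variable and inspect them.
DECLARE @tally TABLE (N int NOT NULL PRIMARY KEY);

WITH DummyRows(V) AS
(
    SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t(N)  -- 10 rows
)
INSERT @tally (N)
SELECT TOP (8000) ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS N
FROM DummyRows a CROSS JOIN DummyRows b CROSS JOIN DummyRows c CROSS JOIN DummyRows d  -- 10^4 rows
ORDER BY N;  -- keep the lowest 8000 row numbers

-- Expect 8000 rows numbered 1..8000 (the primary key rules out duplicates).
SELECT row_count = COUNT(*), min_n = MIN(N), max_n = MAX(N)
FROM @tally;
```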

2. Using our tally table to return all possible substrings of a string

Using "abcd" as an example, let's get all of its substrings. Note my comments.

DECLARE @s1 varchar(8000) = 'abcd';

SELECT
  position  = t.N,
  tokenSize = x.N,
  string    = substring(@s1, t.N, x.N)  
FROM       dbo.tally t -- token position
CROSS JOIN dbo.tally x -- token length
WHERE t.N <=  len(@s1) -- all positions
AND   x.N <=  len(@s1) -- all lengths
AND   len(@s1) - t.N - (x.N-1) >= 0 -- filter unnecessary rows [e.g. substring('abcd',3,3)]
ORDER BY x.N, t.N; -- by length, then position, to match the output below

This returns

position    tokenSize   string
----------- ----------- -------
1           1           a
2           1           b
3           1           c
4           1           d
1           2           ab
2           2           bc
3           2           cd
1           3           abc
2           3           bcd
1           4           abcd
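The filter len(@s1) - t.N - (x.N-1) >= 0 simply says that a substring's last character, at position t.N + x.N - 1, must not run past the end of the string; given that both numbers start at 1, it also implies the first two predicates. The self-contained sketch below writes the predicate in that form and swaps dbo.tally for a small inline numbers CTE (Digits/Nums, covering 1 to 100, are hypothetical stand-ins so the example runs anywhere). For "abcd" it should produce the same 10 rows as above.

```sql
DECLARE @s1 varchar(8000) = 'abcd';
DECLARE @subs TABLE (position int, tokenSize int, string varchar(8000));

-- Inline numbers 1..100: a stand-in for dbo.tally in this sketch only.
WITH Digits(D) AS (SELECT D FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) v(D)),
     Nums(N)   AS (SELECT a.D * 10 + b.D + 1 FROM Digits a CROSS JOIN Digits b)
INSERT @subs (position, tokenSize, string)
SELECT t.N, x.N, SUBSTRING(@s1, t.N, x.N)
FROM Nums t          -- token position
CROSS JOIN Nums x    -- token length
WHERE t.N + x.N - 1 <= LEN(@s1);  -- last character must stay inside the string

SELECT position, tokenSize, string
FROM @subs
ORDER BY tokenSize, position;
```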

3. dbo.getshortstring8k

What's this function about? It's the first major optimization. We're going to break the shorter of the two strings into every possible substring, then check whether each one exists in the longer string. If you have two strings (S1 and S2) and S1 is longer than S2, we know that no substring of S1 that is longer than S2 can be a substring of S2. That's the purpose of dbo.getshortstring8k: to ensure that we don't perform any unnecessary substring comparisons. This will make more sense in a moment.

This is hugely important because the number of substrings in a string is a triangular number. With N as the length (number of characters) of a string, the number of substrings is N*(N+1)/2. E.g. "abc" has 6 substrings: 3*(3+1)/2 = 6; a, b, c, ab, bc, abc. If we're comparing "abc" to "abcdefgh", we don't need to check whether "abcd" is a substring of "abc".
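The formula follows from grouping substrings by length: a string of length N has N - k + 1 substrings of length k (one per starting position), and summing over every length gives the triangular number. This counts substrings by position, so repeats (such as the two "a"s in "aa") are counted separately.

```latex
\sum_{k=1}^{N} (N - k + 1) = N + (N-1) + \cdots + 1 = \frac{N(N+1)}{2},
\qquad N = 3:\quad 3 + 2 + 1 = \frac{3 \cdot 4}{2} = 6
```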

Breaking "abcdefgh" (length=8) into all possible substrings requires 8*(8+1)/2 = 36 operations (vs 6 for "abc").
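To confirm the arithmetic, the sketch below counts the rows the step-2 enumeration produces for "abc" and "abcdefgh" and compares each count with N*(N+1)/2. It again uses a hypothetical inline numbers CTE in place of dbo.tally so it is self-contained.

```sql
DECLARE @strings TABLE (s varchar(8000));
INSERT @strings (s) VALUES ('abc'), ('abcdefgh');

DECLARE @counts TABLE (s varchar(8000), substrings int, triangle int);

WITH Digits(D) AS (SELECT D FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) v(D)),
     Nums(N)   AS (SELECT a.D * 10 + b.D + 1 FROM Digits a CROSS JOIN Digits b)  -- 1..100
INSERT @counts (s, substrings, triangle)
SELECT w.s,
       COUNT(*),                        -- rows produced by the enumeration
       LEN(w.s) * (LEN(w.s) + 1) / 2    -- triangular number N*(N+1)/2
FROM @strings w
CROSS JOIN Nums t                       -- token position
CROSS JOIN Nums x                       -- token length
WHERE t.N + x.N - 1 <= LEN(w.s)
GROUP BY w.s;

SELECT s, substrings, triangle FROM @counts;  -- abc: 6 and 6, abcdefgh: 36 and 36
```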

IF OBJECT_ID('dbo.getshortstring8k') IS NOT NULL DROP FUNCTION dbo.getshortstring8k;
GO
CREATE FUNCTION dbo.getshortstring8k(@s1 varchar(8000), @s2 varchar(8000))
RETURNS TABLE WITH SCHEMABINDING AS RETURN 
SELECT s1 = CASE WHEN LEN(@s1) < LEN(@s2) THEN @s1 ELSE @s2 END,
       s2 = CASE WHEN LEN(@s1) < LEN(@s2) THEN @s2 ELSE @s1 END;
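Because dbo.getshortstring8k is an inline table-valued function, SQL Server expands it into the calling query much like a view. The sketch below applies the same two CASE expressions through CROSS APPLY to a few made-up pairs (the table variable @pairs and its columns a and b are hypothetical) to show the ordering it produces. Note the tie case: when both strings have the same length, s1 receives the second argument, which is harmless because neither string can then have a substring longer than the other.

```sql
DECLARE @pairs TABLE (a varchar(8000), b varchar(8000));
INSERT @pairs (a, b) VALUES ('abcdefgh', 'abc'), ('abc', 'abcdefgh'), ('xy', 'zw');

DECLARE @ordered TABLE (a varchar(8000), b varchar(8000), s1 varchar(8000), s2 varchar(8000));

INSERT @ordered (a, b, s1, s2)
SELECT p.a, p.b, o.s1, o.s2
FROM @pairs p
CROSS APPLY
(
    -- Same CASE logic as dbo.getshortstring8k: s1 = shorter string, s2 = longer string.
    SELECT s1 = CASE WHEN LEN(p.a) < LEN(p.b) THEN p.a ELSE p.b END,
           s2 = CASE WHEN LEN(p.a) < LEN(p.b) THEN p.b ELSE p.a END
) o;

SELECT a, b, s1, s2 FROM @ordered;
```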

4. Finding all substrings of the shorter string that exist in the longer string:

DECLARE @s1 varchar(8000) = 'bcdabc', @s2 varchar(8000) = 'abcd';

SELECT
  s.s1, -- test to make sure s.s1 is the shorter of the two strings
  position  = t.N,
  tokenSize = x.N,
  string    = substring(s.s1, t.N, x.N)
FROM dbo.getshortstring8k(@s1, @s2) s --<< get the shorter string
CROSS JOIN dbo.tally t  
CROSS JOIN dbo.tally x
WHERE t.N between 1 and len(s.s1)
AND   x.N between 1 and len(s.s1)
AND   len(s.s1) - t.N - (x.N-1) >= 0
AND   charindex(substring(s.s1, t.N, x.N), s.s2) > 0;
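Here is the step-4 query made self-contained, with the dbo.getshortstring8k CASE logic inlined as a CTE (Pair is a hypothetical name) and a hypothetical Nums CTE standing in for dbo.tally. For 'bcdabc' and 'abcd' the shorter string is 'abcd'; nine of its ten substrings occur in 'bcdabc', and only 'abcd' itself does not.

```sql
DECLARE @s1 varchar(8000) = 'bcdabc', @s2 varchar(8000) = 'abcd';
DECLARE @common TABLE (position int, tokenSize int, string varchar(8000));

WITH Digits(D) AS (SELECT D FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) v(D)),
     Nums(N)   AS (SELECT a.D * 10 + b.D + 1 FROM Digits a CROSS JOIN Digits b),  -- 1..100
     Pair(s1, s2) AS  -- inlined equivalent of dbo.getshortstring8k
     (
         SELECT CASE WHEN LEN(@s1) < LEN(@s2) THEN @s1 ELSE @s2 END,
                CASE WHEN LEN(@s1) < LEN(@s2) THEN @s2 ELSE @s1 END
     )
INSERT @common (position, tokenSize, string)
SELECT t.N, x.N, SUBSTRING(p.s1, t.N, x.N)
FROM Pair p
CROSS JOIN Nums t   -- token position
CROSS JOIN Nums x   -- token length
WHERE t.N + x.N - 1 <= LEN(p.s1)                       -- every substring of the shorter string
AND   CHARINDEX(SUBSTRING(p.s1, t.N, x.N), p.s2) > 0;  -- that also occurs in the longer one

SELECT position, tokenSize, string FROM @common ORDER BY tokenSize, position;
```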

5. Retrieving ONLY the longest common substring(s)

This is the easy part. We simply add TOP (1) WITH TIES to our SELECT statement, order by token length descending, and we're all set. Here, the longest common substrings are "bc" and "xx":

DECLARE @s1 varchar(8000) = 'xxabcxx', @s2 varchar(8000) = 'bcdxx';

SELECT TOP (1) WITH TIES 
  position  = t.N,
  tokenSize = x.N,
  string    = substring(s.s1, t.N, x.N)
FROM dbo.getshortstring8k(@s1, @s2) s
CROSS JOIN dbo.tally t  
CROSS JOIN dbo.tally x
WHERE t.N between 1 and len(s.s1)
AND   x.N between 1 and len(s.s1)
AND   len(s.s1) - t.N - (x.N-1) >= 0
AND   charindex(substring(s.s1, t.N, x.N), s.s2) > 0
ORDER BY x.N DESC;
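A self-contained version of the step-5 query, using hypothetical Nums and Pair CTEs in place of dbo.tally and dbo.getshortstring8k. TOP (1) WITH TIES keeps every row whose tokenSize equals the largest one, so both 'bc' and 'xx' (length 2) survive.

```sql
DECLARE @s1 varchar(8000) = 'xxabcxx', @s2 varchar(8000) = 'bcdxx';
DECLARE @longest TABLE (position int, tokenSize int, string varchar(8000));

WITH Digits(D) AS (SELECT D FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) v(D)),
     Nums(N)   AS (SELECT a.D * 10 + b.D + 1 FROM Digits a CROSS JOIN Digits b),  -- 1..100
     Pair(s1, s2) AS  -- inlined equivalent of dbo.getshortstring8k
     (
         SELECT CASE WHEN LEN(@s1) < LEN(@s2) THEN @s1 ELSE @s2 END,
                CASE WHEN LEN(@s1) < LEN(@s2) THEN @s2 ELSE @s1 END
     )
INSERT @longest (position, tokenSize, string)
SELECT TOP (1) WITH TIES t.N, x.N, SUBSTRING(p.s1, t.N, x.N)
FROM Pair p
CROSS JOIN Nums t   -- token position
CROSS JOIN Nums x   -- token length
WHERE t.N + x.N - 1 <= LEN(p.s1)
AND   CHARINDEX(SUBSTRING(p.s1, t.N, x.N), p.s2) > 0
ORDER BY x.N DESC;  -- all rows tied on the longest length are kept

SELECT position, tokenSize, string FROM @longest;
```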

6. Applying this logic to your table

Using APPLY, we replace my variables @s1 and @s2 with t.str1 and t.str2. I also add a filter that excludes matches containing spaces (see my comments), and we're off:

-- easily consumable sample data
DECLARE @yourtable TABLE (str1 varchar(8000), str2 varchar(8000));
INSERT @yourtable 
VALUES ('aabcdfbas','rikcdfva'),('aaab akuc','aaabir a'),('ab akuc','ab atr');

SELECT str1, str2,  [max substring without spaces] = string
FROM @yourtable t
CROSS APPLY 
(
    SELECT TOP (1) WITH TIES 
      position  = p.N,
      tokenSize = x.N,
      string    = substring(s.s1, p.N, x.N)
    FROM dbo.getshortstring8k(t.str1, t.str2) s -- @s1 & @s2 replaced with str1 & str2 
    CROSS JOIN dbo.tally p -- token position (aliased p so it does not shadow the outer t)
    CROSS JOIN dbo.tally x -- token length
    WHERE p.N between 1 and len(s.s1)
    AND   x.N between 1 and len(s.s1)
    AND   len(s.s1) - p.N - (x.N-1) >= 0
    AND   charindex(substring(s.s1, p.N, x.N), s.s2) > 0
    AND   charindex(' ',substring(s.s1, p.N, x.N)) = 0 -- exclude substrings with spaces
    ORDER BY x.N DESC
) lcss;

Results:

str1        str2      max substring without spaces
----------- --------- ------------------------------
aabcdfbas   rikcdfva  cdf
aaab akuc   aaabir a  aaab
ab akuc     ab atr    ab
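For a copy-and-paste check of step 6 that needs neither dbo.tally nor dbo.getshortstring8k, the sketch below inlines both (the Nums CTE, the @results table variable and the yt alias are hypothetical) and stores the output so it can be compared with the results above.

```sql
DECLARE @yourtable TABLE (str1 varchar(8000), str2 varchar(8000));
INSERT @yourtable (str1, str2)
VALUES ('aabcdfbas','rikcdfva'), ('aaab akuc','aaabir a'), ('ab akuc','ab atr');

DECLARE @results TABLE (str1 varchar(8000), str2 varchar(8000), lcss varchar(8000));

WITH Digits(D) AS (SELECT D FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) v(D)),
     Nums(N)   AS (SELECT a.D * 10 + b.D + 1 FROM Digits a CROSS JOIN Digits b)  -- 1..100
INSERT @results (str1, str2, lcss)
SELECT yt.str1, yt.str2, l.string
FROM @yourtable yt
CROSS APPLY
(   -- inlined equivalent of dbo.getshortstring8k
    SELECT s1 = CASE WHEN LEN(yt.str1) < LEN(yt.str2) THEN yt.str1 ELSE yt.str2 END,
           s2 = CASE WHEN LEN(yt.str1) < LEN(yt.str2) THEN yt.str2 ELSE yt.str1 END
) s
CROSS APPLY
(
    SELECT TOP (1) WITH TIES string = SUBSTRING(s.s1, p.N, x.N)
    FROM Nums p            -- token position
    CROSS JOIN Nums x      -- token length
    WHERE p.N + x.N - 1 <= LEN(s.s1)
    AND   CHARINDEX(SUBSTRING(s.s1, p.N, x.N), s.s2) > 0
    AND   CHARINDEX(' ', SUBSTRING(s.s1, p.N, x.N)) = 0   -- no spaces in the match
    ORDER BY x.N DESC
) l;

SELECT str1, str2, [max substring without spaces] = lcss FROM @results;
```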

And the execution plan (image not preserved):

No sorts or unnecessary operations. Just speed. For longer strings (e.g. 50+ characters) I have an even faster technique you can read about here.
