
This article describes how to find the common substring of two strings in SQL.

Problem description

I need to find the common substring (without spaces) of two strings in SQL.

Query:

    select * from tbl as a, tbl as b where a.str <> b.str

Sample data:

    str1      | str2      | max substring without spaces
    ----------+-----------+-----------------------------
    aabcdfbas | rikcdfva  | cdf
    aaab akuc | aaabir a  | aaab
    ab akuc   | ab atr    | ab

Solution

I disagree with those who say that SQL is not the tool for this job. Until someone can show me a faster way than my solution in ANY programming language, I will aver that SQL (written in a set-based, side-effect-free style, using only immutable variables) is the ONLY tool for this job (when dealing with varchar(8000) or nvarchar(4000)). The solution below is for varchar(8000).

1. A correctly indexed tally (numbers) table.

    -- (1) build and populate a persisted (numbers) tally table
    IF OBJECT_ID('dbo.tally') IS NOT NULL DROP TABLE dbo.tally;
    CREATE TABLE dbo.tally (N int NOT NULL);

    WITH DummyRows(V) AS
    (
      SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t(N)
    )
    INSERT dbo.tally
    SELECT TOP (8000) ROW_NUMBER() OVER (ORDER BY (SELECT 1))
    FROM DummyRows a CROSS JOIN DummyRows b CROSS JOIN DummyRows c CROSS JOIN DummyRows d;

    -- (2) add required constraints (and indexes) for performance
    ALTER TABLE dbo.tally ADD CONSTRAINT pk_tally PRIMARY KEY CLUSTERED(N) WITH FILLFACTOR = 100;
    ALTER TABLE dbo.tally ADD CONSTRAINT uq_tally UNIQUE NONCLUSTERED(N);

Note that a tally table function will not perform as well as this persisted, indexed table.
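
Before building on the table, a quick sanity check can confirm it was populated as intended. This is only a sketch; the column aliases rowCnt, minN and maxN are illustrative names, not part of the original answer.

```sql
-- Sanity check for dbo.tally (aliases are illustrative).
-- If the table was populated as intended, this should report
-- 8000 rows numbered from 1 through 8000.
SELECT rowCnt = COUNT(*),
       minN   = MIN(N),
       maxN   = MAX(N)
FROM dbo.tally;
```

If the numbers differ, the later queries would silently miss positions or lengths, so it is worth running once after the table is created.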

2. Using our tally table to return all possible substrings of a string

Using "abcd" as an example, let's get all of its substrings. Note my comments.

    DECLARE @s1 varchar(8000) = 'abcd';

    SELECT
      position  = t.N,
      tokenSize = x.N,
      string    = substring(@s1, t.N, x.N)
    FROM       dbo.tally t -- token position
    CROSS JOIN dbo.tally x -- token length
    WHERE t.N <= len(@s1)  -- all positions
    AND   x.N <= len(@s1)  -- all lengths
    AND   len(@s1) - t.N - (x.N-1) >= 0 -- filter unnecessary rows [e.g. substring('abcd',3,2)]

This returns

    position    tokenSize   string
    ----------- ----------- -------
    1           1           a
    2           1           b
    3           1           c
    4           1           d
    1           2           ab
    2           2           bc
    3           2           cd
    1           3           abc
    2           3           bcd
    1           4           abcd

3. dbo.getshortstring8K

What's this function about? It's the first major optimization. We're going to break the shorter of the two strings into every possible substring, then see if each one exists in the longer string. If you have two strings (S1 and S2) and S1 is longer than S2, we know that none of the substrings of S1 that are longer than S2 can be a substring of S2. That's the purpose of dbo.getshortstring8k: to ensure that we don't perform any unnecessary substring comparisons. This will make more sense in a moment.

This is hugely important because the number of substrings in a string can be calculated using a triangle number function. With N as the length (number of characters) of a string, the number of substrings is N*(N+1)/2. E.g. "abc" has 6 substrings: 3*(3+1)/2 = 6; a, b, c, ab, bc, abc. If we're comparing "abc" to "abcdefgh" we don't need to check if "abcd" is a substring of "abc".
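
The triangle number claim can be checked against the tally table from step 1. The following is a minimal sketch that counts the position/length pairs produced by the step 2 filter and compares that count with N*(N+1)/2; the variable @s and the column aliases are illustrative.

```sql
-- Count the substrings the step 2 filter generates and compare with N*(N+1)/2.
-- Assumes dbo.tally from step 1 exists; @s and the aliases are illustrative.
DECLARE @s varchar(8000) = 'abc';

SELECT substringCount = COUNT(*),                    -- 6 for 'abc'
       triangleNumber = LEN(@s) * (LEN(@s) + 1) / 2  -- 3*(3+1)/2 = 6
FROM       dbo.tally t  -- token position
CROSS JOIN dbo.tally x  -- token length
WHERE t.N <= LEN(@s)
AND   x.N <= LEN(@s)
AND   LEN(@s) - t.N - (x.N - 1) >= 0;
```

Changing @s to 'abcdefgh' gives 36 in both columns, which is why splitting the shorter string is the cheaper choice.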

Breaking "abcdefgh" (length=8) into all possible substrings requires 8*(8+1)/2 = 36 operations (vs 6 for "abc").

    IF OBJECT_ID('dbo.getshortstring8k') IS NOT NULL DROP FUNCTION dbo.getshortstring8k;
    GO
    CREATE FUNCTION dbo.getshortstring8k(@s1 varchar(8000), @s2 varchar(8000))
    RETURNS TABLE WITH SCHEMABINDING AS RETURN
    SELECT s1 = CASE WHEN LEN(@s1) < LEN(@s2) THEN @s1 ELSE @s2 END,
           s2 = CASE WHEN LEN(@s1) < LEN(@s2) THEN @s2 ELSE @s1 END;

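
As a brief usage sketch (the literals are illustrative), calling the function with the arguments in either order yields the same result, with the shorter string always in s1:

```sql
-- In both calls s1 = 'abc' (shorter) and s2 = 'abcdefgh' (longer).
SELECT s.s1, s.s2 FROM dbo.getshortstring8k('abcdefgh', 'abc') AS s;
SELECT s.s1, s.s2 FROM dbo.getshortstring8k('abc', 'abcdefgh') AS s;
```

When both strings have the same length the CASE expressions take the ELSE branch, so the two inputs are simply swapped. That is harmless here because either one can serve as the "shorter" string.
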
4. Finding all substrings of the shorter string that exist in the longer string:

    DECLARE @s1 varchar(8000) = 'bcdabc', @s2 varchar(8000) = 'abcd';

    SELECT
      s.s1, -- test to make sure s.s1 is the shorter of the two strings
      position  = t.N,
      tokenSize = x.N,
      string    = substring(s.s1, t.N, x.N)
    FROM       dbo.getshortstring8k(@s1, @s2) s --<< get the shorter string
    CROSS JOIN dbo.tally t
    CROSS JOIN dbo.tally x
    WHERE t.N between 1 and len(s.s1)
    AND   x.N between 1 and len(s.s1)
    AND   len(s.s1) - t.N - (x.N-1) >= 0
    AND   charindex(substring(s.s1, t.N, x.N), s.s2) > 0;

5. Retrieving ONLY the longest common substring(s)

This is the easy part. We simply add TOP (1) WITH TIES to our SELECT statement, order by token size descending, and we're all set. Here, the longest common substrings are "bc" and "xx".

    DECLARE @s1 varchar(8000) = 'xxabcxx', @s2 varchar(8000) = 'bcdxx';

    SELECT TOP (1) WITH TIES
      position  = t.N,
      tokenSize = x.N,
      string    = substring(s.s1, t.N, x.N)
    FROM       dbo.getshortstring8k(@s1, @s2) s
    CROSS JOIN dbo.tally t
    CROSS JOIN dbo.tally x
    WHERE t.N between 1 and len(s.s1)
    AND   x.N between 1 and len(s.s1)
    AND   len(s.s1) - t.N - (x.N-1) >= 0
    AND   charindex(substring(s.s1, t.N, x.N), s.s2) > 0
    ORDER BY x.N DESC;

6. Applying this logic to your table

Using APPLY we replace my variables @s1 and @s2 with the table columns str1 and str2 (the outer table is aliased yt so it is not confused with the tally alias t). I add a filter to exclude matches that contain spaces (see my comments)... And we're off:

    -- easily consumable sample data
    DECLARE @yourtable TABLE (str1 varchar(8000), str2 varchar(8000));
    INSERT @yourtable
    VALUES ('aabcdfbas','rikcdfva'),('aaab akuc','aaabir a'),('ab akuc','ab atr');

    SELECT yt.str1, yt.str2, [max substring without spaces] = lcss.string
    FROM @yourtable yt
    CROSS APPLY
    (
      SELECT TOP (1) WITH TIES
        position  = t.N,
        tokenSize = x.N,
        string    = substring(s.s1, t.N, x.N)
      FROM       dbo.getshortstring8k(yt.str1, yt.str2) s -- @s1 & @s2 replaced with str1 & str2
      CROSS JOIN dbo.tally t
      CROSS JOIN dbo.tally x
      WHERE t.N between 1 and len(s.s1)
      AND   x.N between 1 and len(s.s1)
      AND   len(s.s1) - t.N - (x.N-1) >= 0
      AND   charindex(substring(s.s1, t.N, x.N), s.s2) > 0
      AND   charindex(' ', substring(s.s1, t.N, x.N)) = 0 -- exclude substrings with spaces
      ORDER BY x.N DESC
    ) lcss;

Results:

    str1        str2       max substring without spaces
    ----------- ---------- ------------------------------
    aabcdfbas   rikcdfva   cdf
    aaab akuc   aaabir a   aaab
    ab akuc     ab atr     ab

The execution plan for this query contains no sorts or unnecessary operations. Just speed. For longer strings (e.g. 50+ characters) I have an even faster technique you can read about here.
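
If the step 6 logic is needed in more than one query, it could be packaged as an inline table-valued function. The following is only a sketch under stated assumptions: the name dbo.lcssNoSpaces8k is hypothetical and does not appear in the original answer, and the function relies on dbo.tally and dbo.getshortstring8k from the earlier steps.

```sql
-- Hypothetical wrapper; the name dbo.lcssNoSpaces8k is an assumption for illustration.
IF OBJECT_ID('dbo.lcssNoSpaces8k') IS NOT NULL DROP FUNCTION dbo.lcssNoSpaces8k;
GO
CREATE FUNCTION dbo.lcssNoSpaces8k(@s1 varchar(8000), @s2 varchar(8000))
RETURNS TABLE AS RETURN
SELECT TOP (1) WITH TIES
       position  = t.N,   -- position within the shorter string
       tokenSize = x.N,
       string    = SUBSTRING(s.s1, t.N, x.N)
FROM       dbo.getshortstring8k(@s1, @s2) s
CROSS JOIN dbo.tally t
CROSS JOIN dbo.tally x
WHERE t.N BETWEEN 1 AND LEN(s.s1)
AND   x.N BETWEEN 1 AND LEN(s.s1)
AND   LEN(s.s1) - t.N - (x.N - 1) >= 0
AND   CHARINDEX(SUBSTRING(s.s1, t.N, x.N), s.s2) > 0
AND   CHARINDEX(' ', SUBSTRING(s.s1, t.N, x.N)) = 0 -- exclude substrings with spaces
ORDER BY x.N DESC;
GO
-- Usage against the same sample data as step 6
DECLARE @yourtable TABLE (str1 varchar(8000), str2 varchar(8000));
INSERT @yourtable
VALUES ('aabcdfbas','rikcdfva'),('aaab akuc','aaabir a'),('ab akuc','ab atr');

SELECT yt.str1, yt.str2, [max substring without spaces] = lcss.string
FROM @yourtable yt
CROSS APPLY dbo.lcssNoSpaces8k(yt.str1, yt.str2) lcss;
```

The wrapper omits SCHEMABINDING so that the step 1 script can still drop and rebuild dbo.tally. Because of WITH TIES, a pair of strings with several longest matches of equal length returns one row per match.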
