問題描述
我正在處理一個包含以下數據的表:
ObjectId EventId EventDate1 342 2017-10-271 342 2018-01-061 343 2018-04-181 401 2018-10-151 342 2018-11-121 342 2018-11-291 401 2018-12-101 342 2019-02-211 343 2019-04-231 401 2019-11-041 343 2020-02-152 342 2018-06-082 343 2018-09-182 342 2018-10-02
我需要標記對象(由 ObjectId 標識)發生了所有 3 個事件(由 EventId 值 342、343 和 401 標識)的第一條記錄.然后,該過程應以剩余的記錄重新開始.我試過使用窗口函數來讓它工作,但是重新開始"識別任何其他事件的過程讓我感到困惑.
該算法在上述數據集上執行的輸出為:
ObjectId EventId EventDate EventsComplete1 342 2017-10-27 01 342 2018-01-06 01 343 2018-04-18 01 401 2018-10-15 11 342 2018-11-12 01 342 2018-11-29 01 401 2018-12-10 01 342 2019-02-21 01 343 2019-04-23 11 401 2019-11-04 01 343 2020-02-15 02 342 2018-06-08 02 343 2018-09-18 02 342 2018-10-02 0
這是將在示例中創建數據集的查詢.
select 1 as ObjectId, 342 as EventId, cast('2017-10-27' as date) as EventDateunion select 1 as ObjectId, 342 as EventId, cast('2018-01-06' as date) as EventDateunion select 1 as ObjectId, 343 as EventId, cast('2018-04-18' as date) as EventDateunion select 1 as ObjectId, 401 as EventId, cast('2018-10-15' as date) as EventDateunion select 1 as ObjectId, 342 as EventId, cast('2018-11-12' as date) as EventDateunion select 1 as ObjectId, 342 as EventId, cast('2018-11-29' as date) as EventDateunion select 1 as ObjectId, 401 as EventId, cast('2018-12-10' as date) as EventDateunion select 1 as ObjectId, 342 as EventId, cast('2019-02-21' as date) as EventDateunion select 1 as ObjectId, 343 as EventId, cast('2019-04-23' as date) as EventDateunion select 1 as ObjectId, 401 as EventId, cast('2019-11-04' as date) as EventDateunion select 1 as ObjectId, 343 as EventId, cast('2020-02-15' as date) as EventDateunion select 2 as ObjectId, 342 as EventId, cast('2018-06-08' as date) as EventDateunion select 2 as ObjectId, 343 as EventId, cast('2018-09-18' as date) as EventDateunion select 2 as ObjectId, 342 as EventId, cast('2018-10-02' as date) as EventDate
以下基于 Set 的解決方案.
除使用位域外,未嘗試任何優化過程.它有效,這對我來說已經足夠了.我可以看到一些可能的簡化點
我應該補充一點,真的,這個問題目前是未定義的,因為如果兩個不同的事件可以在同一日期發生,那么我們應該將它們處理的順序沒有定義發生.因此,在這些情況下,第一個 CTE 中分配的行號是任意的.樣本數據中未出現此類情況.
使用字符串連接路徑 - 150 毫秒.
切換到位而不是字符串,仍然比光標(~15 ms)慢(~30 ms)
select 1 as ObjectId, 342 as EventId, cast('2017-10-27' as date) as EventDate成噸union all select 1, 342, cast('2018-01-06' as date)union all select 1, 343, cast('2018-04-18' as date)union all select 1, 401, cast('2018-10-15' as date)union all select 1, 342, cast('2018-11-12' as date)union all select 1, 342, cast('2018-11-29' as date)union all select 1, 401, cast('2018-12-10' as date)union all select 1, 342, cast('2019-02-21' as date)union all select 1, 343, cast('2019-04-23' as date)union all select 1, 401, cast('2019-11-04' as date)union all select 1, 343, cast('2020-02-15' as date)union all select 2, 342, cast('2018-06-08' as date)union all select 2, 343, cast('2018-09-18' as date)union all select 2, 342, cast('2018-10-02' as date);走編號為-- 只需添加一個行號以使其更易于遵循(選擇objectid,事件,活動日期,rn = row_number() over (partition by objectid order by eventdate asc),bits = cast(power(2, case eventid when 342 then 0 when 343 then 1 else 2 end) as tinyint)從T),路徑為-- 每一行的不同 eventid 的連接路徑,作為一個位域(選擇 n.objectid,n.eventid,n.事件日期,根 = n.rn,名詞,位從編號 n聯合所有選擇 n.objectid,n.eventid,n.事件日期,p.root,名詞,p.bits |n.bits從路徑 p在 n.objectid = p.objectid 上加入編號為 n和 n.rn >p.n和 p.bits &n.bits = 0),候選人作為-- 具有包含所有 3 個值的路徑的行(位 = 7)(選擇 *從 (選擇根,嗯,候選人 = iif(rn = min(rn) over(按根分區),1, 0)從路徑其中位 = 7) C其中 c.candidate = 1)-- 按行號順序獲取沒有較早候選者的候選行-- 有一個根到尾的路徑,該路徑與該候選路徑重疊選擇不同的n.objectid,n.eventid,n.事件日期,isnull(c.candidate, 0)從編號 n左加入候選人 c on c.rn = n.rn并且不存在(選擇 *從候選人上一頁其中 prev.rn
lulz 的純游標.
<預><代碼>聲明@triplets 表(objectid int, eventid int, eventdate date);聲明 c 游標 fast_forward for選擇 objectid, eventid, eventdate from t order by objectid, eventdate asc;宣布@ob int、@prevob int、@event int、@dt 日期、@bits tinyint = 0;打開 c;從 c 中取 next 到 @ob, @event, @dt;而@@fetch_status = 0開始如果(@ob = @prevob)開始如果@event = 342 設置@bits |= 1;否則如果@event = 343 設置@bits |= 2;否則如果@event = 401 設置@bits |= 4;如果(@bits = 7)開始插入@triplets 值(@ob、@event、@dt);設置@bits = 0結尾結尾否則選擇@bits = 0,@prevob = @ob;從 c 中取 next 到 @ob, @event, @dt;結尾關閉 c;解除分配c;選擇 t.*, iif(tt.objectid 為 null, 0, 1)從T在 t.objectid = tt.objectid 上左加入 @triplets tt和 t.eventid = tt.eventid和 t.eventdate = tt.eventdate;I'm working with a table that contains the following data:
ObjectId EventId EventDate
1 342 2017-10-27
1 342 2018-01-06
1 343 2018-04-18
1 401 2018-10-15
1 342 2018-11-12
1 342 2018-11-29
1 401 2018-12-10
1 342 2019-02-21
1 343 2019-04-23
1 401 2019-11-04
1 343 2020-02-15
2 342 2018-06-08
2 343 2018-09-18
2 342 2018-10-02
I need to flag the first record where all 3 events (identified by EventId values 342, 343, and 401) have occurred for an object (identified by ObjectId). Then, the process should start again with the remaining records. I've tried using windowed functions to get this to work, but the "starting over" process of identifying any additional occurrences is tripping me up.
The output of this algorithm performed on the above data set is:
ObjectId EventId EventDate EventsComplete
1 342 2017-10-27 0
1 342 2018-01-06 0
1 343 2018-04-18 0
1 401 2018-10-15 1
1 342 2018-11-12 0
1 342 2018-11-29 0
1 401 2018-12-10 0
1 342 2019-02-21 0
1 343 2019-04-23 1
1 401 2019-11-04 0
1 343 2020-02-15 0
2 342 2018-06-08 0
2 343 2018-09-18 0
2 342 2018-10-02 0
Here's a query that will create the data set in the example.
select 1 as ObjectId, 342 as EventId, cast('2017-10-27' as date) as EventDate
union select 1 as ObjectId, 342 as EventId, cast('2018-01-06' as date) as EventDate
union select 1 as ObjectId, 343 as EventId, cast('2018-04-18' as date) as EventDate
union select 1 as ObjectId, 401 as EventId, cast('2018-10-15' as date) as EventDate
union select 1 as ObjectId, 342 as EventId, cast('2018-11-12' as date) as EventDate
union select 1 as ObjectId, 342 as EventId, cast('2018-11-29' as date) as EventDate
union select 1 as ObjectId, 401 as EventId, cast('2018-12-10' as date) as EventDate
union select 1 as ObjectId, 342 as EventId, cast('2019-02-21' as date) as EventDate
union select 1 as ObjectId, 343 as EventId, cast('2019-04-23' as date) as EventDate
union select 1 as ObjectId, 401 as EventId, cast('2019-11-04' as date) as EventDate
union select 1 as ObjectId, 343 as EventId, cast('2020-02-15' as date) as EventDate
union select 2 as ObjectId, 342 as EventId, cast('2018-06-08' as date) as EventDate
union select 2 as ObjectId, 343 as EventId, cast('2018-09-18' as date) as EventDate
union select 2 as ObjectId, 342 as EventId, cast('2018-10-02' as date) as EventDate
Set based solution below.
No optimisation passes have been attempted other than using a bitfield. It works, that's enough for me. I can see a few points of possible simplification
I should add that, really, this problem is currently undefined, because if two different events can occur on the same date, there is no definition for the order in which we should treat them to have occured. So the row number allocated in the first CTE is arbitrary in those cases. No such cases occur in the sample data.
Using string concatenated paths - 150 ms.
Switching to bits instead of strings, still slower (~30 ms) than the cursor (~15 ms)
select 1 as ObjectId, 342 as EventId, cast('2017-10-27' as date) as EventDate
into t
union all select 1, 342, cast('2018-01-06' as date)
union all select 1, 343, cast('2018-04-18' as date)
union all select 1, 401, cast('2018-10-15' as date)
union all select 1, 342, cast('2018-11-12' as date)
union all select 1, 342, cast('2018-11-29' as date)
union all select 1, 401, cast('2018-12-10' as date)
union all select 1, 342, cast('2019-02-21' as date)
union all select 1, 343, cast('2019-04-23' as date)
union all select 1, 401, cast('2019-11-04' as date)
union all select 1, 343, cast('2020-02-15' as date)
union all select 2, 342, cast('2018-06-08' as date)
union all select 2, 343, cast('2018-09-18' as date)
union all select 2, 342, cast('2018-10-02' as date);
go
with numbered as
-- just adding a row number to make it easier to follow
(
select objectid,
eventid,
eventdate,
rn = row_number() over (partition by objectid order by eventdate asc),
bits = cast(power(2, case eventid when 342 then 0 when 343 then 1 else 2 end) as tinyint)
from t
),
paths as
-- the concatenated paths of distinct eventid for each row, as a bitfield
(
select n.objectid,
n.eventid,
n.eventdate,
root = n.rn,
n.rn,
bits
from numbered n
union all
select n.objectid,
n.eventid,
n.eventdate,
p.root,
n.rn,
p.bits | n.bits
from paths p
join numbered n on n.objectid = p.objectid
and n.rn > p.rn
and p.bits & n.bits = 0
),
candidates as
-- a row that has a path containing all 3 values (bits = 7)
(
select *
from (
select root,
rn,
candidate = iif
(
rn = min(rn) over (partition by root),
1, 0
)
from paths
where bits = 7
) c
where c.candidate = 1
)
-- get the candidate rows where no earlier candidiate in row number order
-- has a root-to-end path which overlaps the path for this candidate
select distinct
n.objectid,
n.eventid,
n.eventdate,
isnull(c.candidate, 0)
from numbered n
left join candidates c on c.rn = n.rn
and not exists
(
select *
from candidates prev
where prev.rn < c.rn
and prev.rn > c.root
and prev.root < c.rn
)
order by n.objectid,
n.eventdate,
n.eventid
Pure cursor for the lulz.
declare @triplets table(objectid int, eventid int, eventdate date);
declare c cursor fast_forward for
select objectid, eventid, eventdate from t order by objectid, eventdate asc;
declare
@ob int, @prevob int, @event int, @dt date,
@bits tinyint = 0;
open c;
fetch next from c into @ob, @event, @dt;
while @@fetch_status = 0
begin
if (@ob = @prevob)
begin
if @event = 342 set @bits |= 1;
else if @event = 343 set @bits |= 2;
else if @event = 401 set @bits |= 4;
if (@bits = 7)
begin
insert @triplets values (@ob, @event, @dt);
set @bits = 0
end
end
else select @bits = 0, @prevob = @ob;
fetch next from c into @ob, @event, @dt;
end
close c;
deallocate c;
select t.*, iif(tt.objectid is null, 0, 1)
from t
left join @triplets tt on t.objectid = tt.objectid
and t.eventid = tt.eventid
and t.eventdate = tt.eventdate;
這篇關于如何識別表中哪些行滿足特定條件,但條件是基于前一行的數據?提供的示例的文章就介紹到這了,希望我們推薦的答案對大家有所幫助,也希望大家多多支持html5模板網!