問題描述
以包含以下 3 個字段的 SQL 表為例:
Take a SQL table with the following 3 fields:
Id,TimeStamp,Item,UserId
我想確定會話中 UserId
最常見的 Item
序列.會話將簡單地由時間閾值定義(即,如果 X 分鐘內沒有完整內容,則未來的任何條目都將被分組到一個新會話中).
I would like to determine the most common sequences of Item
for a UserId
in a session. A session would simply be defined by a threshold of time (i.e. if there are no entires for X minutes, any future entries would be grouped into a new session).
理想情況下,項目序列可以有一種模糊分組,其中序列中的一個或兩個差異仍然可以被視為相同并組合在一起.
Ideally, the sequence of Items could have a sort of fuzzy grouping where one or two differences in the sequence could still be counted as the same and grouped together.
有人知道我如何在 SQL 中解決這個問題嗎?
Anyone know how I might tackle this problem in SQL?
更新:
為了澄清,讓我們假設 Items 是雜貨店島.我有一個月的人去雜貨店.基本問題是人們使用什么島以及它的順序是什么.他們最常去的是1,2,3
還是1,2,1,3,4
?
(現在我很好奇用戶在我們網站上的路徑,但你知道,雜貨店更直觀).
(Right now I am curious about paths of users on our sites, but you know, grocery store is more visual).
更新 2:
這是一個簡單的案例:
Update 2:
Here is a simple case:
CREATE Table #StoreActivity
(
id int,
CreationDate datetime ,
Isle int,
UserId int
)
Insert INTO #StoreActivity
Values
(1, CAST('12-1-2011 03:10:01' AS Datetime), 1, 2222),
(2, CAST('12-1-2011 03:10:07' AS Datetime), 1, 1111),
(3, CAST('12-1-2011 03:10:12' AS Datetime), 2, 2222),
(4, CAST('12-1-2011 04:10:01' AS Datetime), 1, 2222),
(5, CAST('12-1-2011 04:10:23' AS Datetime), 2, 2222)
Select * from #StoreActivity
DROP Table #StoreActivity
/* So with the above data, we have 2 sequences if we declare a session or visit dead if there is no activity for a minute : `1,2` (With a count of 2), and `1` (with a count of 1)*/
推薦答案
WITH q AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY TimeStamp, Id) AS rn,
ROW_NUMBER() OVER (PARTITION BY UserId, Item ORDER BY TimeStamp, Id) AS rnd
FROM mytable
)
SELECT *,
rnd - rn AS sequence
FROM q
sequence
列將在給定 UserId
的序列中的所有記錄之間共享.您可以對其進行分組或做任何您喜歡的事情.
The sequence
column will be shared among all records in a sequence for a given UserId
. You can group on it or do whatever you like.
這篇關于TSQL 時間序列模式數據挖掘的文章就介紹到這了,希望我們推薦的答案對大家有所幫助,也希望大家多多支持html5模板網!