問題描述
我在我的表中發(fā)現(xiàn)了需要刪除的重復項.該表包含:
I found duplicates in my table which needed to be removed. The table contains:
- ID - 表的唯一鍵
- STUDENT_ID - 學生的 ID
- SUBJECT_ID - 學生的科目
- CLASS_ID - 班級學生在
- XP_LVL - 專業(yè)水平
一個學生應該只有一個科目、班級和 XP_lvl 的記錄.在這種情況下,刪除重復項是基于刪除所有但保留一個.
One student should have only one record of subject, class and XP_lvl. In this case the removal of duplicities is based on delete all but keep one.
在我的情況下,重復看起來像這樣:
In my case duplicates looks like this:
ID | STUDENT_ID | SUBJECT_ID | CLASS_ID | EXPERTISE_LVL |
---|---|---|---|---|
1 | 1AAA | 55FFE | CLASS808 | 2 |
2 | 1AAA | 55FFE | CLASS808 | 2 |
3 | 2AAB | 49BB | CLASS890 | 3 |
4 | 2AAB | 49BB | CLASS890 | 3 |
5 | 2AAB | 49BB | CLASS890 | 4 |
6 | 2AAB | 49BB | CLASS890 | 3 |
我通過創(chuàng)建 (CONCAT(STUDENT_ID, CONCAT(SUBJECT_ID, CLASS_ID)))
的唯一 ID,然后通過 count.. >1
識別出所有重復項> 工作正常.
I have identified all the duplicates by creating unique ID of (CONCAT(STUDENT_ID, CONCAT(SUBJECT_ID, CLASS_ID)))
and then by having count.. >1
which works fine.
現(xiàn)在我需要識別所有 ID,以便我可以在從查詢中刪除時使用 ID NOT IN (SELECT...)
.
Now I need to identify all the ID so I can use ID NOT IN (SELECT...)
in my delete from query.
所以我這樣做了..
AND ID NOT IN (SELECT UID FROM (
SELECT
min(ID) AS UID,
STUDENT_ID,
SUBJECT_ID,
CLASS_ID
FROM
my_table
GROUP BY
STUDENT_ID,
SUBJECT_ID,
CLASS_ID
HAVING
count(CONCAT(STUDENT_ID, CONCAT(SUBJECT_ID, CLASS_ID))) > 1))
但是,我不能使用 min/max(ID)
來選擇要保留的 ID,因為正如您所見,對于學生 2AAB,存在具有不同 XP_LVL 的重復項.
However I cannot use min/max(ID)
to choose which ID to keep because as you can see for student 2AAB there are duplicities with different XP_LVL.
在這種情況下,我需要選擇最高 XP_LVL 的 ID 來保留和刪除所有其他的.
In this case I need to select ID of highest XP_LVL to keep and delete all other.
我嘗試使用 RANK、ROWNUM 不同的排序和子選擇的負載,但沒有想要的結(jié)果.
I tried using RANK, ROWNUM different ordering and loaaads of subselects but without desired results.
有誰知道如何有效地做到這一點?我們正在談論 6k 重復,所以我不能一一做.提前感謝您的幫助.
Does anyone have any idea how to do it effectively ? We are talking about 6k duplicities so I cant do it one by one. Thank you for your help in advance.
推薦答案
您可以使用 ROW_NUMBER
解析函數(shù)并使用 ROWID
偽列關(guān)聯(lián)刪除:>
You can use ROW_NUMBER
analytic function and correlate the deletion using the ROWID
pseudo-column:
DELETE FROM my_table
WHERE ROWID IN (
SELECT ROWID
FROM (
SELECT ROW_NUMBER() OVER (
PARTITION BY student_id, subject_id, class_id
ORDER BY expertise_lvl DESC
) AS rn
FROM my_table
)
WHERE rn > 1
)
db<>fiddle 這里
這篇關(guān)于SQL - 使用 ID NOT IN 的高級重復刪除的文章就介紹到這了,希望我們推薦的答案對大家有所幫助,也希望大家多多支持html5模板網(wǎng)!