Problem description
Here are all my tables' structures and the query (please focus on the last query, appended below). As you can see in the fiddle, here is the current output:
+---------+-----------+-------+------------+--------------+
| user_id | user_name | score | reputation | top_two_tags |
+---------+-----------+-------+------------+--------------+
| 1 | Jack | 0 | 18 | css,mysql |
| 4 | James | 1 | 5 | html |
| 2 | Peter | 0 | 0 | null |
| 3 | Ali | 0 | 0 | null |
+---------+-----------+-------+------------+--------------+
It's correct, and everything is fine.
Now I have one more entity named "category". Each post can have only one category, and I also want to get the top two categories for each user. Here is my new query. As you can see in the result, some duplicates appear:
+---------+-----------+-------+------------+--------------+------------------------+
| user_id | user_name | score | reputation | top_two_tags | top_two_categories |
+---------+-----------+-------+------------+--------------+------------------------+
| 1 | Jack | 0 | 18 | css,css | technology,technology |
| 4 | James | 1 | 5 | html | political |
| 2 | Peter | 0 | 0 | null | null |
| 3 | Ali | 0 | 0 | null | null |
+---------+-----------+-------+------------+--------------+------------------------+
See? css,css and technology,technology. Why are these duplicated? I've just added one more LEFT JOIN for categories, exactly like the one for tags. But it doesn't work as expected, and it even affects the tags.
Anyway, this is the expected result:
+---------+-----------+-------+------------+--------------+------------------------+
| user_id | user_name | score | reputation | top_two_tags | category |
+---------+-----------+-------+------------+--------------+------------------------+
| 1 | Jack | 0 | 18 | css,mysql | technology,social |
| 4 | James | 1 | 5 | html | political |
| 2 | Peter | 0 | 0 | null | null |
| 3 | Ali | 0 | 0 | null | null |
+---------+-----------+-------+------------+--------------+------------------------+
Does anybody know how I can achieve that?
CREATE TABLE users(id integer PRIMARY KEY, user_name varchar(5));
CREATE TABLE tags(id integer NOT NULL PRIMARY KEY, tag varchar(5));
CREATE TABLE reputations(
id integer PRIMARY KEY,
post_id integer /* REFERENCES posts(id) */,
user_id integer REFERENCES users(id),
score integer,
reputation integer,
date_time integer);
CREATE TABLE post_tag(
post_id integer /* REFERENCES posts(id) */,
tag_id integer REFERENCES tags(id),
PRIMARY KEY (post_id, tag_id));
CREATE TABLE categories(id INTEGER NOT NULL PRIMARY KEY, category varchar(10) NOT NULL);
CREATE TABLE post_category(
post_id INTEGER NOT NULL /* REFERENCES posts(id) */,
category_id INTEGER NOT NULL REFERENCES categories(id),
PRIMARY KEY(post_id, category_id)) ;
SELECT
q1.user_id, q1.user_name, q1.score, q1.reputation,
substring_index(group_concat(q2.tag ORDER BY q2.tag_reputation DESC SEPARATOR ','), ',', 2) AS top_two_tags,
substring_index(group_concat(q3.category ORDER BY q3.category_reputation DESC SEPARATOR ','), ',', 2) AS category
FROM
(SELECT
u.id AS user_Id,
u.user_name,
coalesce(sum(r.score), 0) as score,
coalesce(sum(r.reputation), 0) as reputation
FROM
users u
LEFT JOIN reputations r
ON r.user_id = u.id
AND r.date_time > 1500584821 /* unix_timestamp(DATE_SUB(now(), INTERVAL 1 WEEK)) */
GROUP BY
u.id, u.user_name
) AS q1
LEFT JOIN
(
SELECT
r.user_id AS user_id, t.tag, sum(r.reputation) AS tag_reputation
FROM
reputations r
JOIN post_tag pt ON pt.post_id = r.post_id
JOIN tags t ON t.id = pt.tag_id
WHERE
r.date_time > 1500584821 /* unix_timestamp(DATE_SUB(now(), INTERVAL 1 WEEK)) */
GROUP BY
user_id, t.tag
) AS q2
ON q2.user_id = q1.user_id
LEFT JOIN
(
SELECT
r.user_id AS user_id, c.category, sum(r.reputation) AS category_reputation
FROM
reputations r
JOIN post_category ct ON ct.post_id = r.post_id
JOIN categories c ON c.id = ct.category_id
WHERE
r.date_time > 1500584821 /* unix_timestamp(DATE_SUB(now(), INTERVAL 1 WEEK)) */
GROUP BY
user_id, c.category
) AS q3
ON q3.user_id = q1.user_id
GROUP BY
q1.user_id, q1.user_name, q1.score, q1.reputation
ORDER BY
q1.reputation DESC, q1.score DESC ;
Your second query is of the form:
q1 -- PK user_id
LEFT JOIN (...
GROUP BY user_id, t.tag
) AS q2
ON q2.user_id = q1.user_id
LEFT JOIN (...
GROUP BY user_id, c.category
) AS q3
ON q3.user_id = q1.user_id
GROUP BY -- group_concats
The inner GROUP BYs result in (user_id, t.tag) & (user_id, c.category) being keys/UNIQUEs. Other than that I won't address those GROUP BYs.
TL;DR When you join (q1 JOIN q2) to q3, the join is not on a key/UNIQUE of either of them, so for each user_id you get a row for every possible combination of tag & category. So the final GROUP BY takes as input duplicates per (user_id, tag) & per (user_id, category), and GROUP_CONCAT inappropriately concatenates duplicate tags & categories per user_id. Correct would be (q1 JOIN q2 GROUP BY) JOIN (q1 JOIN q3 GROUP BY), in which all joins are on the common key/UNIQUE (user_id) & there is no spurious aggregation. Although sometimes you can undo such spurious aggregation.
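The row multiplication described above can be reproduced with a small sketch. It uses Python's built-in sqlite3 module rather than MySQL, and it assumes plain tables named q1, q2 and q3 standing in for the three derived tables; the rows and reputation values are made up for illustration and are not the fiddle's data.

```python
import sqlite3

# Hypothetical stand-ins for the derived tables q1, q2, q3 (made-up rows).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE q1(user_id INTEGER PRIMARY KEY, user_name TEXT);
CREATE TABLE q2(user_id INTEGER, tag TEXT, tag_reputation INTEGER,
                PRIMARY KEY (user_id, tag));
CREATE TABLE q3(user_id INTEGER, category TEXT, category_reputation INTEGER,
                PRIMARY KEY (user_id, category));
INSERT INTO q1 VALUES (1, 'Jack');
INSERT INTO q2 VALUES (1, 'css', 15), (1, 'mysql', 3);
INSERT INTO q3 VALUES (1, 'technology', 10), (1, 'social', 8);
""")

# Joining both one-to-many tables on user_id alone pairs every tag
# with every category: 2 tags x 2 categories = 4 rows for one user.
joined_rows = con.execute("""
    SELECT q1.user_id, q2.tag, q3.category
    FROM q1
    LEFT JOIN q2 ON q2.user_id = q1.user_id
    LEFT JOIN q3 ON q3.user_id = q1.user_id
""").fetchall()

# Aggregating over those rows therefore sees each tag twice.
tags_csv = con.execute("""
    SELECT group_concat(q2.tag)
    FROM q1
    LEFT JOIN q2 ON q2.user_id = q1.user_id
    LEFT JOIN q3 ON q3.user_id = q1.user_id
    GROUP BY q1.user_id
""").fetchone()[0]

print(len(joined_rows))             # 4
print(sorted(tags_csv.split(",")))  # ['css', 'css', 'mysql', 'mysql']
```

Once such a list is ordered by reputation, the two copies of the highest-ranked tag sit next to each other, so keeping only the first two elements yields css,css.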
A correct symmetrical INNER JOIN approach: LEFT JOIN q1 & q2 (1:many), then GROUP BY & GROUP_CONCAT (which is what your first query did); then separately and similarly LEFT JOIN q1 & q3 (1:many), then GROUP BY & GROUP_CONCAT; then INNER JOIN the two results ON user_id (1:1).
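A minimal sketch of the symmetrical INNER JOIN shape, again in SQLite through Python's sqlite3 module and not in MySQL. The tables q1, q2 and q3 are assumed stand-ins for the derived tables, with made-up rows. The ORDER BY inside GROUP_CONCAT and the SUBSTRING_INDEX top-two step are left out because SQLite's group_concat differs from MySQL's, so the lists are compared after sorting.

```python
import sqlite3

# Hypothetical stand-ins for the derived tables q1, q2, q3 (made-up rows).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE q1(user_id INTEGER PRIMARY KEY, user_name TEXT);
CREATE TABLE q2(user_id INTEGER, tag TEXT, PRIMARY KEY (user_id, tag));
CREATE TABLE q3(user_id INTEGER, category TEXT,
                PRIMARY KEY (user_id, category));
INSERT INTO q1 VALUES (1, 'Jack'), (2, 'Peter');
INSERT INTO q2 VALUES (1, 'css'), (1, 'mysql');
INSERT INTO q3 VALUES (1, 'technology'), (1, 'social');
""")

# Each one-to-many join is aggregated on its own, so both derived tables
# have exactly one row per user_id and the final join is 1:1.
rows = con.execute("""
    SELECT t.user_id, t.tags, c.categories
    FROM (SELECT q1.user_id, group_concat(q2.tag) AS tags
          FROM q1 LEFT JOIN q2 ON q2.user_id = q1.user_id
          GROUP BY q1.user_id) AS t
    JOIN (SELECT q1.user_id, group_concat(q3.category) AS categories
          FROM q1 LEFT JOIN q3 ON q3.user_id = q1.user_id
          GROUP BY q1.user_id) AS c
      ON c.user_id = t.user_id
    ORDER BY t.user_id
""").fetchall()

def as_sorted_list(csv):
    """Split a comma-separated list, keeping NULL (None) as None."""
    return sorted(csv.split(",")) if csv is not None else None

result = {uid: (as_sorted_list(tags), as_sorted_list(cats))
          for uid, tags, cats in rows}
print(result)
```

The user without any tag or category still appears, with NULL lists, because each inner join is a LEFT JOIN from q1.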
A correct symmetrical scalar subquery approach: SELECT the GROUP_CONCATs from q1 as scalar subqueries, each with a GROUP BY.
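The scalar subquery shape might look like the following sketch. It runs on SQLite through Python's sqlite3 module rather than on MySQL, and the tables q1, q2 and q3 with their rows are assumptions standing in for the derived tables. Ordering and the top-two cut are omitted here.

```python
import sqlite3

# Hypothetical stand-ins for the derived tables q1, q2, q3 (made-up rows).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE q1(user_id INTEGER PRIMARY KEY, user_name TEXT);
CREATE TABLE q2(user_id INTEGER, tag TEXT, PRIMARY KEY (user_id, tag));
CREATE TABLE q3(user_id INTEGER, category TEXT,
                PRIMARY KEY (user_id, category));
INSERT INTO q1 VALUES (1, 'Jack'), (2, 'Peter');
INSERT INTO q2 VALUES (1, 'css'), (1, 'mysql');
INSERT INTO q3 VALUES (1, 'technology'), (1, 'social');
""")

# Each list is computed by its own correlated scalar subquery, so the
# tag rows and the category rows never meet in a single join.
rows = con.execute("""
    SELECT q1.user_id,
           (SELECT group_concat(q2.tag)
              FROM q2
             WHERE q2.user_id = q1.user_id
             GROUP BY q2.user_id) AS tags,
           (SELECT group_concat(q3.category)
              FROM q3
             WHERE q3.user_id = q1.user_id
             GROUP BY q3.user_id) AS categories
    FROM q1
    ORDER BY q1.user_id
""").fetchall()

def as_sorted_list(csv):
    """Split a comma-separated list, keeping NULL (None) as None."""
    return sorted(csv.split(",")) if csv is not None else None

result = {uid: (as_sorted_list(tags), as_sorted_list(cats))
          for uid, tags, cats in rows}
print(result)
```

A subquery that finds no rows for a user returns no group at all, which the outer SELECT reports as NULL.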
A correct cumulative LEFT JOIN approach: LEFT JOIN q1 & q2 (1:many), then GROUP BY & GROUP_CONCAT; then LEFT JOIN that & q3 (1:many), then GROUP BY & GROUP_CONCAT.
A correct approach like your 2nd query: You first LEFT JOIN q1 & q2 (1:many). Then you LEFT JOIN that & q3 (many:1:many). That gives a row for every possible combination of a tag & a category that appear with a user_id. Then, after the GROUP BY, GROUP_CONCAT runs over duplicate (user_id, tag) pairs and duplicate (user_id, category) pairs. That is why you have duplicate list elements. But adding DISTINCT to GROUP_CONCAT gives a correct result. (Per wchiquito's comment.)
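The effect of DISTINCT can be checked with a sketch like this one. It uses SQLite through Python's sqlite3 module, where group_concat(DISTINCT x) accepts a single argument only, so MySQL's ORDER BY and SEPARATOR clauses are not shown. The tables q1, q2 and q3 and their rows are assumed stand-ins for the derived tables.

```python
import sqlite3

# Hypothetical stand-ins for the derived tables q1, q2, q3 (made-up rows).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE q1(user_id INTEGER PRIMARY KEY, user_name TEXT);
CREATE TABLE q2(user_id INTEGER, tag TEXT, PRIMARY KEY (user_id, tag));
CREATE TABLE q3(user_id INTEGER, category TEXT,
                PRIMARY KEY (user_id, category));
INSERT INTO q1 VALUES (1, 'Jack');
INSERT INTO q2 VALUES (1, 'css'), (1, 'mysql');
INSERT INTO q3 VALUES (1, 'technology'), (1, 'social');
""")

# Same many:1:many join as the failing query: 4 rows for this user.
FROM_CLAUSE = """
    FROM q1
    LEFT JOIN q2 ON q2.user_id = q1.user_id
    LEFT JOIN q3 ON q3.user_id = q1.user_id
    GROUP BY q1.user_id
"""

# Without DISTINCT every tag and category is repeated.
plain = con.execute(
    "SELECT group_concat(q2.tag), group_concat(q3.category)" + FROM_CLAUSE
).fetchone()

# With DISTINCT the repeats are collapsed before concatenation.
distinct = con.execute(
    "SELECT group_concat(DISTINCT q2.tag),"
    " group_concat(DISTINCT q3.category)" + FROM_CLAUSE
).fetchone()

plain_lists = [sorted(csv.split(",")) for csv in plain]
distinct_lists = [sorted(csv.split(",")) for csv in distinct]
print(plain_lists)
print(distinct_lists)
```

DISTINCT only hides the extra rows; the join still produces them, which is the cost weighed in the tradeoff discussion.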
Which you prefer is, as usual, an engineering tradeoff to be informed by query plans & timings on actual data, usage and statistics (input & stats for the expected amount of duplication, timing of actual queries, etc.). One issue is whether the extra rows of the many:1:many JOIN approach offset its saving of a GROUP BY.
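To get a feel for the extra rows, the sketch below counts intermediate rows for a single user with 20 tags and 30 categories. The numbers, the table names q1, q2, q3 and the helper count_rows are all made up for illustration, and the counts come from SQLite through Python's sqlite3 module; they say nothing about how MySQL would actually plan or time the real queries.

```python
import sqlite3

# Hypothetical stand-ins for the derived tables, one user with many rows.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE q1(user_id INTEGER PRIMARY KEY);
CREATE TABLE q2(user_id INTEGER, tag TEXT, PRIMARY KEY (user_id, tag));
CREATE TABLE q3(user_id INTEGER, category TEXT,
                PRIMARY KEY (user_id, category));
INSERT INTO q1 VALUES (1);
""")
con.executemany("INSERT INTO q2 VALUES (1, ?)",
                [(f"tag{i}",) for i in range(20)])
con.executemany("INSERT INTO q3 VALUES (1, ?)",
                [(f"cat{i}",) for i in range(30)])

def count_rows(select_sql):
    """Count the rows a SELECT produces (illustrative helper)."""
    return con.execute(f"SELECT count(*) FROM ({select_sql})").fetchone()[0]

# many:1:many join: every tag is paired with every category.
fan_out = count_rows("""
    SELECT 1 FROM q1
    LEFT JOIN q2 ON q2.user_id = q1.user_id
    LEFT JOIN q3 ON q3.user_id = q1.user_id
""")

# Separate 1:many joins: rows add up instead of multiplying.
tags_only = count_rows(
    "SELECT 1 FROM q1 LEFT JOIN q2 ON q2.user_id = q1.user_id")
cats_only = count_rows(
    "SELECT 1 FROM q1 LEFT JOIN q3 ON q3.user_id = q1.user_id")

print(fan_out, tags_only + cats_only)  # 600 50
```

Whether that difference matters depends on how many tags and categories users really have, which is why timings on real data are the deciding factor.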
-- cumulative LEFT JOIN approach
SELECT
q1.user_id, q1.user_name, q1.score, q1.reputation,
top_two_tags,
substring_index(group_concat(q3.category ORDER BY q3.category_reputation DESC SEPARATOR ','), ',', 2) AS category
FROM
-- your 1st query (less ORDER BY) AS q1
(SELECT
q1.user_id, q1.user_name, q1.score, q1.reputation,
substring_index(group_concat(q2.tag ORDER BY q2.tag_reputation DESC SEPARATOR ','), ',', 2) AS top_two_tags
FROM
(SELECT
u.id AS user_Id,
u.user_name,
coalesce(sum(r.score), 0) as score,
coalesce(sum(r.reputation), 0) as reputation
FROM
users u
LEFT JOIN reputations r
ON r.user_id = u.id
AND r.date_time > 1500584821 /* unix_timestamp(DATE_SUB(now(), INTERVAL 1 WEEK)) */
GROUP BY
u.id, u.user_name
) AS q1
LEFT JOIN
(
SELECT
r.user_id AS user_id, t.tag, sum(r.reputation) AS tag_reputation
FROM
reputations r
JOIN post_tag pt ON pt.post_id = r.post_id
JOIN tags t ON t.id = pt.tag_id
WHERE
r.date_time > 1500584821 /* unix_timestamp(DATE_SUB(now(), INTERVAL 1 WEEK)) */
GROUP BY
user_id, t.tag
) AS q2
ON q2.user_id = q1.user_id
GROUP BY
q1.user_id, q1.user_name, q1.score, q1.reputation
) AS q1
-- finish like your 2nd query
LEFT JOIN
(
SELECT
r.user_id AS user_id, c.category, sum(r.reputation) AS category_reputation
FROM
reputations r
JOIN post_category ct ON ct.post_id = r.post_id
JOIN categories c ON c.id = ct.category_id
WHERE
r.date_time > 1500584821 /* unix_timestamp(DATE_SUB(now(), INTERVAL 1 WEEK)) */
GROUP BY
user_id, c.category
) AS q3
ON q3.user_id = q1.user_id
GROUP BY
q1.user_id, q1.user_name, q1.score, q1.reputation
ORDER BY
q1.reputation DESC, q1.score DESC ;