問題描述
我在這里設置了一個小提琴:https://www.db-fiddle.com/f/snDGExYZgoYASvWkDGHKDC/2
I've setup a fiddle here: https://www.db-fiddle.com/f/snDGExYZgoYASvWkDGHKDC/2
還有:
架構:
CREATE TABLE `scores` (
`id` bigint unsigned NOT NULL AUTO_INCREMENT,
`shift_id` int unsigned NOT NULL,
`employee_name` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
`score` double(8,2) unsigned NOT NULL,
`created_at` timestamp NOT NULL,
PRIMARY KEY (`id`)
);
INSERT INTO scores(shift_id, employee_name, score, created_at) VALUES
(1, "John", 6.72, "2020-04-01 00:00:00"),
(1, "Bob", 15.71, "2020-04-01 00:00:00"),
(1, "Bob", 54.02, "2020-04-01 08:00:00"),
(1, "John", 23.55, "2020-04-01 13:00:00"),
(2, "John", 9.13, "2020-04-02 00:00:00"),
(2, "Bob", 44.76, "2020-04-02 00:00:00"),
(2, "Bob", 33.40, "2020-04-02 08:00:00"),
(2, "James", 20, "2020-04-02 00:00:00"),
(3, "John", 20, "2020-04-02 00:00:00"),
(3, "Bob", 20, "2020-04-02 00:00:00"),
(3, "Bob", 30, "2020-04-02 08:00:00"),
(3, "James", 10, "2020-04-02 00:00:00")
查詢 1:
-- This doesn't work
SELECT
employee_name,
DATE_FORMAT(created_at, '%Y-%m-%d') AS `date`,
ANY_VALUE(AVG(score) OVER(PARTITION BY(ANY_VALUE(created_at)))) AS `average_score`
FROM
scores
GROUP BY
employee_name, date;
查詢 2:
SELECT
employee_name,
DATE_FORMAT(created_at, '%Y-%m-%d') AS `date`,
ANY_VALUE(AVG(score)) AS `average_score`
FROM
scores
GROUP BY
employee_name, date;
查詢 3:
-- This works but scales very poorly with millions of rows
SELECT
t1.employee_name,
ANY_VALUE(DATE_FORMAT(t1.created_at, '%Y-%m-%d')) AS `date`,
ANY_VALUE(SUM(t1.score) / (
SELECT SUM(t2.score)
FROM scores t2
WHERE date(t2.created_at) = date(t1.created_at)
) * 100) AS `average_score`
FROM
scores t1
GROUP BY
t1.employee_name, date;
第三個查詢正確執行,但在我的測試中,當擴展到數百萬行時非常慢.我認為這是因為它是一個相關的子查詢并且運行了數百萬次.
The third query executes correctly but in my testing has been very slow when scaling to millions of rows. I think this is because it is a correlated subquery and runs millions of times.
前兩次嘗試是我嘗試創建以使用 MySQL 8 Window Functions 對平均計算進行分區.然而,這些正在產生意想不到的結果.給定日期的 average_score
總數應該加起來為 100,就像在第三個查詢中一樣.
The first two attempts are me trying to created to use MySQL 8 Window Functions to partition the average calculation. However, these are giving unexpected results. The total average_score
s for a given day should add up to 100, like it does in the 3rd query.
有人知道更有效的計算方法嗎?
Does anyone know of a more efficient way to calculate this?
還值得注意的是,在現實中,查詢中也會有一個 WHERE IN
以按特定的 shift_id
進行過濾.給定的 shift_ids
數量可以是幾十萬,也可以是一百萬.
It's also worth noting that in reality, there will also be a WHERE IN
on the queries to filter by specific shift_id
s. The number of shift_ids
given could be in the hundreds of thousands, up to a million.
正在考慮的另一件事是 ElasticSearch.是否有助于更快地計算這些?
One other thing being considered is ElasticSearch. Would it help with calculating these in a quicker way?
推薦答案
您可以使用窗口函數.訣竅是取每個員工每天總分的窗口總和,如下所示:
You can use window functions. The trick is to take a window sum of the total score per employee for each day, like so:
select
employee_name,
date(created_at) created_date,
100 * sum(score) / sum(sum(score)) over(partition by date(created_at)) monthly_score
from scores
group by employee_name, date(created_at)
在你的數據庫小提琴中,這個產量:
In your DB Fiddle, this yields:
| employee_name | created_date | monthly_score |
| ------------- | ------------ | ------------- |
| John | 2020-04-01 | 30.27 |
| Bob | 2020-04-01 | 69.73 |
| John | 2020-04-02 | 15.55342 |
| Bob | 2020-04-02 | 68.42864 |
| James | 2020-04-02 | 16.01794 |
這篇關于MySQL 8 按日期分區計算平均值的文章就介紹到這了,希望我們推薦的答案對大家有所幫助,也希望大家多多支持html5模板網!