Problem description
I have a dataset like the following:
Country,Commodity,Year,Type,Amount
US,Vegetable,2010,Harvested,2.44
US,Vegetable,2010,Yield,15.8
US,Vegetable,2010,Production,6.48
US,Vegetable,2011,Harvested,6
US,Vegetable,2011,Yield,18
US,Vegetable,2011,Production,3
Argentina,Vegetable,2010,Harvested,15.2
Argentina,Vegetable,2010,Yield,40.5
Argentina,Vegetable,2010,Production,2.66
Argentina,Vegetable,2011,Harvested,15.2
Argentina,Vegetable,2011,Yield,40.5
Argentina,Vegetable,2011,Production,2.66
Bhutan,Vegetable,2010,Harvested,7
Bhutan,Vegetable,2010,Yield,35
Bhutan,Vegetable,2010,Production,5
Bhutan,Vegetable,2011,Harvested,2
Bhutan,Vegetable,2011,Yield,6
Bhutan,Vegetable,2011,Production,3
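Given:
- If any country has n years of data, every other country has the same n years. For example, if the US has data for 2011 and 2012, all other countries will also have data for 2011 and 2012.
Conditions:
- Aggregation happens only when multiple countries are selected. The result is grouped by Commodity and Year.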
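One way to confirm this assumption before pre-computing anything is to compare the set of years each country covers. The sketch below uses Python's standard `csv` module on the sample data above. The helper names `years_by_country` and `all_countries_share_years` are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Sample data from the question (one row per Country/Commodity/Year/Type).
DATA = """Country,Commodity,Year,Type,Amount
US,Vegetable,2010,Harvested,2.44
US,Vegetable,2010,Yield,15.8
US,Vegetable,2010,Production,6.48
US,Vegetable,2011,Harvested,6
US,Vegetable,2011,Yield,18
US,Vegetable,2011,Production,3
Argentina,Vegetable,2010,Harvested,15.2
Argentina,Vegetable,2010,Yield,40.5
Argentina,Vegetable,2010,Production,2.66
Argentina,Vegetable,2011,Harvested,15.2
Argentina,Vegetable,2011,Yield,40.5
Argentina,Vegetable,2011,Production,2.66
Bhutan,Vegetable,2010,Harvested,7
Bhutan,Vegetable,2010,Yield,35
Bhutan,Vegetable,2010,Production,5
Bhutan,Vegetable,2011,Harvested,2
Bhutan,Vegetable,2011,Yield,6
Bhutan,Vegetable,2011,Production,3
"""

def years_by_country(rows):
    """Map each country to the set of years it has data for."""
    years = defaultdict(set)
    for row in rows:
        years[row["Country"]].add(int(row["Year"]))
    return dict(years)

def all_countries_share_years(rows):
    """True if every country covers exactly the same set of years."""
    year_sets = list(years_by_country(rows).values())
    # Compare every set against the first one (vacuously True for 0 or 1 country).
    return all(s == year_sets[0] for s in year_sets[1:])

rows = list(csv.DictReader(io.StringIO(DATA)))
print(all_countries_share_years(rows))  # True
```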
For example, if a user in the frontend tool selects US and Argentina, we have to show:
The Amount for the derived yield is (Harvested of US + Harvested of Argentina) / (Production of US + Production of Argentina), i.e. (2.44 + 15.2) / (6.48 + 2.66) for 2010. Similarly, for three countries it is the sum of the three Harvested values divided by the sum of the three Production values, and so on. The result has to be populated in a new row.
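As a worked check of this formula, here is a small Python sketch that follows the definition above: the sum of Harvested divided by the sum of Production over the selected countries. The `AMOUNTS` dict only restates the Harvested and Production values from the sample data. `derived_yield` is a hypothetical helper name.

```python
# Harvested and Production amounts from the sample data, keyed by (country, year).
AMOUNTS = {
    ("US", 2010): {"Harvested": 2.44, "Production": 6.48},
    ("US", 2011): {"Harvested": 6, "Production": 3},
    ("Argentina", 2010): {"Harvested": 15.2, "Production": 2.66},
    ("Argentina", 2011): {"Harvested": 15.2, "Production": 2.66},
    ("Bhutan", 2010): {"Harvested": 7, "Production": 5},
    ("Bhutan", 2011): {"Harvested": 2, "Production": 3},
}

def derived_yield(countries, year):
    """Sum of Harvested divided by sum of Production over the selected countries."""
    harvested = sum(AMOUNTS[(c, year)]["Harvested"] for c in countries)
    production = sum(AMOUNTS[(c, year)]["Production"] for c in countries)
    return harvested / production

# US + Argentina, 2010: (2.44 + 15.2) / (6.48 + 2.66) = 17.64 / 9.14
print(round(derived_yield(["US", "Argentina"], 2010), 4))  # 1.93
# US + Argentina + Bhutan, 2010: 24.64 / 14.14
print(round(derived_yield(["US", "Argentina", "Bhutan"], 2010), 4))  # 1.7426
```

The selection order does not matter, since only sums are involved.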
Note: Users in the frontend can select any combination of countries. The only reason for doing this in the backend rather than dynamically in the frontend is that AWS QuickSight (our visualisation tool) can populate sums over selected column filters but does not yet support calculations on those derived summed fields. Hence the calculation for every combination of countries has to be pre-populated (a very naive approach) to make it available in the report.
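To give a sense of how large the fully pre-populated table gets: with n countries, the number of selections containing two or more countries is the sum of C(n, k) for k = 2..n, which equals 2^n - n - 1. A quick Python check (`multi_country_selections` is a hypothetical helper name):

```python
from math import comb

def multi_country_selections(n):
    """Number of ways to pick 2 or more countries out of n."""
    return sum(comb(n, k) for k in range(2, n + 1))

# Compare the explicit sum with the closed form 2**n - n - 1.
for n in (3, 10, 20, 30):
    print(n, multi_country_selections(n), 2**n - n - 1)

print(multi_country_selections(3))  # 4: three pairs plus one triple
```

Each of these selections is then repeated for every Commodity/Year pair, so the row count roughly doubles with every country added.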
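My two questions to all SQL experts are:
- How do I populate rows for all combinations of countries, grouped by Year and Commodity, so that data is available for every possible combination?
- Given that I can populate all the combination rows, how will the reporting tool know which derived row to pick based on the countries the user selected, since one row is labelled US + Argentina, another US + Bhutan, and so on?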
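One possible approach to both questions is sketched below in Python rather than SQL so that it runs standalone. It enumerates every combination of two or more countries within each (Commodity, Year) group with `itertools.combinations`, and it labels each derived row with the country names sorted and joined by `+`. Because the label is built from a sorted list, the same selection always maps to the same key regardless of click order, so any layer that knows the user's selection could look the row up by rebuilding the key. The column name `CountryCombo`, the `+` separator and the helper names are assumptions. Whether QuickSight itself can build such a key from a multi-select filter is not established here.

```python
import csv
import io
from collections import defaultdict
from itertools import combinations

# Sample data from the question.
DATA = """Country,Commodity,Year,Type,Amount
US,Vegetable,2010,Harvested,2.44
US,Vegetable,2010,Yield,15.8
US,Vegetable,2010,Production,6.48
US,Vegetable,2011,Harvested,6
US,Vegetable,2011,Yield,18
US,Vegetable,2011,Production,3
Argentina,Vegetable,2010,Harvested,15.2
Argentina,Vegetable,2010,Yield,40.5
Argentina,Vegetable,2010,Production,2.66
Argentina,Vegetable,2011,Harvested,15.2
Argentina,Vegetable,2011,Yield,40.5
Argentina,Vegetable,2011,Production,2.66
Bhutan,Vegetable,2010,Harvested,7
Bhutan,Vegetable,2010,Yield,35
Bhutan,Vegetable,2010,Production,5
Bhutan,Vegetable,2011,Harvested,2
Bhutan,Vegetable,2011,Yield,6
Bhutan,Vegetable,2011,Production,3
"""

def combo_key(countries, sep="+"):
    """Order-independent label for a set of countries, e.g. 'Argentina+US'."""
    return sep.join(sorted(countries))

def derived_rows(rows):
    """One 'Derived Yield' row per (Commodity, Year, combination of >= 2 countries)."""
    # (commodity, year) -> country -> {type: amount}
    groups = defaultdict(lambda: defaultdict(dict))
    for r in rows:
        groups[(r["Commodity"], int(r["Year"]))][r["Country"]][r["Type"]] = float(r["Amount"])

    out = []
    for commodity, year in sorted(groups):
        by_country = groups[(commodity, year)]
        countries = sorted(by_country)
        for k in range(2, len(countries) + 1):
            for combo in combinations(countries, k):
                harvested = sum(by_country[c]["Harvested"] for c in combo)
                production = sum(by_country[c]["Production"] for c in combo)
                out.append({
                    "CountryCombo": combo_key(combo),
                    "Commodity": commodity,
                    "Year": year,
                    "Type": "Derived Yield",
                    "Amount": harvested / production,
                })
    return out

rows = list(csv.DictReader(io.StringIO(DATA)))
derived = derived_rows(rows)

# Lookup for a user selection: rebuild the key from whatever the user picked.
index = {(d["CountryCombo"], d["Year"]): d["Amount"] for d in derived}
print(len(derived))  # 8: four combinations for each of the two years
print(index[(combo_key(["US", "Argentina"]), 2010)])  # (2.44 + 15.2) / (6.48 + 2.66)
```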
Any solution is extremely welcome.
Preferred SQL tools: Spark SQL, Athena SQL (runs on Presto) or HiveQL. Less preferred: Oracle, PostgreSQL.
Note 2: I have described the same problem in another question. The sole purpose of posting this one is that I don't want to impose my naive approach on anyone trying to solve the problem, so here I have focused on defining the problem clearly rather than asking for help with a particular solution. In the other question, I have given my approach for the expected result. In case you want to see the other question, here it is.
Recommended answer
You can start with something like this:
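-- your_table is a placeholder for the source table (Country, Commodity, Year, Type, Amount);
-- "table" itself is a reserved word in most SQL dialects
select cy.Country, cy.Year, t.Commodity, t.Type, t.Amount
from
(
  -- every Country paired with every Year
  select c.Country, y.Year
  from (select distinct Country from your_table) as c
  cross join (select distinct Year from your_table) as y
) as cy
-- keep every Country/Year pair even when the source has no row for it
left join your_table as t
  on t.Country = cy.Country and t.Year = cy.Year;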
This gives you a row for every combination of Country and Year, together with the matching data from the main table where it exists (those columns are NULL otherwise), so you can now add filtering/grouping.
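As one possible next step in that direction, the sketch below first pivots Harvested and Production into columns with conditional aggregation. It then self-joins that result to produce one derived row per pair of countries within the same Commodity and Year. The condition `a.Country < b.Country` keeps each pair once and puts the label in sorted order. It runs on SQLite through Python's `sqlite3` module purely so that it is self-contained. The table name `crops` and the `CountryCombo` column are hypothetical, and porting the query to Spark SQL, Athena or HiveQL (string concatenation syntax, for instance) is an assumption to verify. Selections of three or more countries would need one further self-join per additional country.

```python
import sqlite3

# Sample data from the question: (Country, Commodity, Year, Type, Amount).
ROWS = [
    ("US", "Vegetable", 2010, "Harvested", 2.44), ("US", "Vegetable", 2010, "Yield", 15.8),
    ("US", "Vegetable", 2010, "Production", 6.48), ("US", "Vegetable", 2011, "Harvested", 6),
    ("US", "Vegetable", 2011, "Yield", 18), ("US", "Vegetable", 2011, "Production", 3),
    ("Argentina", "Vegetable", 2010, "Harvested", 15.2), ("Argentina", "Vegetable", 2010, "Yield", 40.5),
    ("Argentina", "Vegetable", 2010, "Production", 2.66), ("Argentina", "Vegetable", 2011, "Harvested", 15.2),
    ("Argentina", "Vegetable", 2011, "Yield", 40.5), ("Argentina", "Vegetable", 2011, "Production", 2.66),
    ("Bhutan", "Vegetable", 2010, "Harvested", 7), ("Bhutan", "Vegetable", 2010, "Yield", 35),
    ("Bhutan", "Vegetable", 2010, "Production", 5), ("Bhutan", "Vegetable", 2011, "Harvested", 2),
    ("Bhutan", "Vegetable", 2011, "Yield", 6), ("Bhutan", "Vegetable", 2011, "Production", 3),
]

PAIR_SQL = """
WITH hp AS (
    -- one row per Country/Commodity/Year with Harvested and Production as columns
    SELECT Country, Commodity, Year,
           SUM(CASE WHEN Type = 'Harvested'  THEN Amount END) AS Harvested,
           SUM(CASE WHEN Type = 'Production' THEN Amount END) AS Production
    FROM crops
    GROUP BY Country, Commodity, Year
)
SELECT a.Country || '+' || b.Country AS CountryCombo,
       a.Commodity,
       a.Year,
       (a.Harvested + b.Harvested) / (a.Production + b.Production) AS Amount
FROM hp AS a
JOIN hp AS b
  ON a.Commodity = b.Commodity
 AND a.Year = b.Year
 AND a.Country < b.Country  -- each unordered pair once, label in sorted order
ORDER BY CountryCombo, a.Year
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE crops (Country TEXT, Commodity TEXT, Year INTEGER, Type TEXT, Amount REAL)")
conn.executemany("INSERT INTO crops VALUES (?, ?, ?, ?, ?)", ROWS)
pairs = conn.execute(PAIR_SQL).fetchall()
for row in pairs:
    print(row)  # (CountryCombo, Commodity, Year, Amount)
```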