
This article describes an efficient way to forward-fill null values in time-series data using T-SQL.

Problem description

I have a table with time-series data that's mostly nulls, and I want to fill in all of the nulls with the last known value.

I have a few solutions, but they're much slower than doing the equivalent DataFrame.fillna(method='ffill') operation in Pandas.

A simplified version of the code and data that I'm using:

select d.[date], d.[price],
       (select top 1 p.price from price_table p
        where p.price is not null and p.[date] <= d.[date]
        order by p.[date] desc) as ff_price
from price_table d

This produces the table:

date       price ff_price
---------- ----- --------
2016-07-11 0.79  0.79
2016-07-12 NULL  0.79
2016-07-13 NULL  0.79
2016-07-14 0.69  0.69
2016-07-15 NULL  0.69
...
2016-09-21 0.88  0.88
...

I have more than 100 million rows, so this takes quite a while.
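For reference, the forward-fill semantics the question asks for can be sketched in plain Python. This is only an illustration of the intended result, not the asker's code: `forward_fill` and `forward_fill_by_lookup` are hypothetical names, and the sample rows are taken from the table above. The first function makes a single pass, while the second mirrors the correlated subquery by searching all earlier rows for every row, which is the pattern that tends to get slow on large tables.

```python
def forward_fill(values):
    """Single pass: carry the last non-null value forward."""
    last = None
    filled = []
    for value in values:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def forward_fill_by_lookup(dates, prices):
    """Per-row lookup, mirroring the correlated subquery in the question."""
    filled = []
    for current in dates:
        # All non-null prices dated at or before the current row.
        known = [(d, p) for d, p in zip(dates, prices)
                 if p is not None and d <= current]
        # Keep the price belonging to the latest such date, if any.
        filled.append(max(known, key=lambda pair: pair[0])[1] if known else None)
    return filled


# Sample rows from the question (ISO date strings sort chronologically).
dates = ["2016-07-11", "2016-07-12", "2016-07-13", "2016-07-14", "2016-07-15"]
prices = [0.79, None, None, 0.69, None]

print(forward_fill(prices))
print(forward_fill_by_lookup(dates, prices))
```

Both functions return the same filled column; the difference is only in how much work is done per row.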

Recommended answer

Assuming that your date column is DATE and your price column is DECIMAL(5,2), please test this approach:

SELECT
    P.[date],
    P.[price],
    ff_price = CONVERT(
        DECIMAL(5,2),       -- Original price datatype
        SUBSTRING(
            MAX(
                CAST(P.[date] AS BINARY(3)) +   -- 3: datalength of P.[date] column
                CAST(P.[price] AS BINARY(5))    -- 5: datalength of P.[price] column
            ) OVER (ORDER BY P.[date] ROWS UNBOUNDED PRECEDING),

            4,  -- Start position: the first byte after the 3-byte date prefix
            5)) -- Length in bytes of the binary form of the original price datatype
FROM
    price_table  AS P

This is a solution I implemented for a similar problem, and you can find the exhaustive explanation here. The reason this approach is good is that it doesn't require an explicit sort, as long as you have an index on the date column.

What it does is use a windowed MAX over the concatenation of the 3 bytes that make up your date column (this is why I mentioned that your column must be DATE; a DATETIME needs 8 bytes, and you can edit the query to work with that) and the bytes that make up your price column (5 bytes, which is also an assumption). This is the CAST(P.[date] AS BINARY(3)) + CAST(P.[price] AS BINARY(5)) part.

When you calculate this with ORDER BY P.[date] ROWS UNBOUNDED PRECEDING, the engine is computing a rolling maximum over values whose leading bytes are your dates. The maximum updates whenever the date changes, but because concatenating any value with a NULL price also yields NULL (as binary), MAX ignores those rows and retains the previous non-null maximum (within ORDER BY P.[date] ROWS UNBOUNDED PRECEDING).
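The rolling-maximum idea can be modelled outside the database. The sketch below is a minimal Python illustration, not the T-SQL implementation: `forward_fill_by_rolling_max` is a hypothetical name, the date is encoded as a 3-byte big-endian day count so that byte order matches chronological order, and the price is encoded as 2-byte cents. Both encodings are assumptions of this model and are not SQL Server's binary formats.

```python
from datetime import date
from decimal import Decimal


def encode(day, price):
    """Date bytes first, then price bytes; None models NULL propagation."""
    if price is None:
        return None  # concatenating with NULL yields NULL
    date_part = (day.toordinal() - 1).to_bytes(3, "big")   # assumed encoding
    price_part = int(price * 100).to_bytes(2, "big")        # assumed encoding
    return date_part + price_part


def forward_fill_by_rolling_max(rows):
    """rows: (date, price) pairs already ordered by date."""
    running = None
    result = []
    for day, price in rows:
        key = encode(day, price)
        # MAX skips NULL keys, so the previous maximum is retained.
        if key is not None and (running is None or key > running):
            running = key
        # Equivalent of SUBSTRING + CONVERT: drop the date prefix, decode price.
        filled = (None if running is None
                  else Decimal(int.from_bytes(running[3:], "big")) / 100)
        result.append((day, price, filled))
    return result


rows = [
    (date(2016, 7, 10), None),
    (date(2016, 7, 11), Decimal("0.79")),
    (date(2016, 7, 12), None),
    (date(2016, 7, 13), None),
    (date(2016, 7, 14), Decimal("0.69")),
    (date(2016, 7, 15), None),
    (date(2016, 7, 21), Decimal("0.88")),
    (date(2016, 7, 22), None),
]

for day, price, filled in forward_fill_by_rolling_max(rows):
    print(day, price, filled)
```

The trick depends on the byte-wise comparison of the date prefix agreeing with date order; with the big-endian prefix assumed here, a later date always produces a larger key, so the price attached to the latest non-null row wins.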

This is the binary result of the windowed MAX (I added an earlier record with a NULL price so you can see that the result stays NULL when no earlier non-null price exists):

date        price   ff_price    WindowedMax
2016-07-10  NULL    NULL        NULL
2016-07-11  0.79    0.79        0x9B3B0B050200014F
2016-07-12  NULL    0.79        0x9B3B0B050200014F
2016-07-13  NULL    0.79        0x9B3B0B050200014F
2016-07-14  0.69    0.69        0x9E3B0B0502000145
2016-07-15  NULL    0.69        0x9E3B0B0502000145
2016-07-21  0.88    0.88        0xA53B0B0502000158
2016-07-22  NULL    0.88        0xA53B0B0502000158
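The WindowedMax values above can be decoded by hand, which is a useful sanity check before running this on a large table. The Python sketch below is an interpretation inferred from the sample output only: it assumes the first 3 bytes are a day count since 0001-01-01 with the low byte first, and that the final byte holds the price in cents. The helper names are hypothetical.

```python
from datetime import date


def decode_sample(hex_value):
    """Split a sample WindowedMax value into (date, cents)."""
    raw = bytes.fromhex(hex_value[2:])           # drop the leading "0x"
    days = int.from_bytes(raw[:3], "little")     # assumed: low byte first
    return date.fromordinal(days + 1), raw[-1]   # assumed: last byte = cents


def low_byte_first(day):
    """Date prefix in the byte order the sample output appears to use."""
    return (day.toordinal() - 1).to_bytes(3, "little")


samples = ["0x9B3B0B050200014F", "0x9E3B0B0502000145", "0xA53B0B0502000158"]
decoded = [decode_sample(s) for s in samples]
for value, (day, cents) in zip(samples, decoded):
    print(value, day, cents)

# Two consecutive days where the low byte wraps from 0xFF to 0x00.
before_wrap = date(2016, 10, 19)
after_wrap = date(2016, 10, 20)
print(low_byte_first(before_wrap).hex(), low_byte_first(after_wrap).hex())
print(low_byte_first(after_wrap) < low_byte_first(before_wrap))
```

Under these assumptions the three samples decode to the dates and prices in the table. The last comparison shows that, with the low byte first, a later date can compare as smaller once the low byte wraps, and a single byte can only hold prices up to 2.55. If the casts behave this way on your server, it may be worth testing the query on data that spans more than 256 days and on prices above 2.55 before relying on it.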
