Problem description
The PySpark DataFrameWriter class has a jdbc function for writing a DataFrame to SQL. This function has an --ignore option that the documentation says will:
Silently ignore this operation if data already exists.
But will it ignore the entire transaction, or will it only skip inserting the rows that are duplicates? What if I were to combine --ignore with the --append flag? Would the behavior change?
Recommended answer
mode("ignore") is just a no-op if the table (or another sink) already exists, so the whole write is skipped rather than only the duplicate rows. Writing modes cannot be combined. If you're looking for something like INSERT IGNORE or INSERT INTO ... WHERE NOT EXISTS ..., you'll have to do it manually, for example with mapPartitions.
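As a rough illustration of the manual approach, here is a minimal sketch of a per-partition writer that skips rows whose key already exists. It is not part of the original answer and rests on several assumptions: SQLite (from the standard library) stands in for the real target database, the `people` table with its `id` and `name` columns is hypothetical, and the function is meant to be passed to `foreachPartition`, the side-effect counterpart of the `mapPartitions` mentioned above. Against a real JDBC target you would open the connection with that database's own Python driver and use its own syntax (for example `INSERT IGNORE` in MySQL).

```python
import sqlite3
from typing import Iterable, Tuple

# Hypothetical table and column names, used only for this illustration.
CREATE_SQL = "CREATE TABLE IF NOT EXISTS people (id INTEGER PRIMARY KEY, name TEXT)"
# SQLite spells it INSERT OR IGNORE; MySQL's equivalent is INSERT IGNORE.
INSERT_SQL = "INSERT OR IGNORE INTO people (id, name) VALUES (?, ?)"


def insert_ignore_partition(rows: Iterable[Tuple[int, str]], db_path: str) -> None:
    """Write one partition's rows, skipping any whose primary key already exists."""
    # The connection is opened inside the function because connections
    # cannot be shipped from the driver to the executors.
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(CREATE_SQL)
            conn.executemany(INSERT_SQL, rows)
    finally:
        conn.close()


# With a DataFrame `df` whose columns are (id, name), this could be wired up as
# (the path below is a placeholder):
#   df.foreachPartition(lambda rows: insert_ignore_partition(rows, "/path/to/target.db"))
# By contrast, df.write.mode("ignore").jdbc(url, "people", properties=props)
# writes nothing at all once the table exists.
```

Each partition commits on its own, so this gives row-level "insert or skip" behavior but not a single transaction across the whole DataFrame.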