Question
What are some good ways for an organization to share key data across many departments and applications?
To give an example, let's say there is one primary application and database to manage customer data. There are ten other applications and databases in the organization that read that data and relate it to their own data. Currently this data sharing is done through a mixture of database (DB) links, materialized views, triggers, staging tables, re-keying information, web services, etc.
Are there any other good approaches for sharing data? And, how do your approaches compare to the ones above with respect to concerns like:
Keep in mind that the shared customer data is used in many ways, from simple single-record queries to complex multi-predicate, multi-sort joins with other organization data stored in different databases.
Thanks for your suggestions and advice...
Recommended answer
I'm sure you saw this coming, "It Depends".
It depends on everything. And the solution for sharing Customer data with department A may be completely different from the solution for sharing Customer data with department B.
My favorite concept that has risen up over the years is the concept of "Eventual Consistency". The term came from Amazon talking about distributed systems.
The premise is that while the state of data across a distributed enterprise may not be perfectly consistent now, it "eventually" will be.
For example, when a customer record gets updated on system A, system B's customer data is now stale and not matching. But, "eventually", the record from A will be sent to B through some process. So, eventually, the two instances will match.
When you work with a single system, you don't have "EC", rather you have instant updates, a single "source of truth", and, typically, a locking mechanism to handle race conditions and conflicts.
The better your operations can work with "EC" data, the easier it is to separate these systems. A simple example is a Data Warehouse used by sales. They use the DW to run their daily reports, but they don't run their reports until the early morning, and they always look at "yesterday's" (or earlier) data. So there's no real-time need for the DW to be perfectly consistent with the daily operations system. It's perfectly acceptable for a process to run at, say, close of business and move over the day's transactions and activities en masse in a large, single update operation.
You can see how this requirement can solve a lot of issues. There's no contention for the transactional data, and no worry that some report's data is going to change in the middle of accumulating its statistics because the report made two separate queries to the live database. There's no need for high-detail chatter to suck up network and CPU capacity during the day.
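The close-of-business load described above could be sketched roughly as follows. This is only an illustration under assumed names: the `sales` table, its columns, the sample rows, and the `load_day` helper are all hypothetical, and two in-memory SQLite databases stand in for the operational system and the DW.

```python
import sqlite3

# Two separate in-memory databases stand in for the operational
# system and the data warehouse (both are hypothetical).
ops = sqlite3.connect(":memory:")
dw = sqlite3.connect(":memory:")

schema = "CREATE TABLE sales (id INTEGER PRIMARY KEY, day TEXT, amount REAL)"
ops.execute(schema)
dw.execute(schema)

# Transactions trickle into the operational system during the day.
ops.executemany(
    "INSERT INTO sales (day, amount) VALUES (?, ?)",
    [("2024-01-01", 10.0), ("2024-01-01", 15.5), ("2024-01-02", 7.0)],
)
ops.commit()


def load_day(ops_conn, dw_conn, day):
    """Copy one day's rows into the warehouse in a single transaction."""
    rows = ops_conn.execute(
        "SELECT id, day, amount FROM sales WHERE day = ?", (day,)
    ).fetchall()
    with dw_conn:  # commits on success, rolls back on error
        dw_conn.executemany(
            "INSERT OR REPLACE INTO sales VALUES (?, ?, ?)", rows
        )
    return len(rows)


# Run once at close of business for the day that just ended.
loaded = load_day(ops, dw, "2024-01-01")
total = dw.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
```

Reports then query only the warehouse copy, so they never compete with the live transactional database and never see rows change between two queries.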
Now, that's an extreme, simplified, and very coarse example of EC.
But consider a large system like Google. As consumers of Search, we have no idea when a result that Google harvests will show up on a search page, or how long that takes. 1ms? 1s? 10s? 10hrs? It's easy to imagine that if you're hitting Google's West Coast servers, you may very well get a different search result than if you hit their East Coast servers. At no point are these two instances completely consistent. But by and large, they are mostly consistent. And for their use case, their consumers aren't really affected by the lag and delay.
Consider email. A wants to send a message to B, but in the process the message is routed through systems C, D, and E. Each system accepts the message, assumes complete responsibility for it, and then hands it off to another. The sender sees the email go on its way. The receiver doesn't really miss it because they don't necessarily know it's coming. So, there is a big window of time that it can take for that message to move through the system without anyone concerned knowing or caring about how fast it is.
On the other hand, A could have been on the phone with B. "I just sent it, did you get it yet? Now? Now? Get it now?"
Thus, there is some kind of underlying, implied level of performance and response. In the end, "eventually", A's outbox matches B's inbox.
These delays, and the acceptance of stale data, whether it's a day old or 1-5s old, are what control the ultimate coupling of your systems. The looser this requirement, the looser the coupling, and the more flexibility you have at your disposal in terms of design.
This is true down to the cores in your CPU. Modern multi-core, multi-threaded applications running on the same system can have different views of the "same" data, only microseconds out of date. If your code can work correctly with views of the data that are potentially inconsistent with each other, then happy day, it zips along. If not, you need to pay special attention to ensure your data is completely consistent, using techniques like volatile memory qualifiers, locking constructs, etc. All of which, in their way, cost performance.
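As a small sketch of the locking side of that trade-off: Python has no `volatile` qualifier, so this example uses a `threading.Lock` instead. The shared `counter`, the `add` function, and the thread and iteration counts are illustrative assumptions, not anything from the original discussion.

```python
import threading

counter = 0
lock = threading.Lock()


def add(n):
    """Increment the shared counter n times, one locked step at a time."""
    global counter
    for _ in range(n):
        # The lock makes each read-modify-write atomic with respect to
        # the other threads, at the price of serializing the updates.
        with lock:
            counter += 1


threads = [threading.Thread(target=add, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With the lock held around every update, all four threads agree on the final value, but each increment now waits its turn; that waiting is the performance cost mentioned above.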
So, this is the base consideration. All of the other decisions start here. Answering this can tell you how to partition applications across machines, what resources are shared, and how they are shared. It tells you what protocols and techniques are available to move the data, and how much it will cost in terms of processing to perform the transfer. Replication, load balancing, data shares, and so on are all based on this concept.
Edit, in response to first comment.
Correct, exactly. The game here is, for example: if B can't change customer data, then what is the harm when customer data changes? Can you "risk" it being out of date for a short time? Perhaps your customer data comes in slowly enough that you can replicate it from A to B immediately. Say the change is put on a queue that, because of low volume, gets picked up readily (< 1s). Even so, it would be "out of transaction" with the original change, and so there's a small window where A would have data that B does not.
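A minimal sketch of that queue-based hand-off, assuming plain dictionaries stand in for the two systems and a background thread plays the part of the replication process. The names `primary`, `replica`, `write_to_a`, and `replicate_to_b` are all hypothetical.

```python
import queue
import threading
import time

primary = {}   # system A's customer data
replica = {}   # system B's copy of it
changes = queue.Queue()
lags = []      # seconds each change waited before reaching B


def write_to_a(customer_id, name):
    """Apply a change on A, then queue it for B outside A's transaction."""
    primary[customer_id] = name
    changes.put((customer_id, name, time.monotonic()))


def replicate_to_b():
    """Drain the queue and apply each change to B's copy."""
    while True:
        item = changes.get()
        if item is None:          # sentinel: stop the worker
            changes.task_done()
            break
        customer_id, name, enqueued_at = item
        replica[customer_id] = name
        lags.append(time.monotonic() - enqueued_at)
        changes.task_done()


worker = threading.Thread(target=replicate_to_b)
worker.start()

write_to_a(1, "Alice")
write_to_a(2, "Bob")

changes.join()        # block until B has applied everything queued so far
changes.put(None)
worker.join()
```

Between `write_to_a` returning and the worker applying the change, A holds data that B does not. The values collected in `lags` are that window, and the design question is how large it can grow before something downstream breaks.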
Now the mind really starts spinning. What happens during that 1s of "lag"? What's the worst possible scenario? And can you engineer around it? If you can engineer around a 1s lag, you may be able to engineer around a 5s, 1-minute, or even longer lag. How much of the customer data do you actually use on B? Maybe B is a system designed to facilitate order picking from inventory. It's hard to imagine anything more being necessary than simply a Customer ID and perhaps a name. Just something to grossly identify who the order is for while it's being assembled.
The picking system doesn't necessarily need to print out all of the customer information until the very end of the picking process, and by then the order may have moved on to another system that perhaps is more current with, especially, shipping information. So in the end the picking system needs hardly any customer data at all. In fact, you could EMBED and denormalize the customer information within the picking order, so there's no need or expectation of synchronizing later. As long as the Customer ID (which will never change anyway) and the name (which changes so rarely it's not worth discussing) are correct, that's the only real reference you need, and all of your pick slips are perfectly accurate at the time of creation.
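One way to picture the embedded, denormalized pick order is the sketch below. The `Customer` and `PickOrder` classes, their fields, and the sample values are assumptions made for illustration; the point is only that the order copies the ID and name once and carries nothing else.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    """Full record as held by the (hypothetical) customer system."""
    customer_id: int
    name: str
    shipping_address: str


@dataclass(frozen=True)
class PickOrder:
    """Picking-system record with only the customer fields it needs."""
    order_id: int
    customer_id: int
    customer_name: str   # copied once at creation, never synchronized
    items: tuple


def create_pick_order(order_id, customer, items):
    """Embed the essential customer data in the order itself."""
    return PickOrder(order_id, customer.customer_id, customer.name, tuple(items))


customer = Customer(7, "Acme Ltd", "1 Old Road")
order = create_pick_order(1001, customer, ["widget", "gasket"])

# The customer system later records a new shipping address. The pick
# order never held that field, so nothing on it has become stale.
customer = Customer(7, "Acme Ltd", "99 New Street")
```

Because the pick order holds no shipping data, the address change needs no synchronization with the picking system; whichever system handles shipping can read the current address when it needs it.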
The trick is the mindset of breaking the systems up and focusing on the essential data that's necessary for the task. Data you don't need doesn't need to be replicated or synchronized. Folks chafe at things like denormalization and data reduction, especially when they're from the relational data modeling world. And with good reason: it should be considered with caution. But once you go distributed, you have implicitly denormalized. Heck, you're copying it wholesale now. So, you may as well be smarter about it.
All this can be mitigated through solid procedures and a thorough understanding of workflow. Identify the risks and work up policy and procedures to handle them.
But the hard part is breaking the chain to the central DB at the beginning, and instructing folks that they can't "have it all" like they may expect when you have a single, central, perfect store of information.