Problem description
I'm working on a web app that is somewhere between an email service and a social network. I feel it has the potential to grow really big in the future, so I'm concerned about scalability.
Instead of using one centralized MySQL/InnoDB database and then partitioning it when that time comes, I've decided to create a separate SQLite database for each active user: one active user per 'shard'.
That way backing up the database would be as easy as copying each user's small database file to a remote location once a day.
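A minimal sketch of such a per-user backup in Python, assuming the app uses the standard `sqlite3` module. The function name `backup_user_db` and the directory layout are hypothetical. It uses SQLite's online backup API rather than a plain file copy, because copying a database file while it is being written can capture an inconsistent state.

```python
import sqlite3
from pathlib import Path


def backup_user_db(src_path: Path, backup_dir: Path) -> Path:
    """Copy one user's SQLite file into backup_dir via the online backup API."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    dest_path = backup_dir / src_path.name  # hypothetical naming: same file name
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        # Connection.backup copies a consistent snapshot even if other
        # connections write to the source database during the copy.
        src.backup(dest)
    finally:
        dest.close()
        src.close()
    return dest_path
```

The resulting file can then be shipped to the remote location with whatever transfer tool is already in use.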
Scaling up will be as easy as adding extra hard disks to store the new files.
When the app grows beyond a single server I can link the servers together at the filesystem level using GlusterFS and run the app unchanged, or rig up a simple SQLite proxy system that will allow each server to manipulate SQLite files in adjacent servers.
Concurrency issues should be minimal because each HTTP request will only touch one or two database files at a time, out of thousands, and SQLite's locking is per file anyway, so a write to one user's database never blocks access to another's.
I'm betting that this approach will allow my app to scale gracefully and support lots of cool and unique features. Am I betting wrong? Am I missing anything?
UPDATE: I decided to go with a less extreme solution, which is working fine so far. I'm using a fixed number of shards - 256 SQLite databases, to be precise. Each user is assigned and bound to an effectively random shard by a simple hash function.
Most features of my app require access to just one or two shards per request, but there is one in particular that requires the execution of a simple query on 10 to 100 different shards out of 256, depending on the user. Tests indicate it would take about 0.02 seconds, or less, if all the data is cached in RAM. I think I can live with that!
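The fixed 256-shard assignment described in this update could look roughly like the sketch below. The post does not say which hash function is used, so SHA-256 is an assumption here, and the names `shard_for_user` and `shard_path` and the file layout are hypothetical.

```python
import hashlib

NUM_SHARDS = 256  # fixed shard count mentioned in the update


def shard_for_user(user_id: str) -> int:
    """Map a user ID to a stable shard index in the range [0, NUM_SHARDS)."""
    # Assumption: SHA-256 stands in for the unspecified "simple hash function".
    # Python's built-in hash() is avoided because it is salted per process
    # for strings, so it would not give the same shard across restarts.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def shard_path(user_id: str) -> str:
    """Hypothetical naming scheme: shards/shard_000.db .. shards/shard_255.db."""
    return f"shards/shard_{shard_for_user(user_id):03d}.db"
```

Because the mapping depends only on the user ID, a user stays bound to the same shard for as long as `NUM_SHARDS` does not change.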
UPDATE 2.0: I ported the app to MySQL/InnoDB and was able to get about the same performance for regular requests, but for that one request that requires shard walking, InnoDB is 4-5 times faster. For this reason, and others, I'm dropping this architecture, but I hope someone somewhere finds a use for it... thanks.
Recommended answer
The place where this will fail is if you have to do what's called "shard walking", which means gathering data that is spread across a bunch of different users. That particular kind of "query" will have to be done programmatically, asking each of the SQLite databases in turn, and will very likely be the slowest aspect of your site. It's a common issue in any system where data has been "sharded" into separate databases.
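As a rough illustration of shard walking, the sketch below runs the same query against each SQLite file in turn and concatenates the rows. The function name `walk_shards` and its parameters are hypothetical, and it assumes every shard shares the same schema.

```python
import sqlite3
from typing import Iterable


def walk_shards(shard_paths: Iterable[str], sql: str, params: tuple = ()) -> list:
    """Run the same query on every shard, one after another, and merge the rows."""
    rows = []
    for path in shard_paths:
        conn = sqlite3.connect(path)
        try:
            # Each shard is a separate database file, so nothing can be
            # joined or sorted across shards inside SQLite itself.
            rows.extend(conn.execute(sql, params).fetchall())
        finally:
            conn.close()
    return rows
```

Any ordering, limit, or aggregation that spans shards has to be reapplied in application code after the rows are merged, which is part of why this kind of request tends to be the slow one.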
If all of the data is self-contained to the user, then this should scale pretty well. The key to making this an effective design is to know how the data is likely to be used and whether data from one person will interact with data from another (in your context).
You may also need to watch out for file system resources - SQLite is great, awesome, fast, etc - but you do get some caching and writing benefits when using a "standard database" (i.e. MySQL, PostgreSQL, etc) because of how they're designed. In your proposed design, you'll be missing out on some of that.