Problem description
I am building an application with a large number of similar items (millions), and I would like to store them in a MySQL database, because I want to run a lot of statistics and search for specific values in specific columns.
At the same time, I will store relations between all the items, which are related in many connected binary-tree-like structures (transitive closure). Relational databases are not good at that kind of structure, so I would like to store all relations in Neo4j, which has good performance for this kind of data.
My plan is to keep all data except the relations in the MySQL database, and to store all relations, keyed by item_id, in the Neo4j database. When I want to look up a tree, I first search Neo4j for all the item_id values in the tree, then I query the MySQL database for all of those items with a query that would look like:
SELECT * FROM items WHERE item_id = 45 OR item_id = 345435 OR item_id = 343 OR item_id = 78 OR item_id = 4522 OR item_id = 676 OR item_id = 443 OR item_id = 4255 OR item_id = 4345
Is this a good idea, or am I very wrong? I haven't used graph databases before. Are there any better approaches to my problem? How would the MySQL query perform in this case?
Recommended answer
A few thoughts on this:
I would try modelling your Neo4j domain model to include the attributes of each node in the graph. By separating your data into two different data stores, you might limit some of the operations that you want to do.
I guess it comes down to what you will be doing with your graph. If, for example, you want to find all the nodes connected to a specific node whose attributes (i.e. name, age, and so on) have certain values, would you first have to find the correct node ID in your MySQL database and then go into Neo4j? This seems slow and overly complicated when you could do all of it in Neo4j. So the question is: will you need the attributes of a node when traversing the graph?
Will your data change, or is it static? Having two separate data stores will complicate matters, especially if the data changes, because every change then has to be kept consistent across both stores.
Whilst generating statistics using a MySQL database might be easier than doing everything in Neo4j, the code required to traverse a graph and find all the nodes that meet defined criteria isn't overly difficult. What these stats are should drive your solution.
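To give a feel for how little code such a traversal needs, here is a minimal sketch in plain Python. It is not Neo4j code: the graph is held in two in-memory dictionaries, and the sample data, attribute names and the find_matching helper are all hypothetical, made up for illustration. A real Neo4j traversal would express the same idea through its own query or traversal API.

```python
from collections import deque

# Hypothetical sample data (not from the question):
# item_id -> attributes, and item_id -> child item_ids.
attributes = {
    45: {"name": "root", "age": 40},
    343: {"name": "left", "age": 25},
    78: {"name": "right", "age": 31},
    4522: {"name": "leaf", "age": 25},
}
children = {45: [343, 78], 343: [4522]}


def find_matching(start_id, predicate):
    """Walk the graph breadth-first from start_id and return the ids
    of all reachable nodes whose attributes satisfy predicate."""
    seen = {start_id}
    queue = deque([start_id])
    matches = []
    while queue:
        node_id = queue.popleft()
        if predicate(attributes[node_id]):
            matches.append(node_id)
        for child_id in children.get(node_id, []):
            if child_id not in seen:  # guard against revisiting shared nodes
                seen.add(child_id)
                queue.append(child_id)
    return matches


# All nodes in the tree under item 45 whose age attribute is 25.
young = find_matching(45, lambda attrs: attrs["age"] == 25)
```

The point of the sketch is that the attribute check happens during the walk, which is only possible when the attributes live next to the relations rather than in a separate store.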
I can't comment on the performance of the MySQL query to select node IDs. I guess that comes down to how many nodes you will need to select and your indexing strategy. I agree about the performance side of things when it comes to traversing a graph, though.
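As a rough illustration of the lookup step, the chain of OR conditions from the question can be written as a single parameterized IN (...) list. The sketch below uses Python's built-in sqlite3 module purely as a stand-in so that it runs without a server; the name column and the fetch_items helper are assumptions, not part of the question. With a real MySQL driver the placeholder style usually differs (often %s instead of ?), and the query should be resolvable through index lookups only if item_id is the primary key or otherwise indexed.

```python
import sqlite3


def fetch_items(conn, item_ids):
    """Fetch the rows for item_ids with one parameterized IN (...) query."""
    if not item_ids:
        return []  # an empty IN () list is not valid SQL, so short-circuit
    placeholders = ", ".join("?" for _ in item_ids)
    sql = (
        "SELECT item_id, name FROM items "
        f"WHERE item_id IN ({placeholders}) ORDER BY item_id"
    )
    return conn.execute(sql, list(item_ids)).fetchall()


# In-memory stand-in for the items table; item_id is the primary key,
# so lookups by id can use its index.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (item_id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(45, "a"), (343, "b"), (78, "c"), (9999, "d")],
)

# The ids would come from the graph lookup; here they are hard-coded.
rows = fetch_items(conn, [45, 343, 78])
```

For very large trees the id list itself can become the bottleneck, so it may be worth fetching in batches; how large a batch is reasonable depends on the server configuration and is best measured.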
This is a good article on just this: MySQL vs. Neo4j on a Large-Scale Graph Traversal. In this case, when they say large, they only mean a million vertices/nodes and four million edges, so it wasn't even a particularly dense graph.