Problem description
I have around 100,000 rows in a MySQL table, where each row has about 8 fields.
I have finally got the hang of using Zend Lucene to index and search data from a MySQL table.
Before I fully implement this functionality on my website, I have some questions:
1- Is it possible to determine the size of an index in advance? I ask because the Zend manual says the maximum size of an index is 2 GB, and my first thought is that this won't be enough for my table!
2- I have read posts saying that Zend Lucene search is very slow on large indexes, taking up to minutes! Would it be faster to use MySQL commands directly (SELECT, LIKE etc.) instead of Zend?
3- Are there any other solutions to my problem, which is to create a search engine for classifieds that has at least these functions and doesn't require full-text MySQL indexes (fields)?
Thanks
Recommended answer
SOLR is basically a Java web application, typically run in a servlet container such as Apache Tomcat, that implements a REST interface to query an Apache Lucene index. Yes, you need to be able to run a Java application on your web server. This is an issue for you to work out with your hosting provider.
Clients using your web app don't need to run Java. Your PHP app could make a REST query to the SOLR service and format the results as HTML. A client sees only the HTML output; it never needs to know that the data came from a service implemented in Java.
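As a rough sketch of what such a REST query looks like, the snippet below builds a request URL for Solr's standard /select handler and parses the JSON reply. It is written in Python for brevity rather than PHP, and the host, port, core name ("classifieds"), and field name ("title") are assumptions for illustration, not values from the setup described above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_solr_select_url(base_url, core, query, rows=10):
    """Build the URL for a query against Solr's /select handler."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{base_url.rstrip('/')}/{core}/select?{params}"


def search(base_url, core, query, rows=10):
    """Run the query and return the matching documents as a list of dicts."""
    url = build_solr_select_url(base_url, core, query, rows)
    with urlopen(url, timeout=5) as response:
        payload = json.load(response)
    # Solr's JSON response nests the hits under response -> docs.
    return payload["response"]["docs"]


if __name__ == "__main__":
    # Hypothetical host, core name, and field name; adjust to your setup.
    print(build_solr_select_url("http://localhost:8983/solr", "classifieds", "title:bicycle"))
```

The web app would call something like search(), then render the returned documents as HTML; the browser never talks to Solr directly.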
Zend_Search_Lucene is a pure-PHP implementation that is supposed to work identically to Apache Lucene. The Zend solution even uses an identical index file format, so storage-wise they should be equal.
I used Java Lucene to index the StackOverflow data dump (October 2009). I indexed 1.5 million rows, including about 1 GB of text data. The Lucene index was 1323 MB, whereas the MySQL FULLTEXT index of the same data was only 466 MB.
Using SQL LIKE predicates in lieu of any fulltext indexing solution requires no extra space, of course, because a LIKE pattern with a leading wildcard cannot make use of a conventional index anyway. But in my tests, LIKE was about 200 times slower than Java Lucene, which was in turn about 40% slower than a MySQL FULLTEXT index on the same data.
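To illustrate why the gap is so large, here is a minimal sketch contrasting the two approaches in plain Python. It is a toy model, not how MySQL or Lucene are implemented: search_scan examines every document the way a LIKE '%word%' predicate must, while search_index does one lookup in a prebuilt word-to-documents map (an inverted index, the general structure behind fulltext engines). The sample documents and function names are made up for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search_scan(docs, word):
    """LIKE-style search: examine the text of every document."""
    needle = word.lower()
    return {doc_id for doc_id, text in docs.items() if needle in text.lower()}


def search_index(index, word):
    """Indexed search: a single dictionary lookup, independent of corpus size."""
    return set(index.get(word.lower(), set()))


# Hypothetical sample data standing in for classifieds rows.
docs = {
    1: "Red bicycle for sale",
    2: "Used car, low mileage",
    3: "Bicycle helmet, almost new",
}
index = build_inverted_index(docs)
print(search_scan(docs, "bicycle"))
print(search_index(index, "bicycle"))
```

Both calls find documents 1 and 3, but the scan's cost grows with the total amount of text while the lookup's does not. The price is the extra storage for the index, and a change in semantics: the scan also matches substrings such as "cycle", whereas the word index matches whole words only.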
See my recent presentation about fulltext indexing solutions with MySQL:
http://www.slideshare.net/billkarwin/practical-full-text-search-with-my-sql
It's not surprising that Zend_Search_Lucene can't match the performance and scalability of the Java Lucene technology. PHP's advantage as a language is increasing development efficiency, not runtime efficiency.
Update: I just tried creating an index using Zend_Search_Lucene. Creating an index is far slower with PHP than with the Java Lucene technology, so I only indexed 10,000 documents. This took almost 15 minutes, which means indexing the whole collection would take about 36 hours. Compare this to Java Lucene, which in my test indexed the full collection of 1.5 million documents in under 7 minutes.
The size of the index I created with Zend_Search_Lucene is 8.75 MB. Extrapolating this by 150x, I estimate the full index would be 1312.5 MB. So I conclude that Zend_Search_Lucene creates an index of about the same size as the index produced by Java Lucene. This is as expected.
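The same extrapolation can be applied to any table: index a sample, measure it, and scale linearly. The sketch below reproduces the arithmetic above. It assumes index size and indexing time grow roughly in proportion to document count, which only holds if the sample is representative of the full text volume, and it takes the 2 GB figure from the question at face value.

```python
def extrapolate(sample_value, sample_docs, total_docs):
    """Scale a measurement from a sample linearly to the full collection."""
    return sample_value * total_docs / sample_docs


SAMPLE_DOCS = 10_000
TOTAL_DOCS = 1_500_000
INDEX_LIMIT_MB = 2 * 1024  # the 2 GB limit mentioned in the question

# 8.75 MB and 15 minutes are the measurements for the 10,000-document sample.
size_mb = extrapolate(8.75, SAMPLE_DOCS, TOTAL_DOCS)
hours = extrapolate(15, SAMPLE_DOCS, TOTAL_DOCS) / 60

print(f"estimated index size: {size_mb:.1f} MB")
print(f"estimated indexing time: {hours:.1f} hours")
print(f"fits under the limit: {size_mb < INDEX_LIMIT_MB}")
```

With these inputs the size estimate is 1312.5 MB, under the 2 GB limit. Rounding "almost 15 minutes" up to a full 15 gives 37.5 hours, an upper bound that is consistent with the estimate of about 36 hours.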