SolrCloud fixes the shortcomings of Solr's earlier distributed setup.
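Before SolrCloud, Solr had no distributed indexing: we had to split cores into shards by hand,
and you had to work out for yourself which node each record file should be sent to for indexing; Solr did not manage any of these steps for you.
Moreover, when the cores became unbalanced, you also had to fix that by hand. There was no failover support either, so part of the index could become unreachable.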
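To make the manual routing concrete, here is a minimal Python sketch of what a client had to do before SolrCloud. It assumes three hypothetical cores (the host names and core paths are made up), and the crc32-modulo scheme is an arbitrary illustrative choice, not something Solr prescribed. For queries, legacy distributed search relied on the `shards` request parameter, which lists every core the request should fan out to.

```python
import zlib
from urllib.parse import urlencode

# Hypothetical shard cores; before SolrCloud the client had to keep this list itself.
SHARDS = [
    "solr1:8983/solr/core0",
    "solr2:8983/solr/core1",
    "solr3:8983/solr/core2",
]


def pick_shard(doc_id: str, shards: list[str]) -> str:
    """Choose the core a document must be sent to for indexing.

    Solr does not do this for you here; every client must apply the
    same deterministic rule, or documents end up on inconsistent cores.
    """
    return shards[zlib.crc32(doc_id.encode("utf-8")) % len(shards)]


def distributed_query_url(entry: str, q: str, shards: list[str]) -> str:
    """Build a query sent to one core that fans out via the `shards` parameter."""
    params = urlencode({"q": q, "shards": ",".join(shards)})
    return f"http://{entry}/select?{params}"
```

The weakness the post describes is visible in the sketch: the shard list is hard-coded on the client side, so adding a core changes `len(shards)` and remaps many existing ids, and if a core dies its part of the index simply drops out.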
SolrCloud introduces ZooKeeper
It brings ZooKeeper, the coordination service commonly used across the Hadoop ecosystem, into the system to handle failover and load balancing, which makes the whole search engine more robust.
There were, however, several problems with the distributed approach that necessitated improvement with SolrCloud:
- Splitting of the core into shards was somewhat manual.
- There was no support for distributed indexing, which meant that you needed to explicitly send documents to a specific shard; Solr couldn't figure out on its own what shards to send documents to.
- There was no load balancing or failover, so if you got a high number of queries, you needed to figure out where to send them and if one shard died it was just gone.
SolrCloud fixes all those problems. There is support for distributing both the index process and the queries automatically, and ZooKeeper provides failover and load balancing. Additionally, every shard can also have multiple replicas for additional robustness.
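SolrCloud's automatic routing can be pictured with another small sketch, under simplifying assumptions: the 32-bit hash space is split into one contiguous range per shard, each document goes to the shard whose range contains the hash of its id, and a query is served by any live replica of that shard. Solr's default compositeId router also works with hash ranges, but it uses MurmurHash3 and keeps the ranges and live nodes in ZooKeeper; here `zlib.crc32` and an in-memory dict stand in for both, and the shard and replica names are hypothetical.

```python
import random
import zlib
from dataclasses import dataclass, field

HASH_SPACE = 2**32  # zlib.crc32 returns an unsigned 32-bit value


@dataclass
class Shard:
    name: str
    start: int  # inclusive start of this shard's hash range
    end: int    # exclusive end of this shard's hash range
    replicas: dict[str, bool] = field(default_factory=dict)  # replica name -> is live


def make_shards(n: int, replicas_per_shard: int) -> list[Shard]:
    """Split the hash space into n equal ranges, one per shard."""
    step = HASH_SPACE // n
    shards = []
    for i in range(n):
        # The last shard absorbs the remainder so the ranges cover everything.
        end = HASH_SPACE if i == n - 1 else (i + 1) * step
        reps = {f"shard{i + 1}_replica{r + 1}": True for r in range(replicas_per_shard)}
        shards.append(Shard(f"shard{i + 1}", i * step, end, reps))
    return shards


def route(doc_id: str, shards: list[Shard]) -> Shard:
    """Return the shard whose hash range contains the document id's hash."""
    h = zlib.crc32(doc_id.encode("utf-8"))
    for shard in shards:
        if shard.start <= h < shard.end:
            return shard
    raise ValueError("hash ranges do not cover the whole hash space")


def pick_replica(shard: Shard, rng: random.Random) -> str:
    """Pick a random live replica (load balancing); dead ones are skipped (failover)."""
    live = [name for name, up in shard.replicas.items() if up]
    if not live:
        raise RuntimeError(f"no live replica for {shard.name}")
    return rng.choice(live)
```

Marking a replica as down shows the failover behaviour: `pick_replica` keeps answering from the remaining live replica, and choosing at random among live replicas is a crude stand-in for load balancing. Unlike the modulo scheme, the routing rule lives with the cluster state rather than being duplicated in every client.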
Shards and Indexing Data in SolrCloud - Apache Solr Reference Guide - Apache Software Foundation
https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud