Thursday, June 25, 2015

[elasticsearch] Elasticsearch for Apache Hadoop 2.1 GA: Spark, Storm and More | Elastic



Elasticsearch for Apache Hadoop 2.1 was released yesterday.

Elasticsearch for Apache Hadoop, better known as ES-Hadoop, gives Hadoop users a search and analytics engine.

This release follows four betas and one release candidate (RC1), with continuous validation and testing over the past ten months.

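2.1 covers the whole Hadoop ecosystem, in particular the real-time components that have drawn the most attention (Apache Spark and Storm). In addition to SSL/TLS, it supports HTTP and PKI authentication.
It also remains backward compatible and ships as a single jar with no additional dependencies.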
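
As a rough illustration of the security support mentioned above, the sketch below collects connection settings for an encrypted, authenticated cluster into a plain dictionary. The `es.net.*` names are the ones I recall from the ES-Hadoop configuration reference and should be checked against the docs for the version in use; the helper `secure_settings`, the credentials, and the truststore path are all hypothetical and not from the announcement.

```python
def secure_settings(user, password, truststore=None, keystore=None):
    """Assemble ES-Hadoop connection settings as strings (Hadoop-style config).

    Hypothetical helper: only the setting keys are meant to match the connector.
    """
    settings = {
        "es.net.ssl": "true",               # talk to the cluster over SSL/TLS
        "es.net.http.auth.user": user,      # HTTP basic authentication
        "es.net.http.auth.pass": password,
    }
    if truststore:
        # Trust store used to verify the cluster's certificate
        settings["es.net.ssl.truststore.location"] = truststore
    if keystore:
        # Key store holding the client certificate for PKI authentication
        settings["es.net.ssl.keystore.location"] = keystore
    return settings


# Example values below are placeholders, not real credentials or paths.
conf = secure_settings("es_user", "secret", truststore="file:///path/to/truststore.jks")
```

A dictionary like this could be merged with the rest of a job's configuration before it is handed to the connector.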


Native Integration with Spark and Spark SQL

Storm Integration

Security enhancements

Elasticsearch on YARN


Enhanced Functionality

2.1 introduces a number of enhancements to existing features such as:
  • JSON results - Documents from Elasticsearch can be returned in raw format, as JSON documents. (As an implementation note, the data from Elasticsearch is passed to the client verbatim, without any processing.)
  • Document metadata - In addition to its content, a document's metadata can now be returned without any extra network cost.
  • Inclusion/Exclusion of fields - On the mapping front, it is now possible to specify which fields are included or excluded from data about to be written to Elasticsearch. This is handy not only for quick transformations of the data but also for specifying document metadata without storing it.
  • Client-node routing - For clusters in restricted environments, it is now possible to use the connector through client nodes only. That is, rather than accessing the cluster's data nodes directly, the connector sends its requests to the client nodes (which must have the HTTP(S) port open) and asks them to do the work on its behalf. This reduces parallelism, since the connector no longer communicates directly with the data nodes. However, the performance penalty is insignificant unless a lot of data is read/written or locality matters.
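
The four enhancements above are switched on through connector settings. The sketch below gathers them into one dictionary, assuming the setting names from the ES-Hadoop configuration reference as I recall them (`es.output.json`, `es.read.metadata`, `es.mapping.include`, `es.mapping.exclude`, `es.nodes.client.only`); the helper `enhanced_settings` and the field names are hypothetical. Note that in a real job the read-side and write-side settings would usually live in separate configurations.

```python
def enhanced_settings(json_output=False, read_metadata=False,
                      include=(), exclude=(), client_only=False):
    """Build ES-Hadoop settings for the 2.1 enhancements (hypothetical helper).

    Values are strings because Hadoop-style configuration is string based.
    """
    settings = {}
    if json_output:
        settings["es.output.json"] = "true"        # read side: raw JSON documents
    if read_metadata:
        settings["es.read.metadata"] = "true"      # read side: include document metadata
    if include:
        settings["es.mapping.include"] = ",".join(include)   # write side: fields to keep
    if exclude:
        settings["es.mapping.exclude"] = ",".join(exclude)   # write side: fields to drop
    if client_only:
        settings["es.nodes.client.only"] = "true"  # route requests via client nodes
    return settings


# Read configuration: raw JSON plus metadata, routed through client nodes.
read_conf = enhanced_settings(json_output=True, read_metadata=True, client_only=True)

# Write configuration: drop two illustrative fields before indexing.
write_conf = enhanced_settings(exclude=["internal_id", "debug_info"])
```

With PySpark, a dictionary like `read_conf` (plus the target index and node addresses) could be passed as the `conf` argument of `SparkContext.newAPIHadoopRDD`; treat that wiring as an assumption to verify rather than part of the announcement.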



ref:
Elasticsearch for Apache Hadoop 2.1 GA: Spark, Storm and More | Elastic
https://www.elastic.co/blog/elasticsearch-for-apache-hadoop-2-1-spark-storm-and-more

