Tuesday, May 27, 2014

[elasticsearch] On the _all field and the query default field




Before discussing the "_all" field, let's first look at Elasticsearch's default search field.

When you run a search and your query syntax does not name the field to search, the field searched by default is "_all".

(This can be changed with index.query.default_field.)
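To make the default-field behavior concrete, here is a minimal Python sketch that builds two query_string request bodies: one that names no field (so the search falls back to the index default, normally _all) and one that sets default_field explicitly. The helper name build_query_string and the field name "message" are made up for illustration; they are not from the original post.

```python
import json

def build_query_string(text, default_field=None):
    """Build a query_string search body (hypothetical helper for illustration)."""
    query = {"query": text}
    if default_field is not None:
        # Only set when the caller wants to override the index default field.
        query["default_field"] = default_field
    return {"query": {"query_string": query}}

# No field named: Elasticsearch searches the index default field (normally _all).
implicit = build_query_string("error")
# Field named explicitly ("message" is an assumed field name).
explicit = build_query_string("error", default_field="message")

print(json.dumps(implicit))
print(json.dumps(explicit))
```

Either body could be sent to the _search endpoint of an index; the sketch only builds the JSON and performs no HTTP call.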

The _all field is, in effect, the collection of all the fields of a doc.
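As a rough mental model only (not how Elasticsearch implements it internally), the idea of _all can be sketched in Python as joining every value in a document into one piece of text. The function name build_all_text and the sample document are assumptions for illustration.

```python
def build_all_text(doc):
    """Join every leaf value of a document into one string (rough model of _all)."""
    parts = []

    def walk(value):
        if isinstance(value, dict):
            for child in value.values():
                walk(child)
        elif isinstance(value, (list, tuple)):
            for child in value:
                walk(child)
        elif value is not None:
            parts.append(str(value))

    walk(doc)
    return " ".join(parts)

# Hypothetical log document.
doc = {"host": "renode2", "tag": "Web Server", "status": 200}
all_text = build_all_text(doc)
print(all_text)
```

The combined text is what an analyzer would then split into searchable terms.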

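Suppose your application needs every field in a document to be treated as one complete term, so that even a value containing whitespace is not split into two words when it is shown in a facet.

(Another way to handle this is a multi field: after the raw data is indexed, two or more fields can represent the same source field, and each can use its own analyzer and tokenizer.)

In that case you can take full advantage of the analyzer setting on the _all field: make a keyword tokenizer with a lowercase filter the index default, and give _all its own standard analyzer.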

PUT http://es_url/INDEX/
{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "tokenizer": "keyword",
          "filter": "lowercase"
        }
      }
    }
  },
  "mappings": {
    "logs": {
      "_all": {
        "type": "string",
        "index": "analyzed",
        "analyzer": "standard"
      }
    }
  }
}

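Here logs is the document type; replace it as needed.

With these settings, searches still match on the individual words inside the fields (through the analyzed _all field), while facets show each field value as one whole term.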
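The effect of the two analyzers can be sketched in plain Python. The functions below are crude stand-ins, not the real Lucene analyzers: keyword_lowercase models the keyword tokenizer plus lowercase filter (one token per value), and standard_like approximates the standard analyzer by lowercasing and splitting on non-word characters. The sample tag values are invented.

```python
import re
from collections import Counter

def keyword_lowercase(value):
    """Model of keyword tokenizer + lowercase filter: the whole value is one token."""
    return [value.lower()]

def standard_like(value):
    """Crude stand-in for the standard analyzer: lowercase, split on non-word chars."""
    return [token for token in re.split(r"\W+", value.lower()) if token]

# Hypothetical field values.
tags = ["Web Server", "web server", "Mail Server"]

# Facet view: each value stays one term, so "web server" is counted as a unit.
facet_counts = Counter(tok for tag in tags for tok in keyword_lowercase(tag))

# _all view: values are split into words, so a search for "server" can match.
all_tokens = {tok for tag in tags for tok in standard_like(tag)}

print(facet_counts)
print(sorted(all_tokens))
```

With the keyword-style analysis the facet counts "web server" twice as a single term, while the standard-style tokens let a one-word search such as "server" still hit through _all.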



kimchy
You don't want to set the analyzer for _all to be keyword, _all is an aggregation of all the other fields int the doc, so you basically treat the whole aggregation of text as a single token.
from ElasticSearch Users - Specifying analyzer for _all field
http://elasticsearch-users.115913.n3.nabble.com/Specifying-analyzer-for-all-field-td3851732.html

Its a copy of all the fields "aggregated" into the _all field.


_all
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-all-field.html




[elasticsearch] _source field: how to keep a field out of the search results




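In Elasticsearch, the "_source" field stores the document as it was submitted at index time.

By default, every field of the docs you index is stored there.

Sometimes, though, you only want a field to be indexed, that is, the document can be found when you query with a query string, but the field's content should not be listed in _source.

For this you can use the includes and excludes options that _source provides.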

{
    "my_type" : {
        "_source" : {
            "includes" : ["path1.*", "path2.*"],
            "excludes" : ["path3.*"]
        }
    }
}

As a concrete example: when you index a document of type logs, you want the tag field to be searchable, but you do not want it to show up in the source of a hit.

PUT http://localhost:9200/INDEX_NAME/
{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "tokenizer": "keyword",
          "filter": "lowercase"
        }
      }
    }
  },
  "mappings": {
    "logs": {
      "_all": {
        "type": "string",
        "index": "analyzed",
        "analyzer": "standard"
      },
      "_source": {
        "excludes": [
          "tag"
        ]
      }
    }
  }
}
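What the excludes setting does to the stored source can be approximated with a short Python sketch. This is a simplified model under stated assumptions: it only looks at top-level field names and uses shell-style wildcard matching, whereas the real mapping option works on field paths. The function name filter_source and the sample document are hypothetical.

```python
from fnmatch import fnmatchcase

def filter_source(doc, excludes):
    """Return a copy of doc without top-level fields matching any exclude pattern."""
    return {
        field: value
        for field, value in doc.items()
        if not any(fnmatchcase(field, pattern) for pattern in excludes)
    }

# Hypothetical log document of type "logs".
doc = {"message": "disk full on /var", "tag": "ops", "host": "renode2"}

stored_source = filter_source(doc, ["tag"])
print(stored_source)
```

The tag value is still analyzed and indexed from the original document, so it stays searchable; it is only left out of what a hit returns in _source.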

ref
_source
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-source-field.html

Monday, May 26, 2014

[elasticsearch]Exception in thread "main" java.lang.UnsupportedClassVersionError: org/elasticsearch/bootstrap/Elasticsearch : Unsupported major.minor version 51.0

root@renode2[/opt]{00:04}# rpm -ivh elasticsearch-1.2.0.noarch.rpm
Preparing…             ########################################### [100%]
   1:elasticsearch          ########################################### [100%]
### NOT starting on installation, please execute the following statements to configure elasticsearch to start automatically using chkconfig
 sudo /sbin/chkconfig --add elasticsearch
### You can start elasticsearch by executing
 sudo service elasticsearch start
root@renode2[/opt]{00:05}# /etc/init.d/elasticsearch start
Starting elasticsearch:                                   [  OK  ]
root@renode2[/opt]{00:05}# Exception in thread "main" java.lang.UnsupportedClassVersionError: org/elasticsearch/bootstrap/Elasticsearch : Unsupported major.minor version 51.0
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
Could not find the main class: org.elasticsearch.bootstrap.Elasticsearch.  Program will exit.


The rpm was compiled with JDK 7 (class file version 51.0 is the Java 7 format), but the JRE on this machine is still Java 6:

root@renode2[/opt]{00:07}# java -version
java version "1.6.0_31"
Java(TM) SE Runtime Environment (build 1.6.0_31-b04)
Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode)
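
The "major.minor version 51.0" in the error comes from the header of the .class file. As a minimal sketch, the Python below reads that header (a 4-byte magic number, then a 2-byte minor and a 2-byte major version, all big-endian) and maps the major version to a Java release. The function name class_file_version is made up, and the header bytes are built in memory here rather than read from the actual Elasticsearch jar.

```python
import struct

# Class file major versions for a few Java releases.
JAVA_BY_MAJOR = {49: "Java 5", 50: "Java 6", 51: "Java 7", 52: "Java 8"}

def class_file_version(header):
    """Return (major, minor) from the first 8 bytes of a .class file."""
    magic, minor, major = struct.unpack(">IHH", header[:8])
    if magic != 0xCAFEBABE:
        raise ValueError("not a Java class file")
    return major, minor

# Synthetic header equivalent to a class compiled for Java 7.
header = struct.pack(">IHH", 0xCAFEBABE, 0, 51)
major, minor = class_file_version(header)
print(f"{major}.{minor} -> {JAVA_BY_MAJOR.get(major, 'unknown')}")
```

A Java 6 runtime only accepts class files up to major version 50, which is why it refuses to load a class marked 51.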

update

Elasticsearch.org Elasticsearch 1.2.0 And 1.1.2 Released | Blog | Elasticsearch
http://www.elasticsearch.org/blog/elasticsearch-1-2-0-released/

From version 1.2.0 onward, Elasticsearch no longer supports Java 6. The recommended JDK versions are 7u55 and 7u25.


Tuesday, May 20, 2014

[elasticsearch] Using skywalker to read Elasticsearch Lucene index information / inspecting index information with Luke



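Under the hood, Elasticsearch uses Lucene as its index kernel. In other words, the segments you see in Elasticsearch are in fact in Lucene's index format.

Sometimes we want to know how many terms an index file contains (that is, numTerms).
Normally you can use Luke to inspect this kind of information in a Lucene index.

With Elasticsearch, you can even use skywalker (Skywalker for Elasticsearch is like Luke for Lucene) directly to query some of this low-level information.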

skywalker (Skywalker for Elasticsearch is like Luke for Lucene) 

Install
Installing Elasticsearch plugins is usually very easy.

ES version   Plugin        Release date   Command
0.90.0       3.0.0         May 24, 2013   ./bin/plugin --install skywalker --url http://bit.ly/1eYTIHj
0.90.5       3.1.0         Nov 9, 2013    ./bin/plugin --install skywalker --url http://bit.ly/HFJos6
0.90.6       3.2.0         Nov 9, 2013    ./bin/plugin --install skywalker --url http://bit.ly/19PbcoJ
1.0.0.RC1    1.0.0.RC1.1   Jan 16, 2014   ./bin/plugin --install skywalker --url http://bit.ly/1htPlFK
Do not forget to restart the node after installing.
Most importantly, skywalker only becomes usable after you restart your nodes once the plugin is installed.


Access _skywalker directly:
http://localhost:9200/_skywalker

View the internal information of the index file:
"0": {
  "numTerms": 236527,
  "hasDeletions": false,
  "directoryImpl": "org.elasticsearch.index.store.Store$StoreDirectory",
  "indexFormat": {
    "capabilities": "flexible, codec-specific (WARNING: newer version of Lucene than this tool)",
    "genericName": "Lucene 4.1",
    "version": "4.1"
  },
  "numDocs": 38982,
  "maxDoc": 38982,
  "indexVersion": "3104",
  "numDeletedDocs": 0
},
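With elasticsearch-skywalker we can find out, for example, how many terms the index file holds, which terms are the top terms, and what terms the analyzer turned the text into for storage.
With this information we can better understand the internals of our search engine.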
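
As a small sketch of how such a response could be used, the Python below parses a trimmed copy of the fragment shown above and derives a terms-per-document ratio. The function name summarize is made up, and wrapping the "0" entry in an outer JSON object is an assumption about the surrounding response, which the post does not show in full.

```python
import json

# Trimmed copy of the entry shown above, wrapped in braces to make it valid JSON.
fragment = """
{"0": {"numTerms": 236527, "hasDeletions": false,
       "numDocs": 38982, "maxDoc": 38982, "numDeletedDocs": 0}}
"""

def summarize(entries):
    """Return numTerms, numDocs and terms per doc for each entry."""
    summary = {}
    for key, info in entries.items():
        docs = info["numDocs"]
        summary[key] = {
            "numTerms": info["numTerms"],
            "numDocs": docs,
            # Guard against an empty index to avoid dividing by zero.
            "terms_per_doc": info["numTerms"] / docs if docs else 0.0,
        }
    return summary

summary = summarize(json.loads(fragment))
print(summary["0"])
```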

ref
jprante/elasticsearch-skywalker
https://github.com/jprante/elasticsearch-skywalker

Sunday, May 18, 2014

[Martial arts] Notes 140518



Today I learned a very important concept: "courage" has to be put into practice on several different levels.


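  • Four basic sanshou (free-sparring) drills
  • Baji with real substance requires a solid foundation of training.
  • The great spear is the best way to train Baji's thrusting power.
  • The bear form, tiger form, and dragon form of Bajiquan
  • Since ancient times, Baji does not step onto the leitai (fighting platform)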

Thursday, May 15, 2014

[elasticsearch] Listing each node's installed plugins and status / List of existing plugins with Node Info API


When administering Elasticsearch, we want to know the status of every node and which plugins each one has installed.

The API can list the existing plugins along with the node information:

curl http://localhost:9200/_nodes?plugin=true
response

{"cluster_name": "elasticsearchpc2","nodes": {"Esuo81ErRqCNNmCbJfPMxQ": {"name": "renode2","transport_address": "inet[/10.1.191.1:9300]","host": "renode2","ip": "10.1.191.1","version": "1.1.1","build": "f1585f0","http_address": "inet[/10.1.191.1:9200]","settings": {"path": {"logs": "/root/elasticsearch-1.1.1/logs","home": "/root/elasticsearch-1.1.1"},"cluster": {"name": "elasticsearchpc2"},"node": {"name": "renode2"},"discovery": {"zen": {"ping": {"unicast": {"hosts": ["renode1","renode2"]},"multicast": {"enabled": "false"}}}},"foreground": "yes","name": "renode2"},"os": {"refresh_interval": 1000,"available_processors": 8,"cpu": {"vendor": "Intel","model": "Xeon","mhz": 2526,"total_cores": 8,"total_sockets": 8,"cores_per_socket": 1,"cache_size_in_bytes": 8192},"mem": {"total_in_bytes": 6029754368},"swap": {"total_in_bytes": 4227850240}},"process": {"refresh_interval": 1000,"id": 18017,"max_file_descriptors": 4096,"mlockall": false},"jvm": {"pid": 18017,"version": "1.6.0_31","vm_name": "Java HotSpot(TM) 64-Bit Server VM","vm_version": "20.6-b01","vm_vendor": "Sun Microsystems Inc.","start_time": 1400126620513,"mem": {"heap_init_in_bytes": 268435456,"heap_max_in_bytes": 1060372480,"non_heap_init_in_bytes": 24313856,"non_heap_max_in_bytes": 136314880,"direct_max_in_bytes": 1060372480},"gc_collectors": ["ParNew","ConcurrentMarkSweep"],"memory_pools": ["Code Cache","Par Eden Space","Par Survivor Space","CMS Old Gen","CMS Perm Gen"]},"thread_pool": {"generic": {"type": "cached","keep_alive": "30s"},"index": {"type": "fixed","min": 8,"max": 8,"queue_size": "200"},"get": {"type": "fixed","min": 8,"max": 8,"queue_size": "1k"},"snapshot": {"type": "scaling","min": 1,"max": 4,"keep_alive": "5m"},"merge": {"type": "scaling","min": 1,"max": 4,"keep_alive": "5m"},"suggest": {"type": "fixed","min": 8,"max": 8,"queue_size": "1k"},"bulk": {"type": "fixed","min": 8,"max": 8,"queue_size": "50"},"optimize": {"type": "fixed","min": 1,"max": 1},"warmer": {"type": "scaling","min": 
1,"max": 4,"keep_alive": "5m"},"flush": {"type": "scaling","min": 1,"max": 4,"keep_alive": "5m"},"search": {"type": "fixed","min": 24,"max": 24,"queue_size": "1k"},"percolate": {"type": "fixed","min": 8,"max": 8,"queue_size": "1k"},"management": {"type": "scaling","min": 1,"max": 5,"keep_alive": "5m"},"refresh": {"type": "scaling","min": 1,"max": 4,"keep_alive": "5m"}},"network": {"refresh_interval": 5000,"primary_interface": {"address": "10.1.191.1","name": "eth0","mac_address": "FA:3E:B2:87:97:5s"}},"transport": {"bound_address": "inet[/0:0:0:0:0:0:0:0%0:9300]","publish_address": "inet[/10.1.191.1:9300]"},"http": {"bound_address": "inet[/0:0:0:0:0:0:0:0%0:9200]","publish_address": "inet[/10.1.191.1:9200]","max_content_length_in_bytes": 104857600},"plugins": []},}}
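
To pull just the plugin list out of a response like the one above, a minimal Python sketch could look as follows. The sample dict is a heavily trimmed copy of the response; the function name plugins_by_node is made up, and reading a "name" key from each plugin entry is an assumption, since the response shown here has an empty plugins list.

```python
def plugins_by_node(nodes_info):
    """Map each node name to the names of its installed plugins."""
    result = {}
    for node in nodes_info.get("nodes", {}).values():
        # Assumes each plugin entry is an object with a "name" key.
        result[node["name"]] = [p.get("name") for p in node.get("plugins", [])]
    return result

# Trimmed copy of the node info response shown above.
sample = {
    "cluster_name": "elasticsearchpc2",
    "nodes": {
        "Esuo81ErRqCNNmCbJfPMxQ": {
            "name": "renode2",
            "version": "1.1.1",
            "plugins": [],
        }
    },
}

installed = plugins_by_node(sample)
print(installed)
```

For this cluster the result shows that renode2 has no plugins installed.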