ElasticSearch 6.x Study Notes: 4. The IK Analyzer Plugin

4.1 elasticsearch-analysis-ik 6.1.1

(1) Source code

https://github.com/medcl/elasticsearch-analysis-ik

(2) Releases

https://github.com/medcl/elasticsearch-analysis-ik/releases

(3) Copy the URL of the zip file

https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.1.1/elasticsearch-analysis-ik-6.1.1.zip
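Both the tag and the file name in this URL carry the plugin version. For the 6.x line, the IK version has to equal the Elasticsearch version exactly, because Elasticsearch refuses to load a plugin built for a different version. Below is a minimal Python sketch that derives the URL from an ES version string. The helper name ik_release_url is hypothetical, and the sketch assumes that other releases follow the same naming as the 6.1.1 link above.

```python
import re

IK_RELEASES = "https://github.com/medcl/elasticsearch-analysis-ik/releases/download"

def ik_release_url(es_version):
    """Return the IK zip URL for an Elasticsearch version such as "6.1.1".

    Assumes every release uses the tag v<version> and the file name
    elasticsearch-analysis-ik-<version>.zip, like the 6.1.1 link above.
    """
    if not re.fullmatch(r"\d+\.\d+\.\d+", es_version):
        raise ValueError(f"expected a version like 6.1.1, got {es_version!r}")
    return f"{IK_RELEASES}/v{es_version}/elasticsearch-analysis-ik-{es_version}.zip"

print(ik_release_url("6.1.1"))
```

The version check rejects strings such as "6.x" early, so a typo does not silently produce a URL that leads nowhere.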

4.2 Installing the plugin

(1) With elasticsearch-plugin

[es@node1 elasticsearch-6.1.1]$ bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.1.1/elasticsearch-analysis-ik-6.1.1.zip

-> Downloading https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.1.1/elasticsearch-analysis-ik-6.1.1.zip

[=================================================] 100%

-> Installed analysis-ik

[es@node1 elasticsearch-6.1.1]$ ll plugins/

total 0

drwxr-xr-x 2 es es 199 Jan 7 08:52 analysis-ik

[es@node1 elasticsearch-6.1.1]$

(2) Inspect the plugin directory

[es@node1 elasticsearch-6.1.1]$ ll plugins/analysis-ik/

total 1420

-rw-r--r-- 1 es es 263965 Jan 7 08:52 commons-codec-1.9.jar

-rw-r--r-- 1 es es 61829 Jan 7 08:52 commons-logging-1.2.jar

-rw-r--r-- 1 es es 51658 Jan 7 08:52 elasticsearch-analysis-ik-6.1.1.jar

-rw-r--r-- 1 es es 736658 Jan 7 08:52 httpclient-4.5.2.jar

-rw-r--r-- 1 es es 326724 Jan 7 08:52 httpcore-4.4.4.jar

-rw-r--r-- 1 es es 2666 Jan 7 08:52 plugin-descriptor.properties

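[es@node1 elasticsearch-6.1.1]$

(3) If the ES cluster has no convenient Internet access, IK can be installed as follows (updated 2018-04-09)

First, download elasticsearch-analysis-ik-6.1.1.zip on a machine that does have access (the author used Windows). Pick the zip whose version matches your cluster; the session below, from that later update, uses 6.2.3.

Second, upload the zip to the ES cluster and unzip it.

Then, move the unzipped directory into the ES plugins directory.

Finally, restart ES.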
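The steps above can also be scripted on the ES host. Below is a minimal Python sketch that performs the following actions:

- unzips the archive;
- checks the elasticsearch.version key in plugin-descriptor.properties (the file seen in the directory listing earlier) against the cluster version;
- moves the result to plugins/ik.

The helper names install_ik_offline and read_descriptor are hypothetical, and all paths are placeholders. As the session below shows, the sketch assumes that the archive unpacks to a top-level directory named elasticsearch. Elasticsearch would reject a mismatched plugin at startup anyway; the check simply fails earlier.

```python
import shutil
import zipfile
from pathlib import Path

def read_descriptor(path):
    """Parse plugin-descriptor.properties as simple key=value lines.

    Comment lines (# or !) and blank lines are skipped; Java-properties
    features such as line continuations are not handled.
    """
    props = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props

def install_ik_offline(zip_path, plugins_dir, es_version, work_dir):
    """Unzip the IK archive into work_dir and move it to <plugins_dir>/ik.

    Assumes the archive unpacks to a top-level directory named "elasticsearch".
    """
    work = Path(work_dir)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(work)
    extracted = work / "elasticsearch"
    descriptor = read_descriptor(extracted / "plugin-descriptor.properties")
    built_for = descriptor.get("elasticsearch.version")
    if built_for != es_version:
        raise RuntimeError(f"plugin is for Elasticsearch {built_for}, cluster runs {es_version}")
    target = Path(plugins_dir) / "ik"
    if target.exists():  # do not nest a second copy inside an existing ik/
        raise FileExistsError(target)
    shutil.move(str(extracted), str(target))
    return target
```

For the session below, a call would look like this:

install_ik_offline("elasticsearch-analysis-ik-6.2.3.zip", "/opt/elasticsearch-6.2.3/plugins", "6.2.3", "/tmp/ik-unpack")

The /tmp/ik-unpack directory is a made-up scratch location. After the call, restart each node.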

[elastic@node1 ~]$ unzip elasticsearch-analysis-ik-6.2.3.zip

[elastic@node1 ~]$ mv elasticsearch /opt/elasticsearch-6.2.3/plugins/ik

[elastic@node1 ~]$ ll /opt/elasticsearch-6.2.3/plugins/

total 4

drwxrwxrwx 3 elastic elastic 4096 Apr 2 12:16 ik

[elastic@node1 ~]$

Then copy the ik directory to the other nodes:

[elastic@node1 plugins]$ scp -r ik elastic@node2:/opt/elasticsearch-6.2.3/plugins/

(4) Restart Elasticsearch

[es@node1 elasticsearch-6.1.1]$ bin/elasticsearch

[2018-01-07T09:01:17,283][INFO ][o.e.n.Node ] [] initializing ...

[2018-01-07T09:01:17,421][INFO ][o.e.e.NodeEnvironment ] [cNWkQjt] using [1] data paths, mounts [[/ (rootfs)]], net usable_space [14.3gb], net total_space [21.9gb], types [rootfs]

[2018-01-07T09:01:17,422][INFO ][o.e.e.NodeEnvironment ] [cNWkQjt] heap size [1007.3mb], compressed ordinary object pointers [true]

[2018-01-07T09:01:17,484][INFO ][o.e.n.Node ] node name [cNWkQjt] derived from node ID [cNWkQjt9SzKFNtyx8IIu-A]; set [node.name] to override

[2018-01-07T09:01:17,484][INFO ][o.e.n.Node ] version[6.1.1], pid[3445], build[bd92e7f/2017-12-17T20:23:25.338Z], OS[Linux/3.10.0-514.el7.x86_64/amd64], JVM[Oracle Corporation/Java HotSpot(TM) 64-Bit Server VM/1.8.0_112/25.112-b15]

[2018-01-07T09:01:17,485][INFO ][o.e.n.Node ] JVM arguments [-Xms1g, -Xmx1g, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -XX:+HeapDumpOnOutOfMemoryError, -Des.path.home=/opt/elasticsearch-6.1.1, -Des.path.conf=/opt/elasticsearch-6.1.1/config]

[2018-01-07T09:01:19,000][INFO ][o.e.p.PluginsService ] [cNWkQjt] loaded module [aggs-matrix-stats]

[2018-01-07T09:01:19,000][INFO ][o.e.p.PluginsService ] [cNWkQjt] loaded module [analysis-common]

[2018-01-07T09:01:19,000][INFO ][o.e.p.PluginsService ] [cNWkQjt] loaded module [ingest-common]

[2018-01-07T09:01:19,001][INFO ][o.e.p.PluginsService ] [cNWkQjt] loaded module [lang-expression]

[2018-01-07T09:01:19,001][INFO ][o.e.p.PluginsService ] [cNWkQjt] loaded module [lang-mustache]

[2018-01-07T09:01:19,001][INFO ][o.e.p.PluginsService ] [cNWkQjt] loaded module [lang-painless]

[2018-01-07T09:01:19,001][INFO ][o.e.p.PluginsService ] [cNWkQjt] loaded module [mapper-extras]

[2018-01-07T09:01:19,001][INFO ][o.e.p.PluginsService ] [cNWkQjt] loaded module [parent-join]

[2018-01-07T09:01:19,002][INFO ][o.e.p.PluginsService ] [cNWkQjt] loaded module [percolator]

[2018-01-07T09:01:19,002][INFO ][o.e.p.PluginsService ] [cNWkQjt] loaded module [reindex]

[2018-01-07T09:01:19,002][INFO ][o.e.p.PluginsService ] [cNWkQjt] loaded module [repository-url]

[2018-01-07T09:01:19,002][INFO ][o.e.p.PluginsService ] [cNWkQjt] loaded module [transport-netty4]

[2018-01-07T09:01:19,002][INFO ][o.e.p.PluginsService ] [cNWkQjt] loaded module [tribe]

[2018-01-07T09:01:19,003][INFO ][o.e.p.PluginsService ] [cNWkQjt] loaded plugin [analysis-ik]

[2018-01-07T09:01:21,678][INFO ][o.e.d.DiscoveryModule ] [cNWkQjt] using discovery type [zen]

[2018-01-07T09:01:22,567][INFO ][o.e.n.Node ] initialized

[2018-01-07T09:01:22,568][INFO ][o.e.n.Node ] [cNWkQjt] starting ...

[2018-01-07T09:01:22,803][INFO ][o.e.t.TransportService ] [cNWkQjt] publish_address {192.168.80.131:9300}, bound_addresses {192.168.80.131:9300}

[2018-01-07T09:01:22,837][INFO ][o.e.b.BootstrapChecks ] [cNWkQjt] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks

[2018-01-07T09:01:25,940][INFO ][o.e.c.s.MasterService ] [cNWkQjt] zen-disco-elected-as-master ([0] nodes joined), reason: new_master {cNWkQjt}{cNWkQjt9SzKFNtyx8IIu-A}{Xvho5gpPTuavakz227C_uA}{192.168.80.131}{192.168.80.131:9300}

[2018-01-07T09:01:25,949][INFO ][o.e.c.s.ClusterApplierService] [cNWkQjt] new_master {cNWkQjt}{cNWkQjt9SzKFNtyx8IIu-A}{Xvho5gpPTuavakz227C_uA}{192.168.80.131}{192.168.80.131:9300}, reason: apply cluster state (from master [master {cNWkQjt}{cNWkQjt9SzKFNtyx8IIu-A}{Xvho5gpPTuavakz227C_uA}{192.168.80.131}{192.168.80.131:9300} committed version [1] source [zen-disco-elected-as-master ([0] nodes joined)]])

[2018-01-07T09:01:25,993][INFO ][o.e.h.n.Netty4HttpServerTransport] [cNWkQjt] publish_address {192.168.80.131:9200}, bound_addresses {192.168.80.131:9200}

[2018-01-07T09:01:25,993][INFO ][o.e.n.Node ] [cNWkQjt] started

[2018-01-07T09:01:26,077][INFO ][o.w.a.d.Monitor ] try load config from /opt/elasticsearch-6.1.1/config/analysis-ik/IKAnalyzer.cfg.xml

[2018-01-07T09:01:26,799][INFO ][o.e.g.GatewayService ] [cNWkQjt] recovered [2] indices into cluster_state

[2018-01-07T09:01:27,526][INFO ][o.e.c.r.a.AllocationService] [cNWkQjt] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[test][2], [test][0]] ...]).

4.3 Testing the basic functionality of the IK Chinese analyzer

(1) ik_smart

In the request below, pretty tells Elasticsearch to print the JSON response in a nicely indented, readable form.

GET _analyze?pretty

{

"analyzer": "ik_smart",

"text":"安徽省长江流域"

}

Result:

{

"tokens": [

{

"token": "安徽省",

"start_offset": 0,

"end_offset": 3,

"type": "CN_WORD",

"position": 0

},

{

"token": "长江流域",

"start_offset": 3,

"end_offset": 7,

"type": "CN_WORD",

"position": 1

}

]

}
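The same _analyze call can also be made over plain HTTP, and its result inspected programmatically. Below is a minimal Python sketch using only the standard library. The helper names are hypothetical, and 192.168.80.131:9200 is the HTTP address from the startup log above. Kibana's GET with a body is sent here as POST, which the _analyze endpoint also accepts. The tiles_text check makes the behaviour of ik_smart on this sentence concrete: its tokens cover the text end to end without overlapping.

```python
import json
import urllib.request

def build_analyze_request(base_url, analyzer, text):
    """Build a POST /_analyze?pretty request equivalent to the Kibana call above."""
    body = json.dumps({"analyzer": analyzer, "text": text}, ensure_ascii=False)
    return urllib.request.Request(
        base_url.rstrip("/") + "/_analyze?pretty",
        data=body.encode("utf-8"),
        # Elasticsearch 6.x rejects request bodies without a Content-Type header.
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def analyze(base_url, analyzer, text):
    """Send the request and return the "tokens" array from the JSON response."""
    with urllib.request.urlopen(build_analyze_request(base_url, analyzer, text)) as resp:
        return json.load(resp)["tokens"]

def tiles_text(tokens, text):
    """True if the tokens, ordered by start_offset, cover text exactly once,
    with no gaps and no overlaps."""
    pos = 0
    for tok in sorted(tokens, key=lambda t: t["start_offset"]):
        start, end = tok["start_offset"], tok["end_offset"]
        if start != pos or text[start:end] != tok["token"]:
            return False
        pos = end
    return pos == len(text)
```

Against the node from the log, analyze("http://192.168.80.131:9200", "ik_smart", "安徽省长江流域") should return the two tokens listed above, and tiles_text on them is True. The same check on the ik_max_word tokens that follow is False, because 安徽省 and 安徽 start at the same offset.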

(2) ik_max_word

GET _analyze?pretty

{

"analyzer": "ik_max_word",

"text":"安徽省长江流域"

}

Result:

{

"tokens": [

{

"token": "安徽省",

"start_offset": 0,

"end_offset": 3,

"type": "CN_WORD",

"position": 0

},

{

"token": "安徽",

"start_offset": 0,

"end_offset": 2,

"type": "CN_WORD",

"position": 1

},

{

"token": "省长",

"start_offset": 2,

"end_offset": 4,

"type": "CN_WORD",

"position": 2

},

{

"token": "长江流域",

"start_offset": 3,

"end_offset": 7,

"type": "CN_WORD",

"position": 3

},

{

"token": "长江",

"start_offset": 3,

"end_offset": 5,

"type": "CN_WORD",

"position": 4

},

{

"token": "江流",

"start_offset": 4,

"end_offset": 6,

"type": "CN_WORD",

"position": 5

},

{

"token": "流域",

"start_offset": 5,

"end_offset": 7,

"type": "CN_WORD",

"position": 6

}

]

}

(3) New words

GET _analyze?pretty

{

"analyzer": "ik_smart",

"text": "王者荣耀"

}

Result: ik_smart does not recognize 王者荣耀 as one word and splits it into 王者 and 荣耀:

{

"tokens": [

{

"token": "王者",

"start_offset": 0,

"end_offset": 2,

"type": "CN_WORD",

"position": 0

},

{

"token": "荣耀",

"start_offset": 2,

"end_offset": 4,

"type": "CN_WORD",

"position": 1

}

]

}

4.4 Extension dictionaries

(1) View the existing dictionaries

[es@node1 analysis-ik]$ pwd

/opt/elasticsearch-6.1.1/config/analysis-ik

[es@node1 analysis-ik]$ ll

total 8260

-rw-rw---- 1 es bigdata 5225922 Jan 7 08:52 extra_main.dic

-rw-rw---- 1 es bigdata 63188 Jan 7 08:52 extra_single_word.dic

-rw-rw---- 1 es bigdata 63188 Jan 7 08:52 extra_single_word_full.dic

-rw-rw---- 1 es bigdata 10855 Jan 7 08:52 extra_single_word_low_freq.dic

-rw-rw---- 1 es bigdata 156 Jan 7 08:52 extra_stopword.dic

-rw-rw---- 1 es bigdata 625 Jan 7 08:52 IKAnalyzer.cfg.xml

-rw-rw---- 1 es bigdata 3058510 Jan 7 08:52 main.dic

-rw-rw---- 1 es bigdata 123 Jan 7 08:52 preposition.dic

-rw-rw---- 1 es bigdata 1824 Jan 7 08:52 quantifier.dic

-rw-rw---- 1 es bigdata 164 Jan 7 08:52 stopword.dic

-rw-rw---- 1 es bigdata 192 Jan 7 08:52 suffix.dic

-rw-rw---- 1 es bigdata 752 Jan 7 08:52 surname.dic

[es@node1 analysis-ik]$

(2) Custom dictionary
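A custom dictionary is a plain text file with one word per line, as the cat output below shows, and IK reads it as UTF-8. If the word list comes from another source, the file can be generated instead of typed in vi. Below is a minimal Python sketch with two hypothetical helpers, write_dic and read_dic. It performs the following actions:

- strips surrounding whitespace;
- drops blank lines and duplicates;
- writes UTF-8 without a byte-order mark.

```python
from pathlib import Path

def write_dic(path, words):
    """Write an IK dictionary file: UTF-8 (no BOM), one word per line.

    Surrounding whitespace, blank entries and duplicates are dropped;
    the first occurrence keeps its position.
    """
    unique = list(dict.fromkeys(w.strip() for w in words if w.strip()))
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)  # e.g. create custom/
    target.write_text("\n".join(unique) + "\n", encoding="utf-8")
    return unique

def read_dic(path):
    """Read the words back, ignoring blank lines."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]
```

For the words used here, the call would be:

write_dic("/opt/elasticsearch-6.1.1/config/analysis-ik/custom/new_word.dic", ["老铁", "王者荣耀", "洪荒之力", "共有产权房", "一带一路"])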

[es@node1 analysis-ik]$ mkdir custom

[es@node1 analysis-ik]$ vi custom/new_word.dic

[es@node1 analysis-ik]$ cat custom/new_word.dic

老铁

王者荣耀

洪荒之力

共有产权房

一带一路

[es@node1 analysis-ik]$

(3) Update the configuration
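The entry that matters is ext_dict. Its value is a path relative to the analysis-ik config directory (custom/new_word.dic here). IK's documentation also gives an example with several files joined by semicolons. Instead of editing the file in vi as below, it can be updated with a minimal Python sketch like this one. The helper set_ext_dict is hypothetical and uses the standard library only. ElementTree drops the XML prolog, the DOCTYPE and comments when parsing. The sketch therefore writes the prolog and DOCTYPE back, but the comments are lost.

```python
import xml.etree.ElementTree as ET

# Prolog and DOCTYPE are re-emitted on output, because ElementTree drops them.
HEADER = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">\n'
)

def set_ext_dict(cfg_xml, dict_paths):
    """Return cfg_xml with <entry key="ext_dict"> set to the given dictionary paths.

    Paths are relative to the directory holding IKAnalyzer.cfg.xml and are joined
    with ';'. Other entries are kept; XML comments are not preserved.
    """
    root = ET.fromstring(cfg_xml)
    entry = root.find("entry[@key='ext_dict']")
    if entry is None:  # create the entry if the file lacks it
        entry = ET.SubElement(root, "entry", key="ext_dict")
    entry.text = ";".join(dict_paths)
    return HEADER + ET.tostring(root, encoding="unicode") + "\n"
```

To apply it here, take the following steps:

1. Read IKAnalyzer.cfg.xml as UTF-8.
2. Pass its text together with ["custom/new_word.dic"].
3. Write the result back before restarting Elasticsearch.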

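[es@node1 analysis-ik]$ vi IKAnalyzer.cfg.xml

[es@node1 analysis-ik]$ cat IKAnalyzer.cfg.xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer 扩展配置</comment>
    <!-- extension dictionaries, paths relative to this directory -->
    <entry key="ext_dict">custom/new_word.dic</entry>
    <!-- extension stopword dictionaries -->
    <entry key="ext_stopwords"></entry>
</properties>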

[es@node1 analysis-ik]$

(4) Restart Elasticsearch

[es@node1 elasticsearch-6.1.1]$ bin/elasticsearch

[2018-01-07T10:00:23,032][INFO ][o.e.n.Node ] [] initializing ...

[2018-01-07T10:00:23,170][INFO ][o.e.e.NodeEnvironment ] [cNWkQjt] using [1] data paths, mounts [[/ (rootfs)]], net usable_space [14.3gb], net total_space [21.9gb], types [rootfs]

[2018-01-07T10:00:23,171][INFO ][o.e.e.NodeEnvironment ] [cNWkQjt] heap size [1007.3mb], compressed ordinary object pointers [true]

[2018-01-07T10:00:23,209][INFO ][o.e.n.Node ] node name [cNWkQjt] derived from node ID [cNWkQjt9SzKFNtyx8IIu-A]; set [node.name] to override

[2018-01-07T10:00:23,210][INFO ][o.e.n.Node ] version[6.1.1], pid[3574], build[bd92e7f/2017-12-17T20:23:25.338Z], OS[Linux/3.10.0-514.el7.x86_64/amd64], JVM[Oracle Corporation/Java HotSpot(TM) 64-Bit Server VM/1.8.0_112/25.112-b15]

[2018-01-07T10:00:23,210][INFO ][o.e.n.Node ] JVM arguments [-Xms1g, -Xmx1g, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -XX:+HeapDumpOnOutOfMemoryError, -Des.path.home=/opt/elasticsearch-6.1.1, -Des.path.conf=/opt/elasticsearch-6.1.1/config]

[2018-01-07T10:00:24,717][INFO ][o.e.p.PluginsService ] [cNWkQjt] loaded module [aggs-matrix-stats]

[2018-01-07T10:00:24,717][INFO ][o.e.p.PluginsService ] [cNWkQjt] loaded module [analysis-common]

[2018-01-07T10:00:24,718][INFO ][o.e.p.PluginsService ] [cNWkQjt] loaded module [ingest-common]

[2018-01-07T10:00:24,718][INFO ][o.e.p.PluginsService ] [cNWkQjt] loaded module [lang-expression]

[2018-01-07T10:00:24,718][INFO ][o.e.p.PluginsService ] [cNWkQjt] loaded module [lang-mustache]

[2018-01-07T10:00:24,718][INFO ][o.e.p.PluginsService ] [cNWkQjt] loaded module [lang-painless]

[2018-01-07T10:00:24,718][INFO ][o.e.p.PluginsService ] [cNWkQjt] loaded module [mapper-extras]

[2018-01-07T10:00:24,719][INFO ][o.e.p.PluginsService ] [cNWkQjt] loaded module [parent-join]

[2018-01-07T10:00:24,719][INFO ][o.e.p.PluginsService ] [cNWkQjt] loaded module [percolator]

[2018-01-07T10:00:24,719][INFO ][o.e.p.PluginsService ] [cNWkQjt] loaded module [reindex]

[2018-01-07T10:00:24,719][INFO ][o.e.p.PluginsService ] [cNWkQjt] loaded module [repository-url]

[2018-01-07T10:00:24,719][INFO ][o.e.p.PluginsService ] [cNWkQjt] loaded module [transport-netty4]

[2018-01-07T10:00:24,720][INFO ][o.e.p.PluginsService ] [cNWkQjt] loaded module [tribe]

[2018-01-07T10:00:24,720][INFO ][o.e.p.PluginsService ] [cNWkQjt] loaded plugin [analysis-ik]

[2018-01-07T10:00:27,866][INFO ][o.e.d.DiscoveryModule ] [cNWkQjt] using discovery type [zen]

[2018-01-07T10:00:28,794][INFO ][o.e.n.Node ] initialized

[2018-01-07T10:00:28,795][INFO ][o.e.n.Node ] [cNWkQjt] starting ...

[2018-01-07T10:00:29,047][INFO ][o.e.t.TransportService ] [cNWkQjt] publish_address {192.168.80.131:9300}, bound_addresses {192.168.80.131:9300}

[2018-01-07T10:00:29,093][INFO ][o.e.b.BootstrapChecks ] [cNWkQjt] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks

[2018-01-07T10:00:32,210][INFO ][o.e.c.s.MasterService ] [cNWkQjt] zen-disco-elected-as-master ([0] nodes joined), reason: new_master {cNWkQjt}{cNWkQjt9SzKFNtyx8IIu-A}{N6t0NiDmQp2vlrbx-FtcUQ}{192.168.80.131}{192.168.80.131:9300}

[2018-01-07T10:00:32,217][INFO ][o.e.c.s.ClusterApplierService] [cNWkQjt] new_master {cNWkQjt}{cNWkQjt9SzKFNtyx8IIu-A}{N6t0NiDmQp2vlrbx-FtcUQ}{192.168.80.131}{192.168.80.131:9300}, reason: apply cluster state (from master [master {cNWkQjt}{cNWkQjt9SzKFNtyx8IIu-A}{N6t0NiDmQp2vlrbx-FtcUQ}{192.168.80.131}{192.168.80.131:9300} committed version [1] source [zen-disco-elected-as-master ([0] nodes joined)]])

[2018-01-07T10:00:32,285][INFO ][o.e.h.n.Netty4HttpServerTransport] [cNWkQjt] publish_address {192.168.80.131:9200}, bound_addresses {192.168.80.131:9200}

[2018-01-07T10:00:32,286][INFO ][o.e.n.Node ] [cNWkQjt] started

[2018-01-07T10:00:32,326][INFO ][o.w.a.d.Monitor ] try load config from /opt/elasticsearch-6.1.1/config/analysis-ik/IKAnalyzer.cfg.xml

[2018-01-07T10:00:32,905][INFO ][o.w.a.d.Monitor ] [Dict Loading] custom/new_word.dic

[2018-01-07T10:00:33,279][INFO ][o.e.g.GatewayService ] [cNWkQjt] recovered [2] indices into cluster_state

[2018-01-07T10:00:34,092][INFO ][o.e.c.r.a.AllocationService] [cNWkQjt] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[test][3]] ...]).

The output contains the line

[Dict Loading] custom/new_word.dic

which shows that the custom dictionary has been loaded.
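Checking for these lines by eye gets tedious across several nodes. Below is a minimal Python sketch that pulls plugin names and IK dictionary names out of startup log lines. The helper name startup_checks is hypothetical, and the sketch relies on the message formats seen in the two logs above: "loaded plugin [analysis-ik]" and "[Dict Loading] custom/new_word.dic". By default, Elasticsearch also writes this log to a file in its logs directory named after the cluster, but the exact path depends on your configuration.

```python
import re

# Message formats taken from the startup logs above.
PLUGIN_RE = re.compile(r"loaded plugin \[([^\]]+)\]")
DICT_RE = re.compile(r"\[Dict Loading\] (\S+)")

def startup_checks(log_lines):
    """Return (plugins, dictionaries) reported as loaded in the given log lines."""
    plugins, dictionaries = [], []
    for line in log_lines:
        m = PLUGIN_RE.search(line)
        if m:
            plugins.append(m.group(1))
        m = DICT_RE.search(line)
        if m:
            dictionaries.append(m.group(1))
    return plugins, dictionaries
```

To scan a log file, open it as UTF-8 and pass the file object directly, since iterating it yields lines. Then confirm that "analysis-ik" and "custom/new_word.dic" appear in the two returned lists.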

(5) Restart Kibana

After restarting Kibana, run the following request again:

GET _analyze?pretty

{

"analyzer": "ik_smart",

"text":"王者荣耀"

}

Result: with the custom dictionary loaded, ik_smart keeps 王者荣耀 as a single token:

{

"tokens": [

{

"token": "王者荣耀",

"start_offset": 0,

"end_offset": 4,

"type": "CN_WORD",

"position": 0

}

]

}
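Adding a dictionary entry changes the result because IK is dictionary-driven: a term missing from its dictionaries can only be assembled from shorter words it does know. The toy segmenter below uses plain forward maximum matching to illustrate the effect. It is a deliberate simplification, not IK's actual algorithm, which is more elaborate. The word set builtin is a hypothetical stand-in for IK's built-in dictionaries.

```python
def fmm_segment(text, dictionary):
    """Toy forward maximum matching (not IK's algorithm).

    At each position take the longest dictionary word that matches;
    if none does, emit a single character and move on.
    """
    max_len = max([1] + [len(w) for w in dictionary])
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens

builtin = {"王者", "荣耀"}  # hypothetical stand-in for IK's built-in words
print(fmm_segment("王者荣耀", builtin))                # ['王者', '荣耀']
print(fmm_segment("王者荣耀", builtin | {"王者荣耀"}))  # ['王者荣耀']
```

The output mirrors the before and after results of this section. With the words of the 安徽省长江流域 example as the dictionary, the toy also happens to return 安徽省 and 长江流域, like ik_smart. On other inputs, the two can disagree.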
