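Introduction
I normally do applied AI research targeting the civil engineering field, but a neighboring section came to me with this request:
"We have tens of millions of text files, and with that many files searching is slow. Can something be done?"
I used Hadoop and HDFS in research a long time ago, but that was for GIS data, and I strongly think of Hadoop as batch processing anyway. So I looked for something specialized in full-text search and came across Elasticsearch (yes, I am very late to the party).
To try it out, I will start by setting up Elasticsearch.
However, I have completely forgotten how to write docker-compose files, so I will split the setup across several posts... (memory fades with age...)
So this post only goes as far as building the Elasticsearch Docker image.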
Sites I referred to
I referred to the following sites for this setup. Many thanks.
初めてのElasticsearch with Docker ("First Steps with Elasticsearch with Docker")
Elasticsearch's "Getting started with Elasticsearch"
Environment
CPU | Intel(R) Core(TM) i7-9700K CPU @ 3.60GHz |
Memory | 32GB |
OS | Ubuntu 18.04.4 LTS (Bionic Beaver) |
Docker | Docker version 20.10.5, build 55c4c88 |
docker-compose | version 1.16.1, build 6d1ac21 |
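Building Elasticsearch
Creating the Elasticsearch Docker image
First things first: write the Dockerfile.
Looking at elasticsearch on DockerHub, the latest version at the moment is 7.13.2, so the first line of the Dockerfile pulls the 7.13.2 Docker image.
Since I work with Japanese text, I also install the kuromoji plugin for Japanese support (line 2).
I also want to handle recent vocabulary, so I install the NEologd plugin as well (line 3).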
FROM docker.elastic.co/elasticsearch/elasticsearch:7.13.2
RUN elasticsearch-plugin install analysis-kuromoji
RUN elasticsearch-plugin install org.codelibs:elasticsearch-analysis-kuromoji-ipadic-neologd:7.1.0
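(Addendum begins)
I later removed the third line above, "RUN elasticsearch-plugin install org.codelibs:elasticsearch-analysis-kuromoji-ipadic-neologd:7.1.0", from the Dockerfile, because it caused a Java error at docker-compose up, which is covered in the next post. The build log below still includes that step. The error output is appended at the end of this post.
(Addendum ends)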
With the Dockerfile ready, run docker build.
$ docker build -f ./Dockerfile .
Sending build context to Docker daemon 4.096kB
Step 1/3 : FROM docker.elastic.co/elasticsearch/elasticsearch:7.13.2
7.13.2: Pulling from elasticsearch/elasticsearch
ddf49b9115d7: Pull complete
815a15889ec1: Pull complete
ba5d33fc5cc5: Pull complete
976d4f887b1a: Pull complete
9b5ee4563932: Pull complete
ef11e8f17d0c: Pull complete
3c5ad4db1e24: Pull complete
Digest: sha256:1cecc2c7419a4f917a88c83180335bd491d623f28ac43ca7e0e69b4eca25fbd5
Status: Downloaded newer image for docker.elastic.co/elasticsearch/elasticsearch:7.13.2
---> 11a830014f7c
Step 2/3 : RUN elasticsearch-plugin install analysis-kuromoji
---> Running in 0cdc1f7f3102
-> Installing analysis-kuromoji
-> Downloading analysis-kuromoji from elastic
[=================================================] 100%
-> Installed analysis-kuromoji
-> Please restart Elasticsearch to activate any plugins installed
Removing intermediate container 0cdc1f7f3102
---> 144040a82003
Step 3/3 : RUN elasticsearch-plugin install org.codelibs:elasticsearch-analysis-kuromoji-ipadic-neologd:7.1.0
---> Running in 71997e0aca6e
-> Installing org.codelibs:elasticsearch-analysis-kuromoji-ipadic-neologd:7.1.0
-> Downloading org.codelibs:elasticsearch-analysis-kuromoji-ipadic-neologd:7.1.0 from maven central
[=================================================] 100%
Warning: sha512 not found, falling back to sha1. This behavior is deprecated and will be removed in a future release. Please update the plugin to use a sha512 checksum.
-> Installed analysis-kuromoji-ipadic-neologd
-> Please restart Elasticsearch to activate any plugins installed
Removing intermediate container 71997e0aca6e
---> f53988aa2593
Successfully built f53988aa2593
Judging from the log, everything looks fine. Still, let's check with docker images.
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
docker.elastic.co/elasticsearch/elasticsearch 7.13.2 11a830014f7c 3 weeks ago 1.02GB
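The docker build command above is run without -t, so the built image (f53988aa2593 in the log) has no repository name or tag of its own. The following is a minimal sketch of how the build could be named and then checked. The image name my-elasticsearch:7.13.2 and the function names are hypothetical and not from the original setup, and the plugin check assumes that the image's entrypoint runs a command passed to docker run directly.

```shell
#!/bin/sh
# Hypothetical image name, chosen only for illustration.
IMAGE_NAME="my-elasticsearch:7.13.2"

# Build the image from ./Dockerfile and give it a name with -t,
# then list only that image instead of every local image.
build_and_list() {
    docker build -t "$IMAGE_NAME" -f ./Dockerfile . &&
    docker images "$IMAGE_NAME"
}

# Show the plugins installed in the built image.
# Assumption: the entrypoint executes the given command as-is.
list_plugins() {
    docker run --rm "$IMAGE_NAME" elasticsearch-plugin list
}
```

Calling build_and_list and then list_plugins from the directory that contains the Dockerfile would be one way to confirm that the plugins named in the build log ended up in the image.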
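Next, I plan to write docker-compose.yml and get as far as starting Elasticsearch.
(Addendum begins)
This is the Java error mentioned in the addendum above. The NoSuchMethodError means that the plugin calls an AbstractTokenizerFactory constructor that Elasticsearch 7.13.2 does not provide, which suggests that plugin version 7.1.0 is not compatible with this Elasticsearch version.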
es01 | "stacktrace": ["java.lang.NoSuchMethodError: 'void org.elasticsearch.index.analysis.AbstractTokenizerFactory.<init>(org.elasticsearch.index.IndexSettings, org.elasticsearch.common.settings.Settings)'",
es01 | "at org.codelibs.elasticsearch.kuromoji.ipadic.neologd.index.analysis.KuromojiTokenizerFactory.<init>(KuromojiTokenizerFactory.java:50) ~[?:?]",
es01 | "at org.elasticsearch.index.analysis.AnalysisRegistry.buildMapping(AnalysisRegistry.java:433) ~[elasticsearch-7.13.2.jar:7.13.2]",
es01 | "at org.elasticsearch.index.analysis.AnalysisRegistry.buildTokenizerFactories(AnalysisRegistry.java:275) ~[elasticsearch-7.13.2.jar:7.13.2]",
es01 | "at org.elasticsearch.index.analysis.AnalysisRegistry.build(AnalysisRegistry.java:203) ~[elasticsearch-7.13.2.jar:7.13.2]",
es01 | "at org.elasticsearch.index.IndexModule.newIndexService(IndexModule.java:431) ~[elasticsearch-7.13.2.jar:7.13.2]",
es01 | "at org.elasticsearch.indices.IndicesService.createIndexService(IndicesService.java:663) ~[elasticsearch-7.13.2.jar:7.13.2]",
es01 | "at org.elasticsearch.indices.IndicesService.createIndex(IndicesService.java:566) ~[elasticsearch-7.13.2.jar:7.13.2]",
es01 | "at org.elasticsearch.cluster.metadata.MetadataIndexTemplateService.validateTemplate(MetadataIndexTemplateService.java:1288) ~[elasticsearch-7.13.2.jar:7.13.2]",
es01 | "at org.elasticsearch.cluster.metadata.MetadataIndexTemplateService.access$300(MetadataIndexTemplateService.java:83) ~[elasticsearch-7.13.2.jar:7.13.2]",
es01 | "at org.elasticsearch.cluster.metadata.MetadataIndexTemplateService$6.execute(MetadataIndexTemplateService.java:775) ~[elasticsearch-7.13.2.jar:7.13.2]",
es01 | "at org.elasticsearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:48) ~[elasticsearch-7.13.2.jar:7.13.2]",
es01 | "at org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:691) ~[elasticsearch-7.13.2.jar:7.13.2]",
es01 | "at org.elasticsearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:313) ~[elasticsearch-7.13.2.jar:7.13.2]",
es01 | "at org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:208) ~[elasticsearch-7.13.2.jar:7.13.2]",
es01 | "at org.elasticsearch.cluster.service.MasterService.access$000(MasterService.java:62) ~[elasticsearch-7.13.2.jar:7.13.2]",
es01 | "at org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:140) ~[elasticsearch-7.13.2.jar:7.13.2]",
es01 | "at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:139) ~[elasticsearch-7.13.2.jar:7.13.2]",
es01 | "at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:177) ~[elasticsearch-7.13.2.jar:7.13.2]",
es01 | "at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:673) ~[elasticsearch-7.13.2.jar:7.13.2]",
es01 | "at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:241) ~[elasticsearch-7.13.2.jar:7.13.2]",
es01 | "at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:204) ~[elasticsearch-7.13.2.jar:7.13.2]",
es01 | "at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) ~[?:?]",
es01 | "at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) ~[?:?]",
es01 | "at java.lang.Thread.run(Thread.java:831) [?:?]"] }
es01 | fatal error in thread [elasticsearch[es01][masterService#updateTask][T#1]], exiting
es01 | java.lang.NoSuchMethodError: 'void org.elasticsearch.index.analysis.AbstractTokenizerFactory.<init>(org.elasticsearch.index.IndexSettings, org.elasticsearch.common.settings.Settings)'
es01 | at org.codelibs.elasticsearch.kuromoji.ipadic.neologd.index.analysis.KuromojiTokenizerFactory.<init>(KuromojiTokenizerFactory.java:50)
es01 | at org.elasticsearch.index.analysis.AnalysisRegistry.buildMapping(AnalysisRegistry.java:433)
es01 | at org.elasticsearch.index.analysis.AnalysisRegistry.buildTokenizerFactories(AnalysisRegistry.java:275)
es01 | at org.elasticsearch.index.analysis.AnalysisRegistry.build(AnalysisRegistry.java:203)
es01 | at org.elasticsearch.index.IndexModule.newIndexService(IndexModule.java:431)
es01 | at org.elasticsearch.indices.IndicesService.createIndexService(IndicesService.java:663)
es01 | at org.elasticsearch.indices.IndicesService.createIndex(IndicesService.java:566)
es01 | at org.elasticsearch.cluster.metadata.MetadataIndexTemplateService.validateTemplate(MetadataIndexTemplateService.java:1288)
es01 | at org.elasticsearch.cluster.metadata.MetadataIndexTemplateService.access$300(MetadataIndexTemplateService.java:83)
es01 | at org.elasticsearch.cluster.metadata.MetadataIndexTemplateService$6.execute(MetadataIndexTemplateService.java:775)
es01 | at org.elasticsearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:48)
es01 | at org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:691)
es01 | at org.elasticsearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:313)
es01 | at org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:208)
es01 | at org.elasticsearch.cluster.service.MasterService.access$000(MasterService.java:62)
es01 | at org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:140)
es01 | at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:139)
es01 | at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:177)
es01 | at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:673)
es01 | at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:241)
es01 | at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:204)
es01 | at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
es01 | at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
es01 | at java.base/java.lang.Thread.run(Thread.java:831)
es01 exited with code 1
(Addendum ends)
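The NoSuchMethodError above only appeared at startup, well after the image had been built without errors. As a rough sketch, a check like the following could flag a suspicious Dockerfile before building. It rests on an assumption that is not confirmed in this post: that the version at the end of a third-party plugin coordinate (group:artifact:version) should equal the Elasticsearch version in the FROM line. The function name check_plugin_versions is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: compare the Elasticsearch version in the FROM line
# with the version suffix of each plugin given as group:artifact:version.
# Assumption: the two versions are expected to be identical.
check_plugin_versions() {
    dockerfile="$1"
    # Version tag of the base image, e.g. "7.13.2"
    es_version=$(sed -n 's/^FROM .*:\([0-9][0-9.]*\)$/\1/p' "$dockerfile" | head -n 1)
    status=0
    # Official plugins such as analysis-kuromoji have no colon and are skipped.
    for coord in $(sed -n 's/^RUN elasticsearch-plugin install \([^ ]*:[^ ]*:[^ ]*\)$/\1/p' "$dockerfile"); do
        plugin_version=${coord##*:}
        if [ "$plugin_version" != "$es_version" ]; then
            echo "mismatch: $coord (Elasticsearch $es_version)"
            status=1
        fi
    done
    return $status
}
```

Running check_plugin_versions ./Dockerfile against the three-line Dockerfile in this post would report the NEologd line, since 7.1.0 differs from 7.13.2. Whether a plugin release matching a given Elasticsearch version exists has to be checked separately.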