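The 18th Lucene/Solr Meetup #SolrJP @Yahoo! JAPAN BASE6: Demo Setup Procedure for the Presentation
This post documents the steps used to build the demonstration environment for the 18th Lucene/Solr Meetup.
The configuration files may need to be adjusted for your environment, but I hope they serve as a useful reference.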
ZooKeeper
Solr's Parallel SQL works only in a SolrCloud environment. Install ZooKeeper so that Solr can be started in SolrCloud mode.
# Install ZooKeeper.
$ mkdir -p ${HOME}/zookeeper
$ curl -L -o ${HOME}/zookeeper/zookeeper-3.4.6.tar.gz http://archive.apache.org/dist/zookeeper/zookeeper-3.4.6/zookeeper-3.4.6.tar.gz
$ tar -C ${HOME}/zookeeper -xf ${HOME}/zookeeper/zookeeper-3.4.6.tar.gz
# Download configuration files from GitHub.
$ curl -L -o ${HOME}/zookeeper/zookeeper-3.4.6/conf/zoo.cfg https://raw.githubusercontent.com/mosuka/the-18th-lucene-solr-meetup/master/zookeeper/conf/zoo.cfg
$ curl -L -o ${HOME}/zookeeper/zookeeper-3.4.6/conf/zookeeper-env.sh https://raw.githubusercontent.com/mosuka/the-18th-lucene-solr-meetup/master/zookeeper/conf/zookeeper-env.sh
# Start ZooKeeper.
$ ${HOME}/zookeeper/zookeeper-3.4.6/bin/zkServer.sh start
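Solr is started against this ZooKeeper in the next section, so it can help to confirm that ZooKeeper is accepting connections first. The following is a minimal sketch and not part of the original demo: `wait_for_port` is a hypothetical helper name, the 30-second default timeout is an arbitrary choice, and it relies on bash's `/dev/tcp` redirection, so it assumes bash rather than a plain POSIX shell.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original demo): poll a TCP port until it
# accepts a connection or the timeout expires.
# Usage: wait_for_port HOST PORT [TIMEOUT_SECONDS]
wait_for_port() {
  local host="$1" port="$2" timeout="${3:-30}" elapsed=0
  # The subshell opens and immediately drops a connection via bash's /dev/tcp.
  until (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; do
    if [ "${elapsed}" -ge "${timeout}" ]; then
      return 1   # still not reachable after the timeout
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 0
}
```

With the port used in this post, a call could look like `wait_for_port localhost 2181 30 && echo "ZooKeeper is listening"`.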
Solr
Start SolrCloud, pointing it at the ZooKeeper instance installed above.
# Install Solr.
$ mkdir -p ${HOME}/solr
$ curl -L -o ${HOME}/solr/solr-6.1.0.tar.gz https://archive.apache.org/dist/lucene/solr/6.1.0/solr-6.1.0.tgz
$ tar -C ${HOME}/solr -xf ${HOME}/solr/solr-6.1.0.tar.gz
# Download configuration file for Enabling CORS from GitHub.
$ curl -L -o ${HOME}/solr/solr-6.1.0/server/etc/webdefault.xml https://raw.githubusercontent.com/mosuka/the-18th-lucene-solr-meetup/master/solr/server/etc/webdefault.xml
# Create a znode in ZooKeeper for SolrCloud.
$ ${HOME}/solr/solr-6.1.0/server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181 -cmd makepath /solr
# Start Solr in SolrCloud mode.
$ ${HOME}/solr/solr-6.1.0/bin/solr start -h localhost -p 8983 -d ${HOME}/solr/solr-6.1.0/server -z localhost:2181/solr -m 1g -s ${HOME}/solr/solr-6.1.0/server/solr -a "-Dsolr.autoCommit.maxTime=30 -Dsolr.autoSoftCommit.maxTime=10"
# Create configsets for realtime_data_driven_schema_configs.
$ cp -pr ${HOME}/solr/solr-6.1.0/server/solr/configsets/data_driven_schema_configs ${HOME}/solr/solr-6.1.0/server/solr/configsets/realtime_data_driven_schema_configs
$ curl -L -o ${HOME}/solr/solr-6.1.0/server/solr/configsets/realtime_data_driven_schema_configs/conf/solrconfig.xml https://raw.githubusercontent.com/mosuka/the-18th-lucene-solr-meetup/master/solr/server/solr/configsets/realtime_data_driven_schema_configs/conf/solrconfig.xml
# Upload configsets for access_log.
$ ${HOME}/solr/solr-6.1.0/server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181/solr -cmd upconfig -confdir ${HOME}/solr/solr-6.1.0/server/solr/configsets/realtime_data_driven_schema_configs/conf -confname access_log_configs
# Create collection for access_log.
$ curl -s "http://localhost:8983/solr/admin/collections?action=CREATE&name=access_log&numShards=1&replicationFactor=1&maxShardsPerNode=1&createNodeSet=localhost:8983_solr&collection.configName=access_log_configs" | xmllint --format -
# Add required fields for access_log.
$ curl -L -o /tmp/access_log.json https://raw.githubusercontent.com/mosuka/the-18th-lucene-solr-meetup/master/solr/access_log.json
$ curl -X POST -H "Content-type:application/json" "http://localhost:8983/solr/access_log/schema" -d @/tmp/access_log.json
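The contents of `access_log.json` are not reproduced in this post. For orientation, a Schema API request body of this kind is typically built from `add-field` commands. The sketch below writes a hypothetical payload: the field names `client_ip` and `request_time`, the field types `string` and `tdate`, and the output path `/tmp/access_log_example.json` are illustrative assumptions, not the fields actually defined in the repository's file.

```shell
#!/bin/sh
# Hypothetical example payload; the real access_log.json may define
# different fields and types.
write_example_schema_payload() {
  cat > "$1" <<'EOF'
{
  "add-field": [
    { "name": "client_ip",    "type": "string", "indexed": true, "stored": true },
    { "name": "request_time", "type": "tdate",  "indexed": true, "stored": true }
  ]
}
EOF
}

# Write the example payload to an assumed scratch path.
write_example_schema_payload /tmp/access_log_example.json
```

Such a file would be posted in the same way as above, with `-d @/tmp/access_log_example.json`, preferably against a scratch collection so that the demo schema is left untouched.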
Flume
Install Flume to stream data such as access logs to Solr.
Flume 1.6.0 does not support Solr 6.x, so fetch the source code with Solr 6.x support from GitHub, build a package from it, and install that.
# Build Flume.
$ mkdir -p ${HOME}/git
$ git clone https://github.com/mosuka/flume.git ${HOME}/git/flume
$ mvn clean compile -DskipTests -f ${HOME}/git/flume/pom.xml
$ mvn clean install -DskipTests -f ${HOME}/git/flume/pom.xml
# Install Flume.
$ mkdir -p ${HOME}/flume
$ cp -r ${HOME}/git/flume/flume-ng-dist/target/apache-flume-1.7.0-SNAPSHOT-bin.tar.gz ${HOME}/flume/.
$ tar -C ${HOME}/flume -xf ${HOME}/flume/apache-flume-1.7.0-SNAPSHOT-bin.tar.gz
# Download configuration files from GitHub.
$ curl -L -o ${HOME}/flume/apache-flume-1.7.0-SNAPSHOT-bin/conf/flume-env.sh https://raw.githubusercontent.com/mosuka/the-18th-lucene-solr-meetup/master/flume/conf/flume-env.sh
$ curl -L -o ${HOME}/flume/apache-flume-1.7.0-SNAPSHOT-bin/conf/flume-conf.properties https://raw.githubusercontent.com/mosuka/the-18th-lucene-solr-meetup/master/flume/conf/flume-conf.properties
$ curl -L -o ${HOME}/flume/apache-flume-1.7.0-SNAPSHOT-bin/conf/morphline.conf https://raw.githubusercontent.com/mosuka/the-18th-lucene-solr-meetup/master/flume/conf/morphline.conf
# Download grok dictionaries from GitHub.
$ mkdir -p ${HOME}/flume/apache-flume-1.7.0-SNAPSHOT-bin/resources/grok-dictionaries
$ curl -L -o ${HOME}/flume/apache-flume-1.7.0-SNAPSHOT-bin/resources/grok-dictionaries/firewalls https://raw.githubusercontent.com/kite-sdk/kite/master/kite-morphlines/kite-morphlines-core/src/test/resources/grok-dictionaries/firewalls
$ curl -L -o ${HOME}/flume/apache-flume-1.7.0-SNAPSHOT-bin/resources/grok-dictionaries/grok-patterns https://raw.githubusercontent.com/kite-sdk/kite/master/kite-morphlines/kite-morphlines-core/src/test/resources/grok-dictionaries/grok-patterns
$ curl -L -o ${HOME}/flume/apache-flume-1.7.0-SNAPSHOT-bin/resources/grok-dictionaries/java https://raw.githubusercontent.com/kite-sdk/kite/master/kite-morphlines/kite-morphlines-core/src/test/resources/grok-dictionaries/java
$ curl -L -o ${HOME}/flume/apache-flume-1.7.0-SNAPSHOT-bin/resources/grok-dictionaries/linux-syslog https://raw.githubusercontent.com/kite-sdk/kite/master/kite-morphlines/kite-morphlines-core/src/test/resources/grok-dictionaries/linux-syslog
$ curl -L -o ${HOME}/flume/apache-flume-1.7.0-SNAPSHOT-bin/resources/grok-dictionaries/mcollective https://raw.githubusercontent.com/kite-sdk/kite/master/kite-morphlines/kite-morphlines-core/src/test/resources/grok-dictionaries/mcollective
$ curl -L -o ${HOME}/flume/apache-flume-1.7.0-SNAPSHOT-bin/resources/grok-dictionaries/mcollective-patterns https://raw.githubusercontent.com/kite-sdk/kite/master/kite-morphlines/kite-morphlines-core/src/test/resources/grok-dictionaries/mcollective-patterns
$ curl -L -o ${HOME}/flume/apache-flume-1.7.0-SNAPSHOT-bin/resources/grok-dictionaries/nagios https://raw.githubusercontent.com/kite-sdk/kite/master/kite-morphlines/kite-morphlines-core/src/test/resources/grok-dictionaries/nagios
$ curl -L -o ${HOME}/flume/apache-flume-1.7.0-SNAPSHOT-bin/resources/grok-dictionaries/postgresql https://raw.githubusercontent.com/kite-sdk/kite/master/kite-morphlines/kite-morphlines-core/src/test/resources/grok-dictionaries/postgresql
$ curl -L -o ${HOME}/flume/apache-flume-1.7.0-SNAPSHOT-bin/resources/grok-dictionaries/redis https://raw.githubusercontent.com/kite-sdk/kite/master/kite-morphlines/kite-morphlines-core/src/test/resources/grok-dictionaries/redis
$ curl -L -o ${HOME}/flume/apache-flume-1.7.0-SNAPSHOT-bin/resources/grok-dictionaries/ruby https://raw.githubusercontent.com/kite-sdk/kite/master/kite-morphlines/kite-morphlines-core/src/test/resources/grok-dictionaries/ruby
# Download GeoIP database from MaxMind.
$ mkdir -p ${HOME}/flume/apache-flume-1.7.0-SNAPSHOT-bin/resources/geoip
$ curl -L -o ${HOME}/flume/apache-flume-1.7.0-SNAPSHOT-bin/resources/geoip/GeoLite2-City.mmdb.gz http://geolite.maxmind.com/download/geoip/database/GeoLite2-City.mmdb.gz
$ gzip -d -c ${HOME}/flume/apache-flume-1.7.0-SNAPSHOT-bin/resources/geoip/GeoLite2-City.mmdb.gz > ${HOME}/flume/apache-flume-1.7.0-SNAPSHOT-bin/resources/geoip/GeoLite2-City.mmdb
$ rm ${HOME}/flume/apache-flume-1.7.0-SNAPSHOT-bin/resources/geoip/GeoLite2-City.mmdb.gz
# Prepare the log file in advance.
$ touch /tmp/access.log
# Start Flume.
$ ${HOME}/flume/apache-flume-1.7.0-SNAPSHOT-bin/bin/flume-ng agent --conf ${HOME}/flume/apache-flume-1.7.0-SNAPSHOT-bin/conf --name agent --conf-file ${HOME}/flume/apache-flume-1.7.0-SNAPSHOT-bin/conf/flume-conf.properties -Dflume.log.dir=${HOME}/flume/apache-flume-1.7.0-SNAPSHOT-bin/logs &
$ echo $! > ${HOME}/flume/apache-flume-1.7.0-SNAPSHOT-bin/flume.pid
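The PID written to `flume.pid` is what makes it possible to stop the agent later. The post does not show that step, so the following is only a sketch: `stop_by_pidfile` is a hypothetical helper name, and it assumes the saved PID still belongs to the process that was started.

```shell
#!/bin/sh
# Hypothetical helper (not from the original demo): stop a background process
# whose PID was saved with `echo $! > FILE`, then remove the PID file.
stop_by_pidfile() {
  pidfile="$1"
  [ -f "${pidfile}" ] || { echo "no pid file: ${pidfile}" >&2; return 1; }
  pid="$(cat "${pidfile}")"
  # kill -0 only checks that the process exists; it sends no signal.
  if kill -0 "${pid}" 2>/dev/null; then
    kill "${pid}"   # default signal is SIGTERM
  fi
  rm -f "${pidfile}"
}
```

For the agent started above, the call would be `stop_by_pidfile ${HOME}/flume/apache-flume-1.7.0-SNAPSHOT-bin/flume.pid`. The same helper applies to any other process in this post that records its PID in the same way.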
Zeppelin
Install Zeppelin to analyze the data indexed in Solr.
The connection to Solr is made through a JDBC driver, but the currently released Zeppelin does not support the JDBC driver, so build the GitHub master and create a package.
# Install Zeppelin.
$ mkdir -p ${HOME}/zeppelin
$ curl -L -o ${HOME}/zeppelin/zeppelin-0.6.0-bin-all.tgz https://archive.apache.org/dist/zeppelin/zeppelin-0.6.0/zeppelin-0.6.0-bin-all.tgz
$ tar -C ${HOME}/zeppelin -xf ${HOME}/zeppelin/zeppelin-0.6.0-bin-all.tgz
# Download configuration file for changing port number to 8082 from GitHub.
$ curl -L -o ${HOME}/zeppelin/zeppelin-0.6.0-bin-all/conf/zeppelin-site.xml https://raw.githubusercontent.com/mosuka/the-18th-lucene-solr-meetup/master/zeppelin/conf/zeppelin-site.xml
# Start Zeppelin.
$ ${HOME}/zeppelin/zeppelin-0.6.0-bin-all/bin/zeppelin-daemon.sh start
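Zeppelin will send SQL to Solr over JDBC. Before configuring the interpreter, the same Parallel SQL feature can be exercised directly through the collection's `/sql` request handler. This is a rough sketch under assumptions: `solr_sql` and `solr_sql_endpoint` are hypothetical helper names, `SOLR_BASE_URL` defaults to the address used in this post, and it assumes the `/sql` handler has not been disabled by the downloaded `solrconfig.xml`.

```shell
#!/bin/sh
# Hypothetical helpers (not from the original demo) for Solr's /sql handler.
SOLR_BASE_URL="${SOLR_BASE_URL:-http://localhost:8983/solr}"

# Print the /sql endpoint URL for a collection.
solr_sql_endpoint() {
  printf '%s/%s/sql\n' "${SOLR_BASE_URL}" "$1"
}

# Send one SQL statement; --data-urlencode handles spaces and punctuation.
solr_sql() {
  curl -s --data-urlencode "stmt=$2" "$(solr_sql_endpoint "$1")"
}
```

A simple smoke test would be `solr_sql access_log "SELECT count(*) FROM access_log"`, which only counts the documents indexed so far.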
Zeppelin settings
- [shared] : Interpreter for note
- [] Connect to existing process
- Properties
- solr.url = jdbc:solr://localhost:2181/solr?collection=access_log
- solr.driver = org.apache.solr.client.solrj.io.sql.DriverImpl
- Dependencies
- artifact = org.apache.solr:solr-solrj:6.1.0
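Note that the host and port in `solr.url` are the ZooKeeper connection string, including the `/solr` chroot created earlier, and not Solr's HTTP address on port 8983. A small sketch makes the composition explicit; `solr_jdbc_url` is a hypothetical helper name used only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: compose a Solr JDBC URL from a ZooKeeper connection
# string (host:port plus optional chroot) and a collection name.
solr_jdbc_url() {
  printf 'jdbc:solr://%s?collection=%s\n' "$1" "$2"
}

solr_jdbc_url "localhost:2181/solr" "access_log"
# prints: jdbc:solr://localhost:2181/solr?collection=access_log
```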
Banana
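Install Banana to visualize data in real time as it is indexed into Solr.
Banana is normally embedded in Solr, but here it runs on a node separate from Solr (multiple node), so set up a separate Jetty and deploy it there.
This requires CORS to be enabled on the Solr side. (This was done in the Solr installation steps above.)
Banana can use Solr to store its settings, but the released 1.6.0 has a bug that prevents saving settings to a remote Solr in a multiple-node environment, so build a package from the GitHub source code that fixes this problem.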
# Install Jetty.
$ mkdir -p ${HOME}/jetty
$ curl -L -o ${HOME}/jetty/jetty-distribution-9.3.8.v20160314.tar.gz http://download.eclipse.org/jetty/9.3.8.v20160314/dist/jetty-distribution-9.3.8.v20160314.tar.gz
$ tar -C ${HOME}/jetty -xf ${HOME}/jetty/jetty-distribution-9.3.8.v20160314.tar.gz
# Download configuration file for changing server port to 8081 from GitHub.
$ curl -L -o ${HOME}/jetty/jetty-distribution-9.3.8.v20160314/start.ini https://raw.githubusercontent.com/mosuka/the-18th-lucene-solr-meetup/master/jetty/start.ini
# Build Banana.
$ mkdir -p ${HOME}/banana
$ curl -L -o ${HOME}/banana/banana-release.zip https://github.com/lucidworks/banana/archive/release.zip
$ unzip ${HOME}/banana/banana-release.zip -d ${HOME}/banana
$ ant -f ${HOME}/banana/banana-release/build.xml -Dfinal.name=banana
# Install Banana.
$ cp ${HOME}/banana/banana-release/build/banana.war ${HOME}/jetty/jetty-distribution-9.3.8.v20160314/webapps/.
# Upload configsets for banana-int.
$ ${HOME}/solr/solr-6.1.0/server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181/solr -cmd upconfig -confdir ${HOME}/solr/solr-6.1.0/server/solr/configsets/data_driven_schema_configs/conf -confname banana-int_configs
# Create collection for banana-int.
$ curl -s "http://localhost:8983/solr/admin/collections?action=CREATE&name=banana-int&numShards=1&replicationFactor=1&maxShardsPerNode=1&createNodeSet=localhost:8983_solr&collection.configName=banana-int_configs" | xmllint --format -
# Add required fields for banana-int.
$ curl -L -o /tmp/banana-int.json https://raw.githubusercontent.com/mosuka/the-18th-lucene-solr-meetup/master/solr/banana-int.json
$ curl -X POST -H "Content-type:application/json" "http://localhost:8983/solr/banana-int/schema" -d @/tmp/banana-int.json
# Start Banana with Jetty.
$ ${HOME}/jetty/jetty-distribution-9.3.8.v20160314/bin/jetty.sh start
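Because Banana is served from Jetty on port 8081 and queries Solr on port 8983, the browser needs Solr's responses to carry CORS headers. One way to check this is to send a request with an `Origin` header and look for `Access-Control-Allow-Origin` in the response. The following is a sketch under assumptions: `has_cors_header` is a hypothetical helper, and the exact header value depends on the filter configured in the downloaded `webdefault.xml`, which is not shown in this post.

```shell
#!/bin/sh
# Hypothetical helper: read HTTP response headers on stdin and succeed if an
# Access-Control-Allow-Origin header is present (matched case-insensitively).
has_cors_header() {
  grep -qi '^access-control-allow-origin:'
}
```

For example, `curl -s -o /dev/null -D - -H "Origin: http://localhost:8081" "http://localhost:8983/solr/access_log/select?q=*:*&rows=0" | has_cors_header && echo "CORS enabled"` dumps only the response headers and pipes them into the check.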
Silk
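Install Silk to visualize data in real time as it is indexed into Solr.
Silk can store its settings in a remote Solr, but it fails to start because the schema definition changed as of Solr 6.0.0.
Fetch the source code that fixes this problem from GitHub and build it.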
# Build Silk.
$ mkdir -p ${HOME}/silk
$ curl -L -o ${HOME}/silk/silk-dev.zip https://github.com/mosuka/silk/archive/dev.zip
$ unzip ${HOME}/silk/silk-dev.zip -d ${HOME}/silk
$ cd ${HOME}/silk/silk-dev
$ npm install
$ bower install
$ grunt build --force
$ cd ${HOME}
# Upload configsets for silkconfig.
$ ${HOME}/solr/solr-6.1.0/server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181/solr -cmd upconfig -confdir ${HOME}/silk/silk-dev/silkconfig/conf -confname silkconfig_configs
# Create collection for silkconfig.
$ curl -s "http://localhost:8983/solr/admin/collections?action=CREATE&name=silkconfig&numShards=1&replicationFactor=1&maxShardsPerNode=1&createNodeSet=localhost:8983_solr&collection.configName=silkconfig_configs" | xmllint --format -
# Start Silk.
$ node ${HOME}/silk/silk-dev/src/server/bin/kibana.js > ${HOME}/silk/silk-dev/silk.log &
$ echo $! > ${HOME}/silk/silk-dev/silk.pid