Day 8

デジタル創作サークルUniProject Advent Calendar 2025

Day 9

七夕凛 Advent Calendar 2025

@ysmreg1in

デジタル創作サークルUniProject

Introduction to Elasticsearch: From Full-Text Search Fundamentals to Practical Use

Posted at 2025-12-25

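Introduction

"Our database searches are slow."
"We want to implement fuzzy search or similarity search."
"We want to analyze large volumes of logs efficiently."

Elasticsearch addresses these problems. It is a distributed search engine built for fast full-text search and near-real-time analytics.

This article walks through Elasticsearch from its basic concepts to practical queries and operational considerations, explaining the reasoning behind each practice along the way.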

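What Is Elasticsearch?

Why Elasticsearch Is Needed

LIKE searches in an RDBMS have limits:

-- ❌ Problems with LIKE searches
SELECT * FROM articles WHERE content LIKE '%プログラミング%';

-- Problems:
-- 1. Slow: a leading wildcard cannot use an ordinary index, so every row is scanned
-- 2. Inflexible: only literal substrings match, so searching for "プログラム" (program) does not hit "プログラミング" (programming)
-- 3. No relevance: nothing indicates which results are more relevant

Elasticsearch solves these problems with a structure called an inverted index.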

What Is an Inverted Index?

An ordinary database stores data in a "document → words" layout, whereas Elasticsearch builds a "word → documents" structure (an inverted index).

┌─────────────────────────────────────────────────────────────┐
│                   Forward (ordinary) index                   │
│                                                              │
│  Doc1 → ["Elasticsearch", "は", "検索", "エンジン", "です"]  │
│  Doc2 → ["検索", "機能", "を", "実装", "する"]               │
│  Doc3 → ["Elasticsearch", "で", "ログ", "分析"]             │
│                                                              │
│  To find documents containing "検索": scan every document    │
│                                                              │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│                        Inverted index                        │
│                                                              │
│  "Elasticsearch" → [Doc1, Doc3]                             │
│  "検索"          → [Doc1, Doc2]                             │
│  "エンジン"      → [Doc1]                                   │
│  "ログ"          → [Doc3]                                   │
│  "分析"          → [Doc3]                                   │
│                                                              │
│  Documents containing "検索" → [Doc1, Doc2] in one lookup    │
│                                                              │
└─────────────────────────────────────────────────────────────┘
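To make the diagram concrete, here is a minimal Python sketch of the same idea. It is only an illustration: the documents are assumed to be already tokenized, and the names `DOCS`, `build_inverted_index`, and `lookup` are made up for this example. Elasticsearch's real index (built on Lucene) also stores positions, frequencies, and more.

```python
from collections import defaultdict

# Toy corpus mirroring the three documents in the diagram (already tokenized).
DOCS = {
    "Doc1": ["Elasticsearch", "は", "検索", "エンジン", "です"],
    "Doc2": ["検索", "機能", "を", "実装", "する"],
    "Doc3": ["Elasticsearch", "で", "ログ", "分析"],
}


def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs that contain it."""
    postings = defaultdict(set)
    for doc_id, tokens in docs.items():
        for token in tokens:
            postings[token].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


INDEX = build_inverted_index(DOCS)


def lookup(term):
    """One dictionary lookup instead of a scan over every document."""
    return INDEX.get(term, [])


print(lookup("検索"))
```

The cost of a lookup no longer depends on how many documents exist, only on how many contain the term.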

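Main Use Cases

Use case	Description	Examples
Full-text search	Fast search over text data	E-commerce product search, blog search
Log analysis	Near-real-time analysis of large log volumes	Access logs, error logs
Metrics	Aggregation and visualization of time-series data	Server monitoring, performance analysis
Security analytics	Correlation analysis of event data	SIEM, threat detection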

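Comparison with RDBMS

Feature	RDBMS	Elasticsearch
Data model	Tables (rows and columns)	JSON documents
Querying	SQL	Query DSL (a JSON-based domain-specific language)
Full-text search	Weak (LIKE searches)	Strong (inverted index)
Aggregation	GROUP BY	Aggregations
Transactions	ACID	Essentially unsupported
Scalability	Mostly vertical scaling	Horizontal scaling (sharding)

Choosing Between Them

Use case	Recommended
Transactions required	RDBMS
Full-text search required	Elasticsearch
Many complex JOINs	RDBMS
Real-time analytics	Elasticsearch
Strict data consistency	RDBMS

In most systems, the RDBMS remains the primary database and Elasticsearch is added alongside it for search.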

Basic Concepts

Terminology

┌─────────────────────────────────────────────────────────────┐
│                    Elasticsearch structure                   │
│                                                              │
│  ┌─────────────────────────────────────────────────────┐    │
│  │ Cluster                                              │    │
│  │ A collection of nodes                                │    │
│  │                                                      │    │
│  │  ┌────────────┐ ┌────────────┐ ┌────────────┐       │    │
│  │  │  Node 1    │ │  Node 2    │ │  Node 3    │       │    │
│  │  │            │ │            │ │            │       │    │
│  │  │ ┌────────┐ │ │ ┌────────┐ │ │ ┌────────┐ │       │    │
│  │  │ │Shard 0 │ │ │ │Shard 1 │ │ │ │Shard 2 │ │       │    │
│  │  │ └────────┘ │ │ └────────┘ │ │ └────────┘ │       │    │
│  │  │ ┌────────┐ │ │ ┌────────┐ │ │ ┌────────┐ │       │    │
│  │  │ │Replica │ │ │ │Replica │ │ │ │Replica │ │       │    │
│  │  │ └────────┘ │ │ └────────┘ │ │ └────────┘ │       │    │
│  │  └────────────┘ └────────────┘ └────────────┘       │    │
│  └─────────────────────────────────────────────────────┘    │
│                                                              │
└─────────────────────────────────────────────────────────────┘

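Term	RDBMS equivalent	Description
Cluster	-	A collection of nodes
Node	Server	A single Elasticsearch instance
Index	Table (Database in older versions that had mapping types)	A logical grouping of documents
Shard	-	A physical partition of an index
Replica	Replica	A copy of a shard (redundancy)
Document	Row	A single JSON document
Field	Column	An attribute within a document
Mapping	Schema	Data type definitions for fields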

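Sharding and Replicas

Why is sharding needed?

An index holding, say, a billion documents can exceed what a single server can store or search quickly. Sharding splits the index into multiple pieces (shards) and distributes them across nodes.

Index "products" (1,000,000 documents)
    ↓
┌─────────┐ ┌─────────┐ ┌─────────┐
│Shard 0  │ │Shard 1  │ │Shard 2  │
│330k docs│ │330k docs│ │340k docs│
│(Node 1) │ │(Node 2) │ │(Node 3) │
└─────────┘ └─────────┘ └─────────┘

Search query → processed in parallel on the 3 shards → results merged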
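The routing and scatter-gather flow above can be sketched in a few lines of Python. This is a simplified model, not Elasticsearch's implementation: Elasticsearch hashes the routing value (the document ID by default) with its own hash function, while `zlib.crc32` stands in for it here, and `shard_for`, `distribute`, and `scatter_gather` are hypothetical names.

```python
import zlib


def shard_for(doc_id: str, num_shards: int) -> int:
    """Pick a shard deterministically from the document ID (crc32 is a stand-in hash)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def distribute(doc_ids, num_shards):
    """Assign every document ID to exactly one shard."""
    shards = {n: [] for n in range(num_shards)}
    for doc_id in doc_ids:
        shards[shard_for(doc_id, num_shards)].append(doc_id)
    return shards


def scatter_gather(shards, predicate):
    """Query each shard independently, then merge the partial results."""
    hits = []
    for shard_docs in shards.values():  # a real cluster runs these in parallel
        hits.extend(d for d in shard_docs if predicate(d))
    return sorted(hits)


shards = distribute([str(i) for i in range(1000)], 3)
print({n: len(docs) for n, docs in shards.items()})
```

Because the shard is derived from a hash modulo the shard count, changing the number of primary shards would send existing IDs to different shards. That is why the primary shard count is fixed when an index is created.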

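Why are replicas needed?

A replica is a copy of a shard. It serves two purposes:

High availability: if the node holding a primary shard fails, a replica is promoted to primary
Search performance: replicas also serve searches (load distribution)

┌─────────────────────────────────────────────────────────────┐
│                    Replica placement                         │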
│                                                              │
│  Node 1            Node 2            Node 3                  │
│  ┌───────────┐    ┌───────────┐    ┌───────────┐            │
│  │ Shard 0 P │    │ Shard 1 P │    │ Shard 2 P │            │
│  │ Shard 1 R │    │ Shard 2 R │    │ Shard 0 R │            │
│  └───────────┘    └───────────┘    └───────────┘            │
│                                                              │
│  P = Primary    R = Replica                                  │
│  The P and R copies of a shard are placed on different nodes │
│                                                              │
└─────────────────────────────────────────────────────────────┘
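The placement rule in the diagram can be checked with a small Python model. This is a toy allocator under a round-robin assumption, not Elasticsearch's actual shard allocation logic; `place_shards` and `surviving_shards` are names invented for this sketch.

```python
def place_shards(num_shards, nodes):
    """Return {node: [(shard, role), ...]} with each primary and its replica on different nodes."""
    if len(nodes) < 2:
        raise ValueError("a replica needs a second node")
    placement = {node: [] for node in nodes}
    for shard in range(num_shards):
        primary = nodes[shard % len(nodes)]
        replica = nodes[(shard - 1) % len(nodes)]  # shifted by one, so never the primary's node
        placement[primary].append((shard, "P"))
        placement[replica].append((shard, "R"))
    return placement


def surviving_shards(placement, failed_node):
    """Shards that still have at least one copy after a single node fails."""
    return {
        shard
        for node, copies in placement.items()
        if node != failed_node
        for shard, _role in copies
    }


placement = place_shards(3, ["Node 1", "Node 2", "Node 3"])
for node in placement:
    print(node, "fails ->", sorted(surviving_shards(placement, node)), "still available")
```

With one replica per shard, any single node can fail and every shard remains available, which is the property the diagram illustrates.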

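Environment Setup

Installation

# Docker (the simplest option; security is disabled here, so use this for local development only)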
docker run -d --name elasticsearch \
    -p 9200:9200 -p 9300:9300 \
    -e "discovery.type=single-node" \
    -e "xpack.security.enabled=false" \
    elasticsearch:8.11.0

# Docker Compose (with Kibana)
# docker-compose.yml
version: '3.8'
services:
  elasticsearch:
    image: elasticsearch:8.11.0
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ports:
      - "9200:9200"
    volumes:
      - es_data:/usr/share/elasticsearch/data

  kibana:
    image: kibana:8.11.0
    ports:
      - "5601:5601"
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200

volumes:
  es_data:

Verifying the Setup

# Get cluster information
curl http://localhost:9200

# Example response
{
  "name" : "node-1",
  "cluster_name" : "docker-cluster",
  "cluster_uuid" : "xxx",
  "version" : {
    "number" : "8.11.0"
  },
  "tagline" : "You Know, for Search"
}

# Check cluster health (the URL is quoted so the shell does not interpret "?")
curl "http://localhost:9200/_cluster/health?pretty"

# List indices
curl "http://localhost:9200/_cat/indices?v"

Document Operations

Creating Documents

# Create a document (auto-generated ID)
curl -X POST "http://localhost:9200/products/_doc" \
    -H "Content-Type: application/json" \
    -d '{
        "name": "MacBook Pro",
        "price": 248000,
        "category": "ノートパソコン",
        "description": "Apple製の高性能ノートパソコン。M3チップ搭載。",
        "tags": ["Apple", "Mac", "ノートPC"],
        "stock": 50,
        "created_at": "2024-01-01T00:00:00"
    }'

# Create a document (explicit ID)
curl -X PUT "http://localhost:9200/products/_doc/1" \
    -H "Content-Type: application/json" \
    -d '{
        "name": "iPhone 15",
        "price": 124800,
        "category": "スマートフォン",
        "description": "最新のiPhone。A17チップ搭載。"
    }'

Retrieving Documents

# Get by ID
curl "http://localhost:9200/products/_doc/1"

# Response (abridged)
{
  "_index": "products",
  "_id": "1",
  "_source": {
    "name": "iPhone 15",
    "price": 124800,
    "category": "スマートフォン",
    "description": "最新のiPhone。A17チップ搭載。"
  }
}

Updating Documents

# Partial update
curl -X POST "http://localhost:9200/products/_update/1" \
    -H "Content-Type: application/json" \
    -d '{
        "doc": {
            "price": 119800,
            "stock": 100
        }
    }'

# Update with a script (decrement the stock)
curl -X POST "http://localhost:9200/products/_update/1" \
    -H "Content-Type: application/json" \
    -d '{
        "script": {
            "source": "ctx._source.stock -= params.count",
            "params": {
                "count": 1
            }
        }
    }'

Deleting Documents

# Delete by ID
curl -X DELETE "http://localhost:9200/products/_doc/1"

# Delete by query
curl -X POST "http://localhost:9200/products/_delete_by_query" \
    -H "Content-Type: application/json" \
    -d '{
        "query": {
            "range": {
                "stock": { "lte": 0 }
            }
        }
    }'

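Bulk Operations

To operate on many documents efficiently, use the _bulk API. The body is newline-delimited JSON: each action is one metadata line, followed by a source line for index and update actions (delete has none), and the body must end with a newline.

curl -X POST "http://localhost:9200/_bulk" \
    -H "Content-Type: application/x-ndjson" \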
    -d '
{"index": {"_index": "products", "_id": "2"}}
{"name": "iPad Pro", "price": 124800, "category": "タブレット"}
{"index": {"_index": "products", "_id": "3"}}
{"name": "AirPods Pro", "price": 39800, "category": "イヤホン"}
{"delete": {"_index": "products", "_id": "100"}}
{"update": {"_index": "products", "_id": "1"}}
{"doc": {"stock": 50}}
'
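Writing bulk bodies by hand is error-prone, so they are usually generated. The following Python sketch builds the newline-delimited body from a list of actions. The function name `build_bulk_body` and the tuple layout are assumptions made for this example; official client libraries ship their own bulk helpers.

```python
import json


def build_bulk_body(actions):
    """Build a _bulk request body from (operation, metadata, source) tuples.

    source is ignored for "delete", which has no second line.
    """
    lines = []
    for operation, metadata, source in actions:
        lines.append(json.dumps({operation: metadata}, ensure_ascii=False))
        if operation != "delete":
            lines.append(json.dumps(source, ensure_ascii=False))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body([
    ("index", {"_index": "products", "_id": "2"}, {"name": "iPad Pro", "price": 124800}),
    ("delete", {"_index": "products", "_id": "100"}, None),
    ("update", {"_index": "products", "_id": "1"}, {"doc": {"stock": 50}}),
])
print(body, end="")
```

The resulting string can be sent as the request body with the `application/x-ndjson` content type.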

Search Queries

Basic Searches

# Match all documents (only the first 10 hits are returned by default)
curl "http://localhost:9200/products/_search"

# match query (full-text search)
curl -X POST "http://localhost:9200/products/_search" \
    -H "Content-Type: application/json" \
    -d '{
        "query": {
            "match": {
                "description": "高性能 ノートパソコン"
            }
        }
    }'

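How the match Query Works

The match query runs the search string through the field's analyzer to split it into tokens (words), then combines the tokens with OR by default.

Search string: "高性能 ノートパソコン" (high-performance laptop)
    ↓ tokenize
["高性能", "ノートパソコン"]   (illustrative; the actual tokens depend on the field's analyzer)
    ↓
Finds documents containing "高性能" OR "ノートパソコン"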
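The OR behavior, and the AND alternative covered in the next section, can be modeled with plain Python sets. This is a sketch under the assumption that both the document and the query are already tokenized; `match` is a hypothetical helper and it ignores scoring entirely.

```python
def match(doc_tokens, query_tokens, operator="or"):
    """Return True if the document matches the tokenized query.

    operator="or":  at least one query token must appear (the default).
    operator="and": every query token must appear.
    """
    hits = [token in doc_tokens for token in query_tokens]
    return all(hits) if operator == "and" else any(hits)


doc = {"Apple", "製", "高性能", "ノートパソコン"}
query = ["高性能", "タブレット"]

print(match(doc, query))                  # OR: one token is enough
print(match(doc, query, operator="and"))  # AND: "タブレット" is missing
```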

Query Types

match (full-text search)

{
    "query": {
        "match": {
            "description": {
                "query": "高性能 ノートパソコン",
                "operator": "and"  // the default is or
            }
        }
    }
}

match_phrase (phrase search)

Matches only when the terms appear in the given order, next to one another.

{
    "query": {
        "match_phrase": {
            "description": "Apple製の高性能"
        }
    }
}

multi_match (multi-field search)

{
    "query": {
        "multi_match": {
            "query": "Apple",
            "fields": ["name", "description", "tags"]
        }
    }
}

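term (exact match)

A term query is not analyzed (tokenized): the value is compared as-is against the indexed terms. Use it on keyword fields. The category.keyword sub-field used below is what dynamic mapping creates for string fields.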

{
    "query": {
        "term": {
            "category.keyword": "スマートフォン"
        }
    }
}

range (range search)

{
    "query": {
        "range": {
            "price": {
                "gte": 50000,
                "lte": 150000
            }
        }
    }
}

bool (compound query)

Combines multiple conditions.

{
    "query": {
        "bool": {
            "must": [
                // AND condition (contributes to the score)
                { "match": { "description": "Apple" } }
            ],
            "filter": [
                // AND condition (does not affect the score; results can be cached)
                { "range": { "price": { "lte": 150000 } } },
                { "term": { "category.keyword": "スマートフォン" } }
            ],
            "should": [
                // OR condition (documents that match rank higher)
                { "match": { "tags": "人気" } }
            ],
            "must_not": [
                // NOT condition
                { "term": { "stock": 0 } }
            ]
        }
    }
}

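must vs filter

Both are AND conditions, but they differ in important ways:

	must	filter
Score calculation	Yes	No
Caching	No	Yes (frequently used filters can be cached)
Use for	Conditions that should influence relevance	Pure yes/no filtering conditions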
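The difference can be sketched in Python as follows. The scoring here is deliberately naive (a raw term count), an assumption made to keep the example short; Elasticsearch uses BM25 by default. `bool_search` and the sample documents are hypothetical.

```python
def bool_search(docs, must_term, max_price):
    """Toy bool query: a price filter (yes/no only) plus a must clause (scored)."""
    results = []
    for doc in docs:
        # filter: excludes documents, never changes the score
        if doc["price"] > max_price:
            continue
        # must: has to match, and how well it matches drives the ranking
        score = doc["description"].split().count(must_term)
        if score == 0:
            continue
        results.append((doc["name"], score))
    return sorted(results, key=lambda item: -item[1])


docs = [
    {"name": "MacBook Pro", "price": 248000, "description": "Apple laptop with Apple silicon"},
    {"name": "iPhone 15", "price": 124800, "description": "Apple phone"},
    {"name": "Pixel", "price": 90000, "description": "Google phone"},
    {"name": "AirPods", "price": 39800, "description": "Apple earbuds from Apple"},
]
print(bool_search(docs, "Apple", 150000))
```

Because a filter only answers yes or no, its result for a given condition can be reused across queries, which is what makes filter clauses cache-friendly.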

Highlighting

Marks the matched portions of the text in the search results.

{
    "query": {
        "match": { "description": "Apple" }
    },
    "highlight": {
        "fields": {
            "description": {}
        },
        "pre_tags": ["<em>"],
        "post_tags": ["</em>"]
    }
}

// Response (abridged)
{
    "hits": {
        "hits": [{
            "_source": { ... },
            "highlight": {
                "description": ["<em>Apple</em>製の高性能ノートパソコン"]
            }
        }]
    }
}

Sorting and Pagination

{
    "query": { "match_all": {} },
    "sort": [
        { "price": "desc" },
        { "created_at": "asc" }
    ],
    "from": 0,    // offset (zero-based page number × size)
    "size": 10    // number of hits to return
}

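The Deep Pagination Problem

Performance degrades as from grows (for example, from: 10000). Each shard has to collect from + size hits, and the coordinating node then merges them all. By default, Elasticsearch also rejects requests where from + size exceeds 10,000 (the index.max_result_window setting).

Solution: search_after

search_after continues from the sort values of the last hit instead of skipping over earlier hits, so the sort needs a unique tiebreaker as its last key. The requests below use _id for brevity. Recent versions restrict sorting on _id, so a keyword field holding a copy of the ID, or a point in time (PIT) with its implicit tiebreaker, is the safer choice in practice.

// First request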
{
    "query": { "match_all": {} },
    "sort": [{ "created_at": "desc" }, { "_id": "asc" }],
    "size": 10
}

// Next page (pass the sort values of the last document from the previous page)
{
    "query": { "match_all": {} },
    "sort": [{ "created_at": "desc" }, { "_id": "asc" }],
    "size": 10,
    "search_after": ["2024-01-01T00:00:00", "abc123"]
}
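To see why search_after scales better, the following Python sketch simulates both strategies over three shards and counts how many hits the shards must return to the coordinating node. It assumes unique integer sort keys and uses hypothetical helper names; it models the data flow only, not Elasticsearch's internals.

```python
import heapq


def page_with_from_size(shards, from_, size):
    """Each shard must return its top from_ + size hits for the coordinator to merge."""
    per_shard = [sorted(shard)[: from_ + size] for shard in shards]
    fetched = sum(len(hits) for hits in per_shard)
    merged = list(heapq.merge(*per_shard))
    return merged[from_ : from_ + size], fetched


def page_with_search_after(shards, after, size):
    """Each shard returns only `size` hits that sort after the previous page's last key."""
    per_shard = [
        [key for key in sorted(shard) if after is None or key > after][:size]
        for shard in shards
    ]
    fetched = sum(len(hits) for hits in per_shard)
    merged = list(heapq.merge(*per_shard))
    return merged[:size], fetched


# 3,000 documents spread over 3 shards; fetch the 100th page of 10 hits.
shards = [list(range(i, 3000, 3)) for i in range(3)]
page_a, cost_a = page_with_from_size(shards, from_=990, size=10)
page_b, cost_b = page_with_search_after(shards, after=989, size=10)
print(page_a == page_b, cost_a, cost_b)
```

Both strategies produce the same page, but the from/size cost grows with the page number while the search_after cost stays at size hits per shard.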

Aggregations

Aggregations are Elasticsearch's analytics feature. They correspond to SQL's GROUP BY but are more flexible.

Bucket Aggregations

Group documents into buckets.

{
    "size": 0,  // no documents needed, only the aggregation results
    "aggs": {
        "categories": {
            "terms": {
                "field": "category.keyword",
                "size": 10
            }
        }
    }
}

// Response (abridged)
{
    "aggregations": {
        "categories": {
            "buckets": [
                { "key": "スマートフォン", "doc_count": 50 },
                { "key": "ノートパソコン", "doc_count": 30 },
                { "key": "タブレット", "doc_count": 20 }
            ]
        }
    }
}

Metric Aggregations

Compute statistics over numeric values.

{
    "size": 0,
    "aggs": {
        "price_stats": {
            "stats": {
                "field": "price"
            }
        },
        "avg_price": {
            "avg": {
                "field": "price"
            }
        },
        "max_price": {
            "max": {
                "field": "price"
            }
        }
    }
}

// Response (only price_stats is shown; avg_price and max_price are omitted)
{
    "aggregations": {
        "price_stats": {
            "count": 100,
            "min": 9800,
            "max": 498000,
            "avg": 89500,
            "sum": 8950000
        }
    }
}

Nested Aggregations

You can run further aggregations inside each bucket.

{
    "size": 0,
    "aggs": {
        "categories": {
            "terms": {
                "field": "category.keyword"
            },
            "aggs": {
                "avg_price": {
                    "avg": { "field": "price" }
                },
                "price_ranges": {
                    "range": {
                        "field": "price",
                        "ranges": [
                            { "to": 50000 },
                            { "from": 50000, "to": 100000 },
                            { "from": 100000 }
                        ]
                    }
                }
            }
        }
    }
}
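For readers who think in code rather than in query JSON, here is roughly what a terms aggregation with an avg sub-aggregation computes, written as a Python sketch. The function `terms_with_avg` and the sample documents are made up for illustration, and the result layout is simplified compared with the real response.

```python
from collections import defaultdict


def terms_with_avg(docs, bucket_field, metric_field):
    """Group documents by bucket_field, then average metric_field inside each bucket."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[bucket_field]].append(doc[metric_field])
    buckets = [
        {"key": key, "doc_count": len(values), "avg": sum(values) / len(values)}
        for key, values in groups.items()
    ]
    # terms buckets are ordered by doc_count, largest first (ties broken by key here)
    return sorted(buckets, key=lambda b: (-b["doc_count"], b["key"]))


docs = [
    {"category": "phone", "price": 100},
    {"category": "phone", "price": 300},
    {"category": "laptop", "price": 1000},
]
for bucket in terms_with_avg(docs, "category", "price"):
    print(bucket)
```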

Date Histogram

Useful for aggregating time-series data.

{
    "size": 0,
    "aggs": {
        "orders_over_time": {
            "date_histogram": {
                "field": "created_at",
                "calendar_interval": "month"
            },
            "aggs": {
                "total_sales": {
                    "sum": { "field": "price" }
                }
            }
        }
    }
}
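As a rough mental model, the monthly date histogram with a sum sub-aggregation does the equivalent of the Python sketch below. It assumes ISO 8601 timestamps in `created_at` and uses the hypothetical helper `monthly_sum`; time zones and empty buckets are ignored.

```python
from collections import defaultdict
from datetime import datetime


def monthly_sum(docs):
    """Bucket documents by calendar month and sum the price in each bucket."""
    totals = defaultdict(int)
    for doc in docs:
        timestamp = datetime.fromisoformat(doc["created_at"])
        totals[timestamp.strftime("%Y-%m")] += doc["price"]
    return dict(sorted(totals.items()))


docs = [
    {"created_at": "2024-01-01T00:00:00", "price": 100},
    {"created_at": "2024-01-15T12:00:00", "price": 50},
    {"created_at": "2024-02-01T00:00:00", "price": 70},
]
print(monthly_sum(docs))
```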

Mappings

What Is a Mapping?

A mapping defines the structure (schema) of documents. It corresponds to a table definition in an RDBMS.

// Inspect the mapping
// GET /products/_mapping

{
    "products": {
        "mappings": {
            "properties": {
                "name": { "type": "text" },
                "price": { "type": "integer" },
                "category": { 
                    "type": "text",
                    "fields": {
                        "keyword": { "type": "keyword" }
                    }
                }
            }
        }
    }
}

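Main Data Types

Type	Description	Typical use
text	For full-text search (tokenized)	Body text, descriptions
keyword	For exact matching (not tokenized)	Categories, tags, IDs
integer/long	Integers	Prices, quantities
float/double	Floating-point numbers	Ratings, coordinates
boolean	True/false values	Flags
date	Dates	Created and updated timestamps
object	Nested JSON	Addresses, metadata
nested	Objects inside arrays	Comments, reviews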

Defining a Mapping

# Define the mapping when creating the index
# (the kuromoji analyzer requires the analysis-kuromoji plugin, and an existing products index must be deleted first)
curl -X PUT "http://localhost:9200/products" \
    -H "Content-Type: application/json" \
    -d '{
        "settings": {
            "number_of_shards": 3,
            "number_of_replicas": 1
        },
        "mappings": {
            "properties": {
                "name": {
                    "type": "text",
                    "analyzer": "kuromoji"  // Japanese morphological analysis
                },
                "price": {
                    "type": "integer"
                },
                "category": {
                    "type": "keyword"
                },
                "description": {
                    "type": "text",
                    "analyzer": "kuromoji"
                },
                "tags": {
                    "type": "keyword"
                },
                "created_at": {
                    "type": "date",
                    "format": "strict_date_optional_time||yyyy-MM-dd HH:mm:ss||epoch_millis"
                },
                "location": {
                    "type": "geo_point"
                }
            }
        }
    }'

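text vs keyword

	text	keyword
Use for	Full-text search	Exact match, sorting, aggregations
Analysis	Analyzed (tokenized)	Not analyzed
Examples	Product descriptions, article bodies	Categories, tags, email addresses

// To use both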
{
    "category": {
        "type": "text",
        "fields": {
            "keyword": {
                "type": "keyword"
            }
        }
    }
}

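// Full-text search: category
// Exact match / aggregations: category.keyword
// If a field is mapped as keyword only, no .keyword sub-field exists; query the field name itself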
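The practical effect of the two types can be sketched in Python. The `analyze` function below is a crude stand-in for an analyzer (lowercasing and splitting on whitespace), which is an assumption for illustration only; `text_match` and `keyword_term` are hypothetical helpers.

```python
def analyze(text):
    """Crude stand-in for an analyzer: lowercase, then split on whitespace."""
    return text.lower().split()


def text_match(field_value, query):
    """text field: both sides are analyzed, so a single word can match."""
    indexed_tokens = set(analyze(field_value))
    return any(token in indexed_tokens for token in analyze(query))


def keyword_term(field_value, query):
    """keyword field: the whole value is one term and must match exactly."""
    return field_value == query


print(text_match("MacBook Pro", "pro"))            # matches one token
print(keyword_term("MacBook Pro", "pro"))          # no match: not the whole value
print(keyword_term("MacBook Pro", "MacBook Pro"))  # exact match
```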

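Japanese Search

The Challenge with Japanese

English separates words with spaces, but Japanese has no word delimiters.

English: "I love programming"
    → ["I", "love", "programming"] (easy to split)

Japanese: "私はプログラミングが好きです" ("I like programming")
    → Where are the word boundaries? (hard)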

The Kuromoji Analyzer

Elasticsearch offers Kuromoji, a plugin for Japanese morphological analysis.

# Install the Kuromoji plugin
docker exec -it elasticsearch bin/elasticsearch-plugin install analysis-kuromoji
docker restart elasticsearch

// Index settings using Kuromoji
{
    "settings": {
        "analysis": {
            "analyzer": {
                "my_kuromoji": {
                    "type": "custom",
                    "tokenizer": "kuromoji_tokenizer",
                    "filter": [
                        "kuromoji_baseform",    // normalize to the dictionary (base) form
                        "kuromoji_part_of_speech", // part-of-speech filter
                        "cjk_width",            // normalize full-width and half-width characters
                        "lowercase"
                    ]
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "content": {
                "type": "text",
                "analyzer": "my_kuromoji"
            }
        }
    }
}

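# Check tokenization (tokenizer only; the full kuromoji analyzer would also
# drop particles and auxiliary verbs such as は, が, and です)
curl -X POST "http://localhost:9200/products/_analyze" \
    -H "Content-Type: application/json" \
    -d '{
        "tokenizer": "kuromoji_tokenizer",
        "text": "私はプログラミングが好きです"
    }'

# Response (abridged)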
{
    "tokens": [
        { "token": "私", ... },
        { "token": "は", ... },
        { "token": "プログラミング", ... },
        { "token": "が", ... },
        { "token": "好き", ... },
        { "token": "です", ... }
    ]
}

Practical Use Cases

Product Search for an E-Commerce Site

// Product search query
{
    "query": {
        "bool": {
            "must": [
                {
                    "multi_match": {
                        "query": "ワイヤレスイヤホン ノイズキャンセリング",
                        "fields": ["name^3", "description", "tags^2"],
                        "type": "best_fields"
                    }
                }
            ],
            "filter": [
                { "range": { "price": { "gte": 10000, "lte": 50000 } } },
                { "term": { "in_stock": true } }
            ]
        }
    },
    "aggs": {
        "categories": {
            "terms": { "field": "category.keyword" }
        },
        "brands": {
            "terms": { "field": "brand.keyword" }
        },
        "price_ranges": {
            "range": {
                "field": "price",
                "ranges": [
                    { "key": "Under 10,000 yen", "to": 10000 },
                    { "key": "10,000 to 30,000 yen", "from": 10000, "to": 30000 },
                    { "key": "30,000 yen and up", "from": 30000 }
                ]
            }
        }
    },
    "highlight": {
        "fields": { "name": {}, "description": {} }
    },
    "sort": [
        { "_score": "desc" },
        { "created_at": "desc" }
    ],
    "size": 20
}
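The `name^3` and `tags^2` boosts combined with `best_fields` mean that each field's score is multiplied by its boost and the best field wins. A minimal Python sketch of that combination follows. The per-field scores are made-up inputs, and `best_fields_score` is a hypothetical helper; `tie_breaker` mirrors the multi_match option of the same name, which defaults to 0.0.

```python
def best_fields_score(field_scores, boosts, tie_breaker=0.0):
    """Combine per-field scores the way best_fields does: best field plus a share of the rest."""
    weighted = [field_scores.get(field, 0.0) * boost for field, boost in boosts.items()]
    best = max(weighted)
    return best + tie_breaker * (sum(weighted) - best)


boosts = {"name": 3.0, "description": 1.0, "tags": 2.0}
field_scores = {"name": 1.0, "description": 2.5, "tags": 1.2}  # hypothetical raw scores

print(best_fields_score(field_scores, boosts))
print(best_fields_score(field_scores, boosts, tie_breaker=0.5))
```

Here the name field wins even though its raw score is the lowest, because its boost lifts it above the other two fields.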

Log Analysis (ELK Stack)

// Aggregating error logs
{
    "query": {
        "bool": {
            "filter": [
                { "term": { "level.keyword": "ERROR" } },
                { "range": { "@timestamp": { "gte": "now-24h" } } }
            ]
        }
    },
    "aggs": {
        "errors_over_time": {
            "date_histogram": {
                "field": "@timestamp",
                "calendar_interval": "hour"
            }
        },
        "top_errors": {
            "terms": {
                "field": "message.keyword",
                "size": 10
            }
        },
        "by_service": {
            "terms": {
                "field": "service.keyword"
            }
            // each bucket's doc_count is already the error count per service,
            // so no sub-aggregation is needed
        }
    }
}

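Operational Considerations

Index Design Best Practices

Split time-series data into date-based indices
```
logs-2024.01
logs-2024.02
logs-2024.03
```
Old data can be dropped cheaply by deleting a whole index, and searches only touch the indices in the requested time range
Choosing the number of shards
- Aim for roughly 10 to 50 GB per shard
- A multiple of the node count spreads the load evenly
Choosing the number of replicas
- At least 1 (for availability)
- Increase it when search load is high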
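The naming scheme and the shard-count rule of thumb above can be turned into a small Python sketch. The helper names and the 30 GB default target are assumptions for illustration; they are a starting point for capacity planning, not a substitute for measuring your own workload.

```python
import math
from datetime import date


def monthly_index(prefix, day):
    """Build a month-based index name such as logs-2024.01."""
    return f"{prefix}-{day:%Y.%m}"


def estimate_primary_shards(total_gb, target_gb_per_shard=30, node_count=1):
    """Estimate a primary shard count from the expected index size.

    Keeps each shard near the target size, then rounds up to a multiple
    of the node count so shards spread evenly across nodes.
    """
    needed = max(1, math.ceil(total_gb / target_gb_per_shard))
    return math.ceil(needed / node_count) * node_count


print(monthly_index("logs", date(2024, 1, 15)))
print(estimate_primary_shards(200, target_gb_per_shard=30, node_count=3))
```

For a 200 GB index on three nodes this suggests 9 primary shards of about 22 GB each, which stays inside the 10 to 50 GB range.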

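Performance Tuning

// Tuning indexing and search throughput
{
    "settings": {
        "index": {
            "refresh_interval": "30s",  // default is 1s; lengthen it for write-heavy workloads (new documents become searchable later)
            "number_of_replicas": 2      // adjust to the search load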
        }
    }
}

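Metrics to Monitor

Metric	What to watch for
Cluster Health	Yellow (unassigned replica shards) or Red (unassigned primary shards) needs attention
JVM Heap	Sustained usage above 75% is a warning sign
CPU usage	Sustained high load indicates a problem
Disk usage	Above 85% is dangerous (the default low disk watermark, past which no new shards are allocated to the node)
Search Latency	Watch both the average and the outliers
Indexing Rate	Sudden changes can signal an anomaly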

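Summary

Elasticsearch is a powerful search and analytics engine. To recap what this article covered:

Basic concepts: inverted index, sharding, replicas
CRUD operations: creating, retrieving, updating, and deleting documents
Search queries: match, bool, range, term
Aggregations: bucket aggregations, metric aggregations
Mappings: text vs keyword, Japanese support
Operations: index design, performance tuning

Treat it as a complementary tool specialized for search and analytics, not as a replacement for an RDBMS. Start with a simple use case and work up to the more advanced features gradually.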
