CloudSearch command notes (from creating a domain to searching data)

Posted 2015-04-07 (last updated 2015-04-07)

Notes from running these steps with the AWS CLI.

$ aws --version
aws-cli/1.7.4 Python/2.7.6 Darwin/14.1.0

Creating a domain

$ aws cloudsearch create-domain --domain-name foo-domain

Setting the instance type and replication count

These are the default values (instance type: search.m1.small, replication count: 1), but I set them explicitly just in case.

$ aws cloudsearch update-scaling-parameters --domain-name foo-domain \
--scaling-parameters DesiredInstanceType=search.m1.small,DesiredReplicationCount=1

Setting the access policy

CloudSearch access policies can be configured in two ways:
- IAM
- IP restriction
With IP restriction, it appears that only public IP addresses can be specified.

To configure access for an Amazon EC2 instance, you must specify the instance's public IP address. IP addresses are specified in standard Classless Inter-Domain Routing (CIDR) format. For example, 10.24.34.0/24 specifies the range 10.24.34.0 to 10.24.34.255, while 10.24.34.0/32 specifies the single IP address 10.24.34.0. For details on CIDR notation, see RFC 4632.

Source: Configuring Access for an Amazon CloudSearch Domain
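
The CIDR arithmetic in the quote can be checked with Python's standard `ipaddress` module. This is only a small sketch for verifying a range before putting it into a policy; the helper name `cidr_range` is made up for illustration.

```python
import ipaddress


def cidr_range(cidr):
    """Return (first address, last address, address count) for a CIDR block."""
    network = ipaddress.ip_network(cidr)
    return str(network[0]), str(network[-1]), network.num_addresses


if __name__ == "__main__":
    for block in ("10.24.34.0/24", "10.24.34.0/32"):
        first, last, count = cidr_range(block)
        print(block, first, last, count)
```

A /32 block always contains exactly one address, which is why the policy example uses /32 to allow individual hosts.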

The following is an example with IP restriction applied.

$ aws cloudsearch update-service-access-policies --domain-name foo-domain --access-policies \
  "{\"Version\":\"2012-10-17\",
    \"Statement\":[{
      \"Sid\":\"allow_from_specified_ip\",
      \"Effect\":\"Allow\",
      \"Principal\":{\"AWS\":\"*\"},
      \"Action\":\"cloudsearch:*\",
      \"Condition\":{
        \"IpAddress\":{
          \"aws:SourceIp\":[
            \"xxx.xxx.xxx.xxx/32\",\"yyy.yyy.yyy.yyy/32\"
            ]
          }
        }
      },{
      \"Sid\":\"source_ip_restriction\",
      \"Effect\":\"Deny\",
      \"Principal\":{\"AWS\":\"*\"},
      \"Action\":\"cloudsearch:*\",
      \"Condition\":{
        \"NotIpAddress\":{
          \"aws:SourceIp\":[
            \"xxx.xxx.xxx.xxx/32\",\"yyy.yyy.yyy.yyy/32\"
            ]
          }
        }
      }
    ]
  }"
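
Escaping every double quote by hand is easy to get wrong, so one option is to generate the policy string with a script and pass the result to `--access-policies`. The sketch below builds the same two statements as the command above. The function name `build_ip_policy` and the sample addresses (taken from the documentation-only ranges 192.0.2.0/24 and 198.51.100.0/24) are assumptions for illustration, not part of the original procedure.

```python
import json


def build_ip_policy(allowed_cidrs):
    """Build an allow/deny access policy limited to the given CIDR blocks."""
    cidrs = list(allowed_cidrs)
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Allow every CloudSearch action from the listed addresses
                "Sid": "allow_from_specified_ip",
                "Effect": "Allow",
                "Principal": {"AWS": "*"},
                "Action": "cloudsearch:*",
                "Condition": {"IpAddress": {"aws:SourceIp": cidrs}},
            },
            {
                # Explicitly deny every other source address
                "Sid": "source_ip_restriction",
                "Effect": "Deny",
                "Principal": {"AWS": "*"},
                "Action": "cloudsearch:*",
                "Condition": {"NotIpAddress": {"aws:SourceIp": cidrs}},
            },
        ],
    }
    return json.dumps(policy)


if __name__ == "__main__":
    # Placeholder addresses; replace them with your own public IPs
    print(build_ip_policy(["192.0.2.10/32", "198.51.100.20/32"]))
```

Using `json.dumps` guarantees the output is valid JSON, and the allow list and deny list cannot drift apart because both come from the same variable.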

Configuring index fields

# When the type is int
$ aws cloudsearch define-index-field --domain-name foo-domain --name id --type int
# When the type is literal
$ aws cloudsearch define-index-field --domain-name foo-domain --name fullname --type literal --default-value ""
# When the type is text
$ aws cloudsearch define-index-field --domain-name foo-domain --name address --type text --default-value "" --analysis-scheme _ja_default_
# When the type is date
$ aws cloudsearch define-index-field --domain-name foo-domain --name created_at --type date
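
When the number of fields grows, it may be easier to keep the definitions as data and generate the commands from them. The following is a rough sketch that only prints the command lines and does not call AWS. The `FIELDS` structure and the function name are hypothetical, and only the flags used in the commands above appear.

```python
import shlex

# Field definitions mirroring the four commands above (hypothetical structure)
FIELDS = [
    {"name": "id", "type": "int"},
    {"name": "fullname", "type": "literal", "default": ""},
    {"name": "address", "type": "text", "default": "", "analysis_scheme": "_ja_default_"},
    {"name": "created_at", "type": "date"},
]


def define_index_field_command(domain, field):
    """Return the shell command line that defines one index field."""
    args = [
        "aws", "cloudsearch", "define-index-field",
        "--domain-name", domain,
        "--name", field["name"],
        "--type", field["type"],
    ]
    if "default" in field:
        args += ["--default-value", field["default"]]
    if "analysis_scheme" in field:
        args += ["--analysis-scheme", field["analysis_scheme"]]
    # shlex.quote keeps the empty default value visible as ''
    return " ".join(shlex.quote(arg) for arg in args)


if __name__ == "__main__":
    for field in FIELDS:
        print(define_index_field_command("foo-domain", field))
```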

Indexing

After changing index fields, you need to run indexing so that the changes are applied.

$ aws cloudsearch index-documents --domain-name foo-domain

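Loading data

Preparing dummy data

Prepare a CSV file like the following.
The names and addresses are dummy data generated with a pseudo personal information data generation service.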

raw.csv

"id","fullname","address","created_at"
"1","楠 俊男","岐阜県不破郡関ケ原町関ケ原1-1-1","2014-12-07T15:44:41Z"
"2","吉永 容子","沖縄県那覇市東町2-2-2","2014-07-12T00:56:57Z"
"3","秦 利子","宮崎県宮崎市吉村町3-3-3","2014-06-12T08:47:53Z"
"4","北尾 陽子","愛媛県松山市勝岡町4-4-4","2014-03-29T04:28:40Z"
"5","古田 留子","福岡県飯塚市山口5-5-5","2014-10-07T20:15:37Z"

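Convert the CSV file into the JSON format that CloudSearch can process.
With only five records it makes little difference here, but converting with the cs-import-documents command splits the output into files of about 5 MB each.
5 MB is the batch size limit for CloudSearch uploads, and the documentation recommends keeping each batch as close to that limit as possible.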

Keep your batch size as close to the 5 MB limit as possible. Uploading a large number of small batches slows down the upload and indexing process.

Source: Uploading Data to an Amazon CloudSearch Domain
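
To illustrate the batching idea, here is a minimal sketch that packs add/delete operations into JSON arrays without exceeding a size limit. It is not how cs-import-documents works internally. The function name is made up, and treating 5 MB as 5 * 1024 * 1024 bytes is an assumption.

```python
import json

# Assumption: the 5 MB limit is interpreted as 5 * 1024 * 1024 bytes
MAX_BATCH_BYTES = 5 * 1024 * 1024


def split_into_batches(operations, max_bytes=MAX_BATCH_BYTES):
    """Pack operations into UTF-8 JSON arrays that stay within max_bytes.

    A single operation larger than max_bytes still becomes its own
    (oversized) batch; this sketch does not try to handle that case.
    """
    batches = []
    items = []
    size = 2  # bytes for the enclosing "[" and "]"
    for operation in operations:
        encoded = json.dumps(
            operation, ensure_ascii=False, separators=(",", ":")
        ).encode("utf-8")
        extra = len(encoded) + (1 if items else 0)  # +1 for the comma
        if items and size + extra > max_bytes:
            batches.append(b"[" + b",".join(items) + b"]")
            items, size = [], 2
            extra = len(encoded)
        items.append(encoded)
        size += extra
    if items:
        batches.append(b"[" + b",".join(items) + b"]")
    return batches
```

Sizes are measured on the encoded bytes rather than on character counts, because Japanese text takes several bytes per character in UTF-8.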

$ cs-import-documents --source ./raw.csv --output ./data

data1.json

[ {
  "type" : "add",
  "id" : "/tmp/./raw.csv_1",
  "fields" : {
    "id" : "1",
    "address" : "岐阜県不破郡関ケ原町関ケ原1-1-1",
    "created_at" : "2014-12-07T15:44:41Z",
    "fullname" : "楠 俊男"
  }
}, {
  "type" : "add",
  "id" : "/tmp/./raw.csv_2",
  "fields" : {
    "id" : "2",
    "address" : "沖縄県那覇市東町2-2-2",
    "created_at" : "2014-07-12T00:56:57Z",
    "fullname" : "吉永 容子"
  }
}, {
  "type" : "add",
  "id" : "/tmp/./raw.csv_3",
  "fields" : {
    "id" : "3",
    "address" : "宮崎県宮崎市吉村町3-3-3",
    "created_at" : "2014-06-12T08:47:53Z",
    "fullname" : "秦 利子"
  }
}, {
  "type" : "add",
  "id" : "/tmp/./raw.csv_4",
  "fields" : {
    "id" : "4",
    "address" : "愛媛県松山市勝岡町4-4-4",
    "created_at" : "2014-03-29T04:28:40Z",
    "fullname" : "北尾 陽子"
  }
}, {
  "type" : "add",
  "id" : "/tmp/./raw.csv_5",
  "fields" : {
    "id" : "5",
    "address" : "福岡県飯塚市山口5-5-5",
    "created_at" : "2014-10-07T20:15:37Z",
    "fullname" : "古田 留子"
  }
} ]
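
For reference, the conversion from CSV rows to this batch format can be sketched in a few lines of Python. This is not the cs-import-documents implementation. The function name and the `doc_` ID prefix are hypothetical, whereas the tool above derived the IDs from the source file path.

```python
import csv
import io


def csv_to_add_operations(csv_text, id_prefix="doc_"):
    """Convert CSV text with a header row into a list of "add" operations."""
    reader = csv.DictReader(io.StringIO(csv_text))
    operations = []
    for row_number, row in enumerate(reader, start=1):
        operations.append({
            "type": "add",
            # Hypothetical ID scheme: prefix plus 1-based row number
            "id": "{}{}".format(id_prefix, row_number),
            "fields": dict(row),
        })
    return operations


if __name__ == "__main__":
    import json

    sample = (
        '"id","fullname","address","created_at"\n'
        '"2","吉永 容子","沖縄県那覇市東町2-2-2","2014-07-12T00:56:57Z"\n'
    )
    print(json.dumps(csv_to_add_operations(sample), ensure_ascii=False, indent=2))
```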

Uploading the data

$ aws cloudsearchdomain upload-documents --endpoint-url https://doc-foo-domain-xxxxxxxxxxxxxxxxxxxxxxx.cloudsearch.amazonaws.com --content-type application/json --documents ./data1.json
{
    "status": "success",
    "adds": 5,
    "deletes": 0
}

Searching the data

The structured query parser is specified in all of the examples.

# Searching a literal field
$ aws cloudsearchdomain search --endpoint-url https://search-foo-domain-xxxxxxxxxxxxxxxxxxxxxxxxxx.cloudsearch.amazonaws.com --query-parser structured --search-query "fullname:'楠 俊男'"
{
    "status": {
        "timems": 8,
        "rid": "65mumskpJwqgfVc="
    },
    "hits": {
        "found": 1,
        "hit": [
            {
                "fields": {
                    "created_at": [
                        "2014-12-07T15:44:41Z"
                    ],
                    "fullname": [
                        "楠 俊男"
                    ],
                    "id": [
                        "1"
                    ],
                    "address": [
                        "岐阜県不破郡関ケ原町関ケ原1-1-1"
                    ]
                },
                "id": "/tmp/./raw.csv_1"
            }
        ],
        "start": 0
    }
}
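
Each field in the response comes back as a list, even when it holds a single value. If the output is post-processed in a script, a small helper like the following sketch can flatten the hits. The function name and the `_id` key are made up for illustration.

```python
def flatten_hits(response):
    """Turn a search response into a list of flat dicts, one per hit."""
    results = []
    for hit in response.get("hits", {}).get("hit", []):
        record = {"_id": hit["id"]}
        for name, values in hit.get("fields", {}).items():
            # Unwrap single-element lists; keep multi-valued fields as lists
            record[name] = values[0] if len(values) == 1 else values
        results.append(record)
    return results


if __name__ == "__main__":
    # Abbreviated copy of the response shown above
    response = {
        "hits": {
            "found": 1,
            "hit": [
                {
                    "fields": {"fullname": ["楠 俊男"], "id": ["1"]},
                    "id": "/tmp/./raw.csv_1",
                }
            ],
            "start": 0,
        }
    }
    print(flatten_hits(response))
```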

# Searching a text field
$ aws cloudsearchdomain search --endpoint-url https://search-foo-domain-xxxxxxxxxxxxxxxxxxxxxxxxxx.cloudsearch.amazonaws.com --query-parser structured --search-query "address:'那覇市'"
{
    "status": {
        "timems": 34,
        "rid": "kL22mskpLAqgfVc="
    },
    "hits": {
        "found": 1,
        "hit": [
            {
                "fields": {
                    "created_at": [
                        "2014-07-12T00:56:57Z"
                    ],
                    "fullname": [
                        "吉永 容子"
                    ],
                    "id": [
                        "2"
                    ],
                    "address": [
                        "沖縄県那覇市東町2-2-2"
                    ]
                },
                "id": "/tmp/./raw.csv_2"
            }
        ],
        "start": 0
    }
}

# Searching a date field
# Find records dated 2014-09-01 or later
$ aws cloudsearchdomain search --endpoint-url https://search-foo-domain-xxxxxxxxxxxxxxxxxxxxxxxxxx.cloudsearch.amazonaws.com --query-parser structured --search-query "created_at:['2014-09-01T00:00:00Z',}"
{
    "status": {
        "timems": 2,
        "rid": "ssnCmskpMgqgfVc="
    },
    "hits": {
        "found": 2,
        "hit": [
            {
                "fields": {
                    "created_at": [
                        "2014-12-07T15:44:41Z"
                    ],
                    "fullname": [
                        "楠 俊男"
                    ],
                    "id": [
                        "1"
                    ],
                    "address": [
                        "岐阜県不破郡関ケ原町関ケ原1-1-1"
                    ]
                },
                "id": "/tmp/./raw.csv_1"
            },
            {
                "fields": {
                    "created_at": [
                        "2014-10-07T20:15:37Z"
                    ],
                    "fullname": [
                        "古田 留子"
                    ],
                    "id": [
                        "5"
                    ],
                    "address": [
                        "福岡県飯塚市山口5-5-5"
                    ]
                },
                "id": "/tmp/./raw.csv_5"
            }
        ],
        "start": 0
    }
}
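
The range `['2014-09-01T00:00:00Z',}` has an inclusive lower bound and no upper bound. As a sanity check, the same filter can be reproduced locally against the five created_at values from raw.csv. This sketch only mirrors the comparison and does not talk to CloudSearch; the function name is made up.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"

# created_at values from raw.csv, keyed by the id column
CREATED_AT = {
    "1": "2014-12-07T15:44:41Z",
    "2": "2014-07-12T00:56:57Z",
    "3": "2014-06-12T08:47:53Z",
    "4": "2014-03-29T04:28:40Z",
    "5": "2014-10-07T20:15:37Z",
}


def ids_on_or_after(records, start):
    """Return the sorted ids whose timestamp is at or after start."""
    threshold = datetime.strptime(start, TIMESTAMP_FORMAT)
    return sorted(
        doc_id
        for doc_id, timestamp in records.items()
        if datetime.strptime(timestamp, TIMESTAMP_FORMAT) >= threshold
    )


if __name__ == "__main__":
    print(ids_on_or_after(CREATED_AT, "2014-09-01T00:00:00Z"))
```

The local filter selects ids 1 and 5, which matches the two hits in the response above.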

# AND search
$ aws cloudsearchdomain search --endpoint-url https://search-foo-domain-xxxxxxxxxxxxxxxxxxxxxxxxx.cloudsearch.amazonaws.com --query-parser structured --search-query "(and fullname:'楠 俊男' address:'岐阜県')"
{
    "status": {
        "timems": 3,
        "rid": "xojfmskpQwqgfVc="
    },
    "hits": {
        "found": 1,
        "hit": [
            {
                "fields": {
                    "created_at": [
                        "2014-12-07T15:44:41Z"
                    ],
                    "fullname": [
                        "楠 俊男"
                    ],
                    "id": [
                        "1"
                    ],
                    "address": [
                        "岐阜県不破郡関ケ原町関ケ原1-1-1"
                    ]
                },
                "id": "/tmp/./raw.csv_1"
            }
        ],
        "start": 0
    }
}
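
The query strings used in the four searches above can also be assembled programmatically, which helps when the values come from user input. The helpers below are a hypothetical sketch, not part of the AWS CLI or any SDK. The escaping rule (a backslash before single quotes and backslashes) reflects my understanding of the structured query syntax and should be verified against the official documentation.

```python
def quote(value):
    """Wrap a value in single quotes, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def term(field, value):
    """Match a single value in a field, e.g. fullname:'...'."""
    return "{}:{}".format(field, quote(value))


def date_from(field, start):
    """Open-ended range with an inclusive lower bound, e.g. field:['...',}."""
    return "{}:[{},}}".format(field, quote(start))


def and_(*expressions):
    """Combine expressions so that all of them must match."""
    return "(and " + " ".join(expressions) + ")"


if __name__ == "__main__":
    print(term("address", "那覇市"))
    print(date_from("created_at", "2014-09-01T00:00:00Z"))
    print(and_(term("fullname", "楠 俊男"), term("address", "岐阜県")))
```

The resulting strings can be passed unchanged to `--search-query` together with `--query-parser structured`.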
