Building object storage with ceph-ansible

Posted at 2020-06-12

Overview

I wanted to try out Ceph object storage, so I built a test environment; this post is a record of that process.

References

I referred to the following sites, among others.

https://docs.ceph.com/docs/octopus/
https://github.com/ceph/ceph-ansible
https://docs.ceph.com/ceph-ansible/master/
https://www.marksei.com/how-to-install-ceph-with-ceph-ansible/
https://www.server-world.info/query?os=CentOS_7&p=ceph14&f=1
https://computingforgeeks.com/install-and-configure-ceph-storage-cluster-on-centos-linux/

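Architecture

I settled on the following configuration.

The test environment is built on Sakura Cloud (さくらのクラウド).
There are three storage nodes, each with three disks.
I would have liked to attach more disks, but the platform allows at most three per server.
One of them is the OS disk, so two are used as data disks.
A VPC router serves both as the NAT gateway for the servers that have only private IPs and as the bastion (port forwarding) for access to the manager server.
Because RGW is accessed from the Internet, it is given a global IP.
Ceph's public_network and cluster_network are kept on separate networks.

Environment setup

I built the environment with Terraform code.
I have published the code for reference; running it creates the configuration described above.
The OS is CentOS 8.
A script executed at server creation takes care of creating the centos user, updating the OS, and configuring the IP addresses.
On Sakura Cloud, when CentOS is used, the SSH key and password specified at creation time are set up for the root user.
For simplicity, I set the same SSH key and password for the centos user as well and allowed passwordless sudo.
(From a security standpoint, one should disable root login and consider various other measures, but I have skipped that here.)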

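Ceph setup

With the environment ready, the next step is the groundwork for the setup.
First, log in to the manager server.
The SSH key file I have locally can log in to every server as the centos user, so I transfer those key files to the manager server.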

# scp -P 10022 -i SSH_KEY.rsa SSH_KEY.rsa centos@VPC_ROUTER_GLOBAL_IP:/home/centos/.ssh/id_rsa
# scp -P 10022 -i SSH_KEY.rsa SSH_KEY.rsa.pub centos@VPC_ROUTER_GLOBAL_IP:/home/centos/.ssh/id_rsa.pub

# ssh -p 10022 -i SSH_KEY.rsa centos@VPC_ROUTER_GLOBAL_IP

[centos@mgr001 ~]$ chmod 600 ~/.ssh/id_rsa
[centos@mgr001 ~]$ chmod 644 ~/.ssh/id_rsa.pub

Next, configure .ssh/config so that the manager server can log in to each server by host name.

[centos@mgr001 ~]$ tee -a .ssh/config<<EOF
Host mgr001
    Hostname 192.168.200.10
    User centos
Host node001
    Hostname 192.168.200.11
    User centos
Host node002
    Hostname 192.168.200.12
    User centos
Host node003
    Hostname 192.168.200.13
    User centos
Host rados001
    Hostname 192.168.200.14
    User centos
EOF

[centos@mgr001 ~]$ chmod 600 ~/.ssh/config

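Partly to verify this configuration, log in to each of the other four servers in turn (for example, ssh node001) and run the following on every server to register the host names.
Once you have confirmed with sudo dnf update that the updates are complete, reboot each server.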

[centos@mgr001 ~]$ sudo tee -a /etc/hosts<<EOF
192.168.200.10  mgr001
192.168.200.11  node001
192.168.200.12  node002
192.168.200.13  node003
192.168.200.14  rados001
EOF
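
Logging in to each server by hand is fine for a handful of machines, but the same step can also be looped. The following is only a sketch: `for_each_host` and the `SSH` variable are names introduced here for illustration, not part of the original procedure. By default it merely prints the ssh commands it would run, so it is safe to try before executing anything.

```shell
#!/bin/sh
# Hypothetical helper, not part of the original procedure.
# Hosts are taken from the .ssh/config entries above (mgr001 itself is excluded).
HOSTS="node001 node002 node003 rados001"

# SSH defaults to a dry run ("echo ssh") that only prints each command.
# Set SSH=ssh to actually execute the command on every host.
SSH="${SSH:-echo ssh}"

for_each_host() {
  for h in $HOSTS; do
    # $SSH is intentionally unquoted so that "echo ssh" splits into two words.
    $SSH "$h" "$@"
  done
}

# Example: print the commands that would check each host's name.
for_each_host hostname
```

Once the printed commands look right, the same script can be re-run with `SSH=ssh` to execute them.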

Finally, on the manager server as well, register the same host names and confirm that the updates are complete, then install the following and reboot.

[centos@mgr001 ~]$ sudo dnf -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm
[centos@mgr001 ~]$ sudo dnf config-manager --set-enabled PowerTools

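That completes the preparation, so now we move on to the Ceph setup itself.
Log in to the manager server and run the following.
This installs ansible, clones ceph-ansible, checks out the branch to use (stable-5.0), and adds the required Python modules.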

[centos@mgr001 ~]$ sudo dnf install ansible -y
[centos@mgr001 ~]$ git clone https://github.com/ceph/ceph-ansible.git
[centos@mgr001 ~]$ cd ceph-ansible/
[centos@mgr001 ceph-ansible]$ git checkout stable-5.0
[centos@mgr001 ceph-ansible]$ sudo pip3 install -r requirements.txt
[centos@mgr001 ceph-ansible]$ echo "PATH=\$PATH:/usr/local/bin" >>~/.bashrc
[centos@mgr001 ceph-ansible]$ source ~/.bashrc

Next, prepare the setup files under ceph-ansible/group_vars/.
I chose Ceph octopus (15) for this build.
(The directory contains a .sample file for each of these files; copy it and edit the contents.)
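
The copy step can be sketched as a small loop. This assumes the samples follow the `NAME.yml.sample` naming seen in group_vars; `copy_samples` is a hypothetical helper name, and it deliberately refuses to overwrite a file that has already been edited.

```shell
#!/bin/sh
# Sketch: copy each group_vars sample to its real name without
# overwriting a file that already exists.
# copy_samples is a hypothetical helper name.
copy_samples() {   # DIR NAME...
  dir="$1"; shift
  for name in "$@"; do
    src="$dir/$name.yml.sample"
    dst="$dir/$name.yml"
    if [ -e "$dst" ]; then
      echo "skip: $dst already exists"
    elif [ -f "$src" ]; then
      cp "$src" "$dst" && echo "created: $dst"
    else
      echo "missing: $src" >&2
      return 1
    fi
  done
}

# Usage, from the ceph-ansible directory:
#   copy_samples group_vars all osds rgws rgwloadbalancers
```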

all.yml

ceph_release_num: 15
cluster: ceph

# Inventory host group variables
mon_group_name: mons
osd_group_name: osds
rgw_group_name: rgws
mds_group_name: mdss
nfs_group_name: nfss
rbdmirror_group_name: rbdmirrors
client_group_name: clients
iscsi_gw_group_name: iscsigws
mgr_group_name: mgrs
rgwloadbalancer_group_name: rgwloadbalancers
grafana_server_group_name: grafana-server

# Firewalld / NTP
configure_firewall: False
ntp_service_enabled: true
ntp_daemon_type: chronyd

# Ceph packages
ceph_origin: repository
ceph_repository: community
ceph_repository_type: cdn
ceph_stable_release: octopus

# Interface options
monitor_interface: eth0
radosgw_interface: eth1

# DASHBOARD
dashboard_enabled: True
dashboard_protocol: http
dashboard_admin_user: admin
dashboard_admin_password: PASSWORD

grafana_admin_user: admin
grafana_admin_password: PASSWORD
public_network: 192.168.200.0/24
cluster_network: 192.168.201.0/24

osds.yml

copy_admin_key: false
devices:
  - /dev/vdb
  - /dev/vdc

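Note: from stable-4.0 onward, lvm appears to be used. If SSD devices such as NVMe are present, they apparently become the journal disks automatically.
Otherwise each OSD seems to end up with its own collocated journal, so performance may be disappointing.
SSDs do seem to be necessary after all.
In earlier versions you could specify the journal disks yourself, but that now appears to be deprecated, so I did not try it.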
https://docs.ceph.com/ceph-ansible/master/osds/scenarios.html#osd-scenario-lvm

rgws.yml

copy_admin_key: true

rgwloadbalancers.yml

---
# Variables here are applicable to all host groups NOT roles

# This sample file generated by generate_group_vars_sample.sh

# Dummy variable to avoid error because ansible does not recognize the
# file as a good configuration file when no variable in it.
# dummy:

# You can override vars by using host or group vars

###########
# GENERAL #
###########

# haproxy_frontend_port: 80
haproxy_frontend_ssl_port: 443
haproxy_frontend_ssl_certificate: /etc/ssl/certs/server.pem
haproxy_ssl_dh_param: 4096
haproxy_ssl_ciphers:
  - EECDH+AESGCM
  - EDH+AESGCM
haproxy_ssl_options:
  - no-sslv3
  - no-tlsv10
  - no-tlsv11
  - no-tls-tickets

virtual_ips:
   - RADOS001_GLOBAL_IP
#   - 192.168.238.251
#
virtual_ip_netmask: 24
virtual_ip_interface: eth0

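Note: I am not sure this is the right approach, but I colocated rgwloadbalancer on the RGW host and had it listen on the global IP on the eth0 side.
I also wanted SSL, so I placed a certificate created separately with Let's Encrypt at /etc/ssl/certs/server.pem.
I followed an external article when creating the certificate.
The certificate has to be a single file, so I used fullchain.pem and privkey.pem concatenated together.
(Example command: cat fullchain.pem privkey.pem > server.pem)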
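
Since the combined file contains the private key, it may be worth wrapping the `cat` in a few checks. The following is a sketch under that assumption; `build_haproxy_pem` is a hypothetical helper name, and the file names are the ones from the command above.

```shell
#!/bin/sh
# Sketch: concatenate the certificate chain and the private key into
# one PEM file. build_haproxy_pem is a hypothetical helper name.
build_haproxy_pem() {   # CHAIN KEY OUT
  chain="$1"; key="$2"; out="$3"
  # Refuse to continue if either input is missing or empty.
  [ -s "$chain" ] || { echo "not found or empty: $chain" >&2; return 1; }
  [ -s "$key" ]   || { echo "not found or empty: $key" >&2; return 1; }
  # The output holds the private key, so a newly created file is made
  # readable by its owner only (umask applies inside the subshell).
  ( umask 077 && cat "$chain" "$key" > "$out" )
}

# Usage:
#   build_haproxy_pem fullchain.pem privkey.pem server.pem
```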

With that, the configuration files are ready. Go back to the top of the ceph-ansible directory and create the hosts file.

hosts

# Ceph admin user for SSH and Sudo
[all:vars]
ansible_ssh_user=centos
ansible_become=true
ansible_become_method=sudo
ansible_become_user=root

# Ceph Monitor Nodes
[mons]
node001
node002
node003

# MDS Nodes
# [mdss]
# node001
# node002
# node003

# [clients]
#

# RGW
[rgws]
rados001

[rgwloadbalancers]
rados001

# Manager Daemon Nodes
[mgrs]
mgr001

# set OSD (Object Storage Daemon) Node
[osds]
node001
node002
node003

# Grafana server
[grafana-server]
mgr001
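
Before running the playbook it can help to double-check which hosts ended up in which group. The following sketch reads an INI-style inventory like the one above; `hosts_in_group` is a hypothetical helper name, and it only handles the simple layout used here (one host per line, `#` comments).

```shell
#!/bin/sh
# Sketch: print the hosts listed under one [group] of an INI-style
# inventory, skipping comments and blank lines.
# hosts_in_group is a hypothetical helper name.
hosts_in_group() {   # INVENTORY_FILE GROUP
  awk -v want="[$2]" '
    /^\[/              { in_group = ($1 == want); next }
    /^ *#/ || /^ *$/   { next }
    in_group           { print $1 }
  ' "$1"
}

# Usage, from the ceph-ansible directory:
#   hosts_in_group hosts osds
```

Commented-out groups such as `# [mdss]` are ignored, so asking for them prints nothing.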

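Finally, prepare site.yml and run the setup with ansible.
Once it completes without problems, verify that the cluster has been set up.
(The command output below is not from immediately after setup but from a later check. As long as there are no failed or error entries, it is fine.)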

[centos@mgr001 ceph-ansible]$ cp site.yml.sample site.yml
[centos@mgr001 ceph-ansible]$ ansible-playbook -i hosts site.yml -e 'ansible_python_interpreter=/usr/bin/python3'

(output omitted)

TASK [show ceph status for cluster ceph] *****************************************************
Wednesday 10 June 2020  19:05:41 +0900 (0:00:00.983)       0:05:59.041 ********
ok: [node001 -> node001] =>
  msg:
  - '  cluster:'
  - '    id:     ecb7d866-e7e2-4aca-aba3-c526a556b4cb'
  - '    health: HEALTH_OK'
  - ' '
  - '  services:'
  - '    mon: 3 daemons, quorum node001,node002,node003 (age 2w)'
  - '    mgr: mgr001(active, since 20s)'
  - '    osd: 6 osds: 6 up (since 2w), 6 in (since 2w)'
  - '    rgw: 1 daemon active (rados001.rgw0)'
  - ' '
  - '  task status:'
  - ' '
  - '  data:'
  - '    pools:   8 pools, 177 pgs'
  - '    objects: 7.20k objects, 25 GiB'
  - '    usage:   83 GiB used, 517 GiB / 600 GiB avail'
  - '    pgs:     177 active+clean'
  - ' '
  - '  io:'
  - '    client:   1.2 KiB/s rd, 1 op/s rd, 0 op/s wr'
  - ' '

PLAY RECAP ***********************************************************************************
mgr001                     : ok=226  changed=7    unreachable=0    failed=0    skipped=314  rescued=0    ignored=0
node001                    : ok=221  changed=7    unreachable=0    failed=0    skipped=292  rescued=0    ignored=0
node002                    : ok=181  changed=6    unreachable=0    failed=0    skipped=272  rescued=0    ignored=0
node003                    : ok=183  changed=6    unreachable=0    failed=0    skipped=270  rescued=0    ignored=0
rados001                   : ok=168  changed=8    unreachable=0    failed=0    skipped=245  rescued=0    ignored=0


INSTALLER STATUS *****************************************************************************
Install Ceph Monitor           : Complete (0:00:33)
Install Ceph Manager           : Complete (0:00:25)
Install Ceph OSD               : Complete (0:00:50)
Install Ceph RGW               : Complete (0:00:33)
Install Ceph RGW LoadBalancer  : Complete (0:00:14)
Install Ceph Dashboard         : Complete (0:00:48)
Install Ceph Grafana           : Complete (0:00:39)
Install Ceph Node Exporter     : Complete (0:00:41)

Wednesday 10 June 2020  19:05:41 +0900 (0:00:00.076)       0:05:59.118 ********
===============================================================================
ceph-dashboard : set or update dashboard admin username and password ----------------- 11.34s
ceph-handler : restart ceph rgw daemon(s) -------------------------------------------- 10.85s
ceph-container-engine : install container packages ------------------------------------ 8.50s
gather and delegate facts ------------------------------------------------------------- 8.11s
ceph-osd : systemd start osd ---------------------------------------------------------- 3.55s
ceph-grafana : wait for grafana to start ---------------------------------------------- 3.49s
ceph-config : look up for ceph-volume rejected devices -------------------------------- 3.05s
ceph-config : look up for ceph-volume rejected devices -------------------------------- 2.82s
ceph-osd : apply operating system tuning ---------------------------------------------- 2.70s
ceph-infra : install chrony ----------------------------------------------------------- 2.69s
ceph-common : configure red hat ceph community repository stable key ------------------ 2.32s
ceph-facts : resolve device link(s) --------------------------------------------------- 2.21s
ceph-facts : check if the ceph mon socket is in-use ----------------------------------- 2.14s
ceph-common : install centos dependencies --------------------------------------------- 2.06s
ceph-rgw : copy ceph key(s) if needed ------------------------------------------------- 2.05s
ceph-prometheus : ship systemd services ----------------------------------------------- 2.05s
ceph-dashboard : enable mgr dashboard module (restart) -------------------------------- 2.03s
ceph-common : install dnf-plugins-core ------------------------------------------------ 1.98s
ceph-grafana : install ceph-grafana-dashboards package on RedHat or SUSE -------------- 1.95s
ceph-common : install redhat ceph packages -------------------------------------------- 1.95s


[centos@mgr001 ceph-ansible]$ ssh node001 "ceph --version"
ceph version 15.2.2 (0c857e985a29d90501a285f242ea9c008df49eb8) octopus (stable)

[centos@mgr001 ceph-ansible]$ ssh node001 "sudo ceph -s"
  cluster:
    id:     ecb7d866-e7e2-4aca-aba3-c526a556b4cb
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum node001,node002,node003 (age 2w)
    mgr: mgr001(active, since 12d)
    osd: 6 osds: 6 up (since 2w), 6 in (since 2w)
    rgw: 1 daemon active (rados001.rgw0)

  task status:

  data:
    pools:   8 pools, 177 pgs
    objects: 7.20k objects, 25 GiB
    usage:   83 GiB used, 517 GiB / 600 GiB avail
    pgs:     177 active+clean
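
The "no failed or error" check on the PLAY RECAP can also be done mechanically. The following is a sketch that assumes the recap line format shown above (`unreachable=N` and `failed=N` fields on each host line); `recap_ok` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: read ansible-playbook output on stdin and report hosts whose
# PLAY RECAP line shows a failed or unreachable count other than 0.
# recap_ok is a hypothetical helper name.
recap_ok() {
  awk '
    /failed=[0-9]+/ && /unreachable=[0-9]+/ {
      bad = 0
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^(failed|unreachable)=/ && $i !~ /=0$/) bad = 1
      }
      if (bad) { print "problem on host: " $1; found = 1 }
    }
    END { exit found ? 1 : 0 }
  '
}

# Usage: save the playbook output to a file (run.log is an example name),
# then check it:
#   recap_ok < run.log && echo "recap looks clean"
```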

Verification

The GUIs are available at the following URLs.

manager : http://VPC_ROUTER_GLOBAL_IP:8443

grafana : http://VPC_ROUTER_GLOBAL_IP:3000

Note that Grafana can be viewed without logging in.
If that is a concern, set enabled = false in the [auth.anonymous] section of /etc/grafana/grafana.ini so that a login is required.
After changing the setting, run sudo systemctl restart grafana-server.
(To have this applied when ceph-ansible runs, you need to edit ceph-ansible/roles/ceph-grafana/templates/grafana.ini.j2.)

Also, the default replica size is 3.
Stored data therefore consumes roughly three times its own size in raw capacity.
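
As a rough back-of-the-envelope sketch of what that means for this cluster (the function names are hypothetical, and metadata overhead and free-space headroom are ignored): 600 GiB of raw capacity at replica size 3 leaves about 200 GiB for data, and the 25 GiB of objects in the status output above would account for about 75 GiB of the 83 GiB reported as used, with the remainder presumably being overhead.

```shell
#!/bin/sh
# Sketch: rough capacity arithmetic for replicated pools.
# Ignores metadata overhead and the free-space headroom Ceph needs.

# usable_gib RAW_GIB REPLICA_SIZE -> data capacity in GiB (rounded down)
usable_gib() {
  echo $(( $1 / $2 ))
}

# raw_used_gib DATA_GIB REPLICA_SIZE -> raw space consumed in GiB
raw_used_gib() {
  echo $(( $1 * $2 ))
}

usable_gib 600 3     # prints 200
raw_used_gib 25 3    # prints 75
```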

Now, let's start by issuing an access key on the rados001 server.

[centos@rados001 ~]$ sudo radosgw-admin user create --uid=username --display-name="username" --email=username@example.jp
{
    "user_id": "username",
    "display_name": "username",
    "email": "username@example.jp",
    "suspended": 0,
    "max_buckets": 1000,
    "subusers": [],
    "keys": [
        {
            "user": "username",
            "access_key": "**************************",
            "secret_key": "****************************************************"
        }
    ],
    "swift_keys": [],
    "caps": [],
    "op_mask": "read, write, delete",
    "default_placement": "",
    "default_storage_class": "",
    "placement_tags": [],
    "bucket_quota": {
        "enabled": false,
        "check_on_raw": false,
        "max_size": -1,
        "max_size_kb": 0,
        "max_objects": -1
    },
    "user_quota": {
        "enabled": false,
        "check_on_raw": false,
        "max_size": -1,
        "max_size_kb": 0,
        "max_objects": -1
    },
    "temp_url_keys": [],
    "type": "rgw",
    "mfa_ids": []
}

Everything is now ready.
In an environment with the AWS CLI installed, register the access key and secret access key with something like aws configure --profile=ceph-username.
Then test operation through the S3 API.

* Create a bucket
# aws --profile=ceph-username --endpoint-url=https://rados001.DOMAIN_NAME s3api create-bucket --bucket username-test001
# aws --profile=ceph-username --endpoint-url=https://rados001.DOMAIN_NAME s3 ls
2020-05-27 13:49:18 username-test001

* Copy a file
# aws --profile=ceph-username --endpoint-url=https://rados001.DOMAIN_NAME s3 cp index.html s3://username-test001
# aws --profile=ceph-username --endpoint-url=https://rados001.DOMAIN_NAME s3 ls s3://username-test001
2020-05-28 11:54:22          6 index.html

* Publish a file (ACL)
# aws --profile=ceph-username --endpoint-url=https://rados001.DOMAIN_NAME s3api put-object-acl --bucket username-test001 --key index.html --acl public-read

* presign
# aws --profile=ceph-username --endpoint-url=https://rados001.DOMAIN_NAME s3 presign s3://username-test001/index.html --expires-in 30
https://rados001.DOMAIN_NAME/username-test001/index.html?AWSAccessKeyId=**************************&Signature=************************************&Expires=1590718060

* Add and delete a bucket policy (https://docs.ceph.com/docs/master/radosgw/bucketpolicy/)
# cat public.json
{
    "Version": "2012-10-17",
    "Id": "PublicRead",
    "Statement": [
        {
            "Sid": "ReadAccess",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::username-test001/*"
        }
    ]
}

# aws --profile=ceph-username --endpoint-url=https://rados001.DOMAIN_NAME s3api put-bucket-policy --bucket username-test001 --policy file://./public.json
# aws --profile=ceph-username --endpoint-url=https://rados001.DOMAIN_NAME s3api get-bucket-policy --bucket username-test001
{
    "Policy": "{\n    \"Version\": \"2012-10-17\",\n    \"Id\": \"PublicRead\",\n    \"Statement\": [\n        {\n            \"Sid\": \"ReadAccess\",\n            \"Effect\": \"Allow\",\n            \"Principal\": \"*\",\n            \"Action\": \"s3:GetObject\",\n            \"Resource\": \"arn:aws:s3:::username-test001/*\"\n        }\n    ]\n}\n"
}
# aws --profile=ceph-username --endpoint-url=https://rados001.DOMAIN_NAME s3api delete-bucket-policy --bucket username-test001

This confirmed that the cluster works as S3-compatible object storage.
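
The presigned URL in the test above carries an Expires query parameter, which is a Unix timestamp. The following sketch checks whether such a URL has already expired. It only handles the query-string form shown above (AWSAccessKeyId / Signature / Expires), the helper names are hypothetical, and it assumes a `date` command that supports `+%s`.

```shell
#!/bin/sh
# Sketch: work with the Expires parameter of a presigned URL of the
# form shown above. Helper names are hypothetical.

# presign_expires URL -> prints the Expires value (Unix time), if any
presign_expires() {
  printf '%s\n' "$1" | sed -n 's/.*[?&]Expires=\([0-9][0-9]*\).*/\1/p'
}

# presign_expired URL [NOW] -> exit status 0 if the URL has expired.
# NOW defaults to the current time from "date +%s".
presign_expired() {
  exp="$(presign_expires "$1")"
  now="${2:-$(date +%s)}"
  [ -n "$exp" ] && [ "$now" -gt "$exp" ]
}

# Usage:
#   presign_expired "$url" && echo "expired, generate a new URL"
```

With `--expires-in 30` the window is only 30 seconds, so a check like this is mainly useful when a URL is handed to another process and used later.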

Extras

To relax the bucket naming rules, add rgw_relaxed_s3_bucket_names = true to the [client.rgw.XXXXXXXXXX] section of /etc/ceph/ceph.conf on the rados001 server.
After changing the setting, run sudo systemctl restart 'ceph-radosgw@*'.
This makes it possible to use characters such as _ in bucket names.
When you run your own object storage, you often do not go as far as integrating it with DNS.
To force path-style access (e.g. https://s3.Region.amazonaws.com/bucket-name/key name) instead of virtual-hosted-style access (e.g. https://bucket-name.s3.Region.amazonaws.com/key name), this kind of adjustment may be needed in some cases.
(To have this applied when ceph-ansible runs, you need to edit ceph-ansible/roles/ceph-config/templates/ceph.conf.j2.)
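
To make the difference between the two addressing styles concrete, here is a small sketch. The function names are hypothetical, and the bucket-name check is a simplified stand-in for the real S3 naming rules (it only allows lowercase letters, digits, and hyphens, and ignores length limits and dots).

```shell
#!/bin/sh
# Sketch: the two S3 addressing styles for the same object.
# Function names are hypothetical.
path_style_url() {       # ENDPOINT_HOST BUCKET KEY
  printf 'https://%s/%s/%s\n' "$1" "$2" "$3"
}
virtual_hosted_url() {   # ENDPOINT_HOST BUCKET KEY
  printf 'https://%s.%s/%s\n' "$2" "$1" "$3"
}

# In virtual-hosted style the bucket name becomes part of the host name,
# so it has to be DNS-safe; a name containing "_" is not.
# Simplified check: lowercase letters, digits, hyphens; no leading or
# trailing hyphen.
dns_safe_bucket() {
  case "$1" in
    *[!a-z0-9-]*|-*|*-|"") return 1 ;;
    *) return 0 ;;
  esac
}

# Usage:
#   path_style_url rados001.DOMAIN_NAME username-test001 index.html
#   dns_safe_bucket user_name || echo "usable with path-style access only"
```

The path-style form is the one that appears in the presigned URL from the earlier test.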

Although I did not try it this time, it also appears possible to use block storage by specifying [clients] in the hosts file, or a file system by specifying [mdss].

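Impressions

I managed to build a test setup, but I imagine that running this in production would be very tough.
A certain Japanese cloud vendor even had an outage that left data inaccessible for several months.
Just from what I touched this time, I can already think of the following areas where I lack know-how, which is troubling.
(I am assuming an on-premises deployment.)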

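Open issues

I separated public_network from the network used for the service, but I do not know whether that is even correct. I would like help from someone with network design expertise.
I am unsure whether the manager and RADOS servers should be made redundant. That looks quite hard on physical hardware; should they be virtualized? Here too I would like help from someone with design expertise.
OSDs are said to work best on single disks (no RAID), but can a failed single disk be hot-swapped? I suspect not, and in that case does the node have to be stopped every time a disk needs replacing? That would be quite a burden.
I would also like help from someone with expertise in proper monitoring of the whole system (including log management), such as how to monitor single non-RAID disks.
I have not grasped the whole picture to begin with, so properly checking every function and operation needed for production is a lot of work.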

In closing

Right, let's just use S3!
