Installing CentOS 7 on Fujitsu PRIMERGY RX200S7: Cluster Configuration

Posted at 2019-08-16

One node acts as the head node; users and files are shared via NIS/NFS, and the Torque job scheduler is installed so that there is no need to log in to each compute node individually. Since the jobs are not that large, MPI across nodes is not set up.

##Hardware configuration

  • Head node: RX1330M3
  • Compute nodes 1-4: RX200S7 or RX300S7

##Installing CentOS 7
Use the installer ISO image burned to a DVD-R.
Install the minimal configuration on every node.
The IPv4 addresses and hostnames are:

Head node
  10.0.0.1
  192.168.1.1 / rx1330m3
Compute nodes
  192.168.1.2 / n0-rx200s7
  192.168.1.3 / n1-rx300s7
  192.168.1.4 / n2-rx300s7
  192.168.1.5 / n3-rx300s7

  • Select Japanese as the language.
  • On the installation summary screen, choose US for the keyboard and "Minimal Install" under software selection.
  • Under "Network & Host Name", enter the hostname above, switch on the Ethernet port to be used, set the "Method" on the IPv4 tab to Manual, and enter the IP address. Use 8.8.8.8 as the DNS server. (A command-line alternative with nmcli is sketched after this list.)
  • Click "Begin Installation"; while the installation runs, the root password / user creation screen appears, so set only the root password.
  • Only on the head node, click "User Creation" and create as many regular users as needed.
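
If you prefer the command line to the installer GUI, the same static addressing can be applied after installation with nmcli. A minimal sketch for the first compute node, assuming the connection is named eth0 (check the actual name with nmcli con show):

# nmcli con show
# nmcli con mod eth0 ipv4.method manual ipv4.addresses 192.168.1.2/24 ipv4.dns 8.8.8.8
# nmcli con up eth0
# hostnamectl set-hostname n0-rx200s7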

##NFS server setup
Since I quickly forget how to use the vi editor, first install the simpler nano:

# yum install nano

Now configure NFS itself:

# yum -y install nfs-utils
# nano /etc/idmapd.conf
# cat /etc/idmapd.conf
........
#Domain = local.domain.edu 
Domain = rx1330m3
........

# nano /etc/exports
# cat /etc/exports
/home 192.168.1.0/24(rw,async,no_root_squash)
/usr/local 192.168.1.0/24(rw,async,no_root_squash)

# systemctl start rpcbind nfs-server 
# systemctl enable rpcbind nfs-server 

Open the firewall for NFS:

# firewall-cmd --add-service=nfs --permanent 
success
# firewall-cmd --reload 
success
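
To confirm that the shares are actually being exported, the standard nfs-utils tools can be used on the head node (not part of the original procedure, just a quick check):

# exportfs -v
# showmount -e localhost

exportfs -v lists the active exports with their effective options, and showmount -e queries the export list that the clients will see.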

##NIS server setup

# yum -y install ypserv rpcbind
# ypdomainname nis 
# echo "NISDOMAIN= nis" >> /etc/sysconfig/network 
# nano /var/yp/securenets
# cat /var/yp/securenets

# Specify the address ranges that are allowed to access NIS
255.0.0.0       127.0.0.0
255.255.255.0   192.168.1.0

# nano /etc/hosts
# cat /etc/hosts

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
10.0.1.1	rx1330m3
192.168.1.1	rx1330m3 rx1330m3.nis
192.168.1.2	n0-rx200s7
192.168.1.3	n1-rx300s7
192.168.1.4	n2-rx300s7
192.168.1.5	n3-rx300s7

# systemctl start rpcbind ypserv ypxfrd yppasswdd 
# systemctl enable rpcbind ypserv ypxfrd yppasswdd

# /usr/lib64/yp/ypinit -m
At this point, we have to construct a list of the hosts which will run NIS servers. rx1330m3 is in the list of NIS server hosts. Please continue to add the names for the other hosts, one per line. When you are done with the
list, type a <control D>.
next host to add: rx1330m3.nis
next host to add:
The current list of NIS servers looks like this:

rx1330m3.nis

Is this correct? [y/n: y]
We need a few minutes to build the databases...
Building /var/yp/nis/ypservers...
Running /var/yp/Makefile...
gmake[1]: Entering directory `/var/yp/nis'
Updating passwd.byname...
Updating passwd.byuid...
Updating shadow.byname...
Updating group.byname...
Updating group.bygid...
Updating hosts.byname...
Updating hosts.byaddr...
Updating rpc.byname...
Updating rpc.bynumber...
Updating services.byname...
Updating services.byservicename...
Updating netid.byname...
Updating protocols.bynumber...
Updating protocols.byname...
Updating mail.aliases...
gmake[1]: Leaving directory `/var/yp/nis'
rx1330m3.nis has been set up as a NIS master server.
Now you can run ypinit -s rx1330m3 on all slave server.

Whenever a new NIS user is added, rebuild the NIS maps:

# cd /var/yp 
# make 
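
For example, a new account could be created on the head node and pushed out to NIS like this (the user name taro is only an illustration):

# useradd taro
# passwd taro
# cd /var/yp
# make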

Finally, pin the NIS daemons to fixed ports and open those ports in firewalld:

# nano /etc/sysconfig/network
# cat /etc/sysconfig/network
NISDOMAIN=nis

YPSERV_ARGS="-p 944"
YPXFRD_ARGS="-p 945"

# nano /etc/sysconfig/yppasswdd
# cat /etc/sysconfig/yppasswdd
........
# Additional arguments passed to yppasswd
YPPASSWDD_ARGS="--port 946"

# systemctl restart rpcbind ypserv ypxfrd yppasswdd
# firewall-cmd --add-service=rpc-bind --permanent 
# firewall-cmd --add-port=944/tcp --permanent 
# firewall-cmd --add-port=944/udp --permanent 
# firewall-cmd --add-port=945/tcp --permanent 
# firewall-cmd --add-port=945/udp --permanent 
# firewall-cmd --add-port=946/udp --permanent 
# firewall-cmd --reload 
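
Whether the daemons really came up on the fixed ports can be checked with rpcinfo and ss (a sanity check, not a required step):

# rpcinfo -p | grep yp
# ss -lntu | grep -E '94[4-6]'

rpcinfo lists the RPC services registered with rpcbind together with their ports, and the ss line gives a rough confirmation that something is listening on 944-946.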

##Torque server setup

# yum install -y epel-release
# yum install -y torque-client torque-mom torque-server torque-scheduler

# create-munge-key

Initialize the Torque server database by starting pbs_server once in create mode, then stop it:

# pbs_server -t create -f -D & pbs_server_pid=$!

# kill $pbs_server_pid

# nano /var/lib/torque/server_priv/nodes 
# cat /var/lib/torque/server_priv/nodes 
rx1330m3 np=8 A num_node_boards=1 numa_board_str=8
n0-rx200s7 np=24 B num_node_boards=1 numa_board_str=24
n1-rx300s7 np=24 B num_node_boards=1 numa_board_str=24
n2-rx300s7 np=24 B num_node_boards=1 numa_board_str=24
n3-rx300s7 np=16 C num_node_boards=1 numa_board_str=16

# echo "rx1330m3" > /etc/torque/server_name
# cat /etc/torque/server_name 
rx1330m3

# nano /var/lib/torque/mom_priv/config
# cat /var/lib/torque/mom_priv/config
# Configuration for pbs_mom.
# $pbsserver localhost
$pbsserver rx1330m3

Start the services:

# systemctl start munge
# systemctl start trqauthd
# systemctl start pbs_server
# systemctl start pbs_sched
# systemctl start pbs_mom
# systemctl enable munge trqauthd pbs_server pbs_sched pbs_mom

Set up a queue; here it is named batch.

# qmgr -c "create queue batch queue_type=execution"
# qmgr -c "set queue batch started=true"
# qmgr -c "set queue batch enabled=true"
# qmgr -c "set queue batch resources_default.nodes=1"
# qmgr -c "set queue batch resources_default.walltime=3600"
# qmgr -c "set server default_queue=batch"
# qmgr -c "set server scheduling=true"

Check the configuration:

# qmgr -c 'p s'

    #
    # Create queues and set their attributes.
    #
    #
    # Create and define queue batch
    #
    create queue batch
    set queue batch queue_type = Execution
    set queue batch resources_default.nodes = 1
    set queue batch resources_default.walltime = 01:00:00
    set queue batch enabled = True
    set queue batch started = True
    #
    # Set server attributes.
    #
    set server scheduling = True
    set server acl_hosts = rx1330m3
    set server acl_hosts += localhost
    set server default_queue = batch
    set server log_events = 511
    set server mail_from = adm
    set server scheduler_iteration = 600
    set server node_check_rate = 150
    set server tcp_timeout = 300
    set server job_stat_rate = 45
    set server poll_jobs = True
    set server mom_job_sync = True
    set server next_job_number = 0
    set server moab_array_compatible = True
    set server nppcu = 1

This completes the head node setup.

##NFS/NIS client setup
On each compute node:

# yum install ypbind rpcbind
# yum install nano
# nano /etc/hosts
# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.1.1     rx1330m3 rx1330m3.nis
192.168.1.2    n0-rx200s7
192.168.1.3   n1-rx300s7
192.168.1.4   n2-rx300s7
192.168.1.5   n3-rx300s7

# authconfig --enablenis --nisdomain=nis --nisserver=rx1330m3 --enablemkhomedir --update
# systemctl start rpcbind ypbind
# systemctl enable rpcbind ypbind
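
Before moving on to NFS it is worth checking that the node is actually bound to the NIS server and can see the shared accounts (taro stands for any user created on the head node):

# ypwhich
# ypcat passwd | head
# getent passwd taro

ypwhich should print rx1330m3, and getent should resolve the account through NIS even though it does not exist in the local /etc/passwd.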

Set up the NFS client mounts:

# yum -y install nfs-utils
# nano /etc/idmapd.conf 
# cat /etc/idmapd.conf
[General]
#Verbosity = 0
# The following should be set to the local NFSv4 domain name
# The default is the host's DNS domain name.
# should match the Domain set on the NFS server
Domain = rx1330m3
.........

# systemctl restart rpcbind
# nano /etc/fstab
# cat /etc/fstab
#
# /etc/fstab
# Created by anaconda on Tue Aug 13 13:07:11 2019
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
/dev/mapper/centos-root /                       xfs     defaults	0 0
UUID=********-****-****-****-************ /boot                   xfs     defaults        0 0
/dev/mapper/centos-home /home                   xfs     defaults	0 0
/dev/mapper/centos-swap swap                    swap    defaults	0 0
rx1330m3:/usr/local  /usr/local              nfs     defaults	0 0
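
The /usr/local entry can be mounted right away without a reboot, assuming the NFS server on the head node is already running:

# mount -a
# df -h /usr/local

df should show rx1330m3:/usr/local as the filesystem source.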

# yum -y install autofs
# nano /etc/auto.master
# cat /etc/auto.master
.........
/-    /etc/auto.mount

# nano /etc/auto.mount
# cat /etc/auto.mount
/home -fstype=nfs,rw rx1330m3:/home

# systemctl start autofs
# systemctl enable autofs
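
Because /home is handled by autofs, it is only mounted on first access; a quick way to check it (taro again being just an example NIS user):

# ls /home/taro
# mount | grep /home

The first command triggers the automount, and the second should then show rx1330m3:/home mounted on /home.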

##Torque client setup
Since the head node is also used as a compute client, apply the following settings on every node.

# yum install -y epel-release
# yum install -y torque-client torque-mom
# nano /etc/torque/server_name
# cat /etc/torque/server_name
rx1330m3

# nano /var/lib/torque/mom_priv/config

# cat /var/lib/torque/mom_priv/config
# Configuration for pbs_mom.
$pbsserver rx1330m3
$usecp *:/home /home
$usecp *:/mnt/nfs /mnt/nfs
$log_file_suffix %h

# firewall-cmd --add-port=15001/tcp --zone=public --permanent
# firewall-cmd --add-port=15002/tcp --zone=public --permanent
# firewall-cmd --add-port=15003/tcp --zone=public --permanent
# firewall-cmd --add-port=15003/udp --zone=public --permanent
# firewall-cmd --add-port=15004/tcp --zone=public --permanent
# systemctl start pbs_mom
# systemctl enable pbs_mom

At this point, regular (NIS) users should be able to log in on every node.

In addition, running the following shows the state of every node:

$ pbsnodes -a
rx1330m3-0
     state = free
     np = 8
     properties = A
     ntype = cluster
     status = rectime=1565951735,varattr=,jobs=,state=free,netload=? 0,gres=,loadave=0.00,ncpus=0,physmem=66969408kb,availmem=62775256kb,totmem=66969408kb,idletime=243263,nusers=0,nsessions=0,uname=Linux rx1330m3 3.10.0-957.21.3.el7.x86_64 #1 SMP Tue Jun 18 16:35:19 UTC 2019 x86_64,opsys=linux
     mom_service_port = 15002
     mom_manager_port = 15003

n0-rx200s7-0
     state = free
     np = 24
     properties = B
     ntype = cluster
     status = rectime=1565708244,varattr=,jobs=,state=free,netload=? 0,gres=,loadave=0.00,ncpus=12,physmem=20936928kb,availmem=20096968kb,totmem=20936928kb,idletime=892,nusers=0,nsessions=0,uname=Linux n0-rx200s7 3.10.0-957.27.2.el7.x86_64 #1 SMP Mon Jul 29 17:46:05 UTC 2019 x86_64,opsys=linux
     mom_service_port = 15002
     mom_manager_port = 15003

n1-rx300s7-0
     state = free
     np = 24
     properties = B
     ntype = cluster
     status = rectime=1565708157,varattr=,jobs=,state=free,netload=? 0,gres=,loadave=0.00,ncpus=12,physmem=12515480kb,availmem=11721452kb,totmem=12515480kb,idletime=973,nusers=0,nsessions=0,uname=Linux n1-rx300s7 3.10.0-957.21.3.el7.x86_64 #1 SMP Tue Jun 18 16:35:19 UTC 2019 x86_64,opsys=linux
     mom_service_port = 15002
     mom_manager_port = 15003

n2-rx300s7-0
     state = down
     np = 24
     properties = B
     ntype = cluster
     status = rectime=1565708199,varattr=,jobs=,state=free,netload=? 0,gres=,loadave=0.00,ncpus=12,physmem=16710376kb,availmem=15940304kb,totmem=16710376kb,idletime=3222,nusers=0,nsessions=0,uname=Linux n2-rx300s7 3.10.0-957.21.3.el7.x86_64 #1 SMP Tue Jun 18 16:35:19 UTC 2019 x86_64,opsys=linux
     mom_service_port = 15002
     mom_manager_port = 15003

n3-rx300s7-0
     state = down
     np = 16
     properties = C
     ntype = cluster
     status = rectime=1565708244,varattr=,jobs=,state=free,netload=? 0,gres=,loadave=0.00,ncpus=8,physmem=12521784kb,availmem=11854408kb,totmem=12521784kb,idletime=20669,nusers=0,nsessions=0,uname=Linux n3-rx300s7 3.10.0-957.21.3.el7.x86_64 #1 SMP Tue Jun 18 16:35:19 UTC 2019 x86_64,opsys=linux
     mom_service_port = 15002
     mom_manager_port = 15003

As shown above, compute nodes that are up are reported as
state = free
while nodes that are not running are reported as
state = down
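
To actually use the queue, a job script along the following lines can be submitted from any node as a regular user; the resource requests are only an example and should be adjusted to the job:

$ cat hello.sh
#!/bin/bash
#PBS -q batch
#PBS -l nodes=1:ppn=4
#PBS -l walltime=01:00:00
#PBS -N hello
cd $PBS_O_WORKDIR
hostname

$ qsub hello.sh
$ qstat -a

qstat -a lists the submitted jobs and which node they were dispatched to.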
