Understanding what keepalived does behind the scenes to take over a VRRP address: so this is multicast

Last updated at 2018-06-15, posted at 2018-06-15

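Purpose

VRRP provides redundancy by letting multiple machines share an IP address. I like keepalived, an open-source implementation of it.
However, every time I use keepalived I run into trouble because the ARP cache does not get cleared, so I want to understand how it works.
(If a stale ARP cache entry remains, clients keep sending to a's MAC address even after the IP has moved from a to b.)

I had added the GARP parameters to keepalived, since GARP is what tells hosts to refresh their ARP cache, but that did not work well...

I want to set up a Linux NAT box soon, and I am studying this properly so that I can make that NAT redundant.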

Environment

Four CentOS 7 VMs on Windows 10 + Vagrant
- Reference I used
Vagrant + VirtualBoxで3つのVMをさくっと立ち上げる方法 | エンジニアっぽいことを書くブログ

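Definitions

here
- the host that wants to communicate (the client)
- 192.168.33.2
nat1
- first candidate to hold the VRRP address
- 192.168.33.101
nat2
- second candidate to hold the VRRP address
- 192.168.33.102
vrrp
- the virtual IP held by either nat1 or nat2
- 192.168.33.100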

Starting keepalived

install

```
yum install -y keepalived
vim /etc/keepalived/keepalived.conf # the configuration is shown below
systemctl start keepalived
ip a | grep 33.100 # the IP is assigned
```

Then a DROP rule gets created in iptables, and nothing can communicate over the hard-won IP...

```
[root@SX0000030540-00 ~]# iptables -nvL
Chain INPUT (policy ACCEPT 182 packets, 11875 bytes)
pkts bytes target prot opt in out source destination
0 0 DROP all -- * * 0.0.0.0/0 0.0.0.0/0 match-set keepalived dst
```
- vrrp_strict was the culprit.

    > https://github.com/acassen/keepalived/issues/474

- After removing vrrp_strict, packets were no longer dropped.
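
To check a configuration for this keyword before starting keepalived, something like the following could be used. This is only a minimal sketch: `has_vrrp_strict` is a hypothetical helper name, and it assumes comments begin with `#` or `!`, as they do in the sample configuration file.

```python
def has_vrrp_strict(conf_text: str) -> bool:
    """Return True if an uncommented vrrp_strict keyword appears in the text."""
    for raw_line in conf_text.splitlines():
        line = raw_line
        # Drop everything after a comment marker ('#' or '!').
        for marker in ("#", "!"):
            line = line.split(marker, 1)[0]
        words = line.split()
        if words and words[0] == "vrrp_strict":
            return True
    return False


# Illustrative configuration text, not the one used in this article.
sample_conf = """
global_defs {
   router_id LVS_DEVEL
   vrrp_strict
}
"""
print(has_vrrp_strict(sample_conf))
```

In practice the text would come from reading /etc/keepalived/keepalived.conf.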

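Adding GARP settings to keepalived

GARP has many settings, so for a start I copied them from Server Fault.

https://serverfault.com/questions/821809/keepalived-send-gratuitous-arp-periodically/822004

However, to understand them, I left everything at the defaults except for two settings.

I have also removed some of the skeleton settings.

(I learned that longer intervals, around 5-10 s, are more convenient when capturing packets.)

keepalived configuration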

$ cat /etc/keepalived/keepalived.conf

! Configuration File for keepalived
global_defs {
   router_id LVS_DEVEL
   vrrp_skip_check_adv_addr

#   # delay for second set of gratuitous ARPs after transition to MASTER
#   vrrp_garp_master_delay 1    # seconds, default 5, 0 for no second set

#   # number of gratuitous ARP messages to send at a time after transition to MASTER
#   vrrp_garp_master_repeat 1    # default 5

#   # delay for second set of gratuitous ARPs after lower priority advert received when MASTER
#   vrrp_garp_lower_prio_delay 10

#   # number of gratuitous ARP messages to send at a time after lower priority advert received when MASTER
#   vrrp_garp_lower_prio_repeat 1

    # minimum time interval for refreshing gratuitous ARPs while MASTER
    vrrp_garp_master_refresh 5  # secs, default 0 (no refreshing)

#    # number of gratuitous ARP messages to send at a time while MASTER
#    vrrp_garp_master_refresh_repeat 2 # default 1

    # Delay in ms between gratuitous ARP messages sent on an interface
    vrrp_garp_interval 3         # decimal, seconds (resolution usecs). Default 0.

#   # Delay in ms between unsolicited NA messages sent on an interface
#   vrrp_gna_interval 0.000001        # decimal, seconds (resolution usecs). Default 0.
}

vrrp_instance VI_1 {
    state MASTER
    interface eth1
    virtual_router_id 51
    priority 100
    advert_int 5
    virtual_ipaddress {
        192.168.33.100/24
    }
}

Learning about VRRP's GARP

Capturing packets per source address

Packet capture on here

    tcpdump -nni eth1 -n src net 192.168.33.0/24

This does not show who is sending what, so I run a separate capture for each source.

    tcpdump -v -nni eth1 -n src net 192.168.33.100          # vrrp
    tcpdump -v -nni eth1 -n src net 192.168.33.101          # nat1
    tcpdump -v -nni eth1 -n src net 192.168.33.102          # nat2

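Capturing

First, while keepalived is not running, nothing flows.
Start keepalived on nat1. A to_in report is sent to the multicast address .22 (224.0.0.22).

A multicast membership report is sent from nat1's own IP. Is the `to_in` sent to `.22` the join notification?

```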
00:41:50.733010 IP (tos 0xc0, ttl 1, id 0, offset 0, flags [DF], proto IGMP (2), length 40, options (RA))
192.168.33.101 > 224.0.0.22: igmp v3 report, 1 group record(s) [gaddr 224.0.0.18 to_in, 0 source(s)]
```

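After that, keepalived advertisement packets keep flowing toward .18.

VRRP health-check advertisements keep being sent from nat1's own IP (announcing "I am alive!").

`.18` (224.0.0.18) seems to be the multicast IP used for health monitoring.

```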
00:41:55.103390 IP (tos 0xc0, ttl 255, id 1, offset 0, flags [none], proto VRRP (112), length 40)
192.168.33.101 > 224.0.0.18: vrrp 192.168.33.101 > 224.0.0.18: VRRPv2, Advertisement, vrid 51, prio 100, authtype none, intvl 1s, length 20, addrs: 192.168.33.100
```
- This interval can be adjusted with advert_int.
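
To see where the fields in that tcpdump line come from, here is a rough sketch that builds and decodes a VRRPv2 advertisement following the packet layout in RFC 3768. It is not keepalived's code. The function names are made up for illustration, and it assumes a single virtual IP and no authentication, which is what the capture shows (`authtype none`, `length 20`).

```python
import socket
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of the data."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_vrrpv2_advert(vrid: int, priority: int, advert_int: int, vip: str) -> bytes:
    """Build a VRRPv2 advertisement carrying one virtual IP."""
    version_type = (2 << 4) | 1  # version 2, type 1 (ADVERTISEMENT)
    auth_type = 0                # "authtype none"
    body = socket.inet_aton(vip) + b"\x00" * 8  # one address + 8 bytes of auth data
    fields = (version_type, vrid, priority, 1, auth_type, advert_int)
    # The checksum is computed with the checksum field set to zero.
    checksum = internet_checksum(struct.pack("!BBBBBBH", *fields, 0) + body)
    return struct.pack("!BBBBBBH", *fields, checksum) + body


def parse_vrrpv2_advert(packet: bytes) -> dict:
    """Decode the fields that tcpdump prints for a VRRPv2 advertisement."""
    version_type, vrid, priority, count, auth_type, advert_int, _ = struct.unpack(
        "!BBBBBBH", packet[:8]
    )
    addrs = [socket.inet_ntoa(packet[8 + 4 * i:12 + 4 * i]) for i in range(count)]
    return {
        "version": version_type >> 4,
        "type": version_type & 0x0F,
        "vrid": vrid,
        "prio": priority,
        "authtype": auth_type,
        "intvl": advert_int,
        "addrs": addrs,
        "checksum_ok": internet_checksum(packet) == 0,
    }


advert = build_vrrpv2_advert(vrid=51, priority=100, advert_int=1, vip="192.168.33.100")
print(len(advert), parse_vrrpv2_advert(advert))
```

The 20 bytes are the 8-byte header, one 4-byte address, and 8 bytes of authentication data, which matches the `length 20` in the capture.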

ARP requests keep flowing from vrrp

ARP request who-has

```
01:13:00.456923 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.33.100 (ff:ff:ff:ff:ff:ff) tell 192.168.33.100, length 46
```
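
An ARP request whose "who-has" and "tell" addresses are the same is a gratuitous ARP. The sketch below packs such a request with the standard 28-byte ARP layout to show which field is which. The helper names are hypothetical, and the MAC address is the one that here's ARP table shows for vrrp while nat1 holds it. I assume the `length 46` in the capture is these 28 bytes plus padding up to the minimum Ethernet payload size.

```python
import socket
import struct


def build_gratuitous_arp_request(mac: str, ip: str) -> bytes:
    """Pack an ARP request in which sender IP and target IP are identical."""
    sender_mac = bytes.fromhex(mac.replace(":", ""))
    addr = socket.inet_aton(ip)
    return struct.pack(
        "!HHBBH6s4s6s4s",
        1,            # hardware type: Ethernet
        0x0800,       # protocol type: IPv4
        6,            # hardware address length ("Ethernet (len 6)")
        4,            # protocol address length ("IPv4 (len 4)")
        1,            # opcode 1: request
        sender_mac,   # sender hardware address
        addr,         # sender IP ("tell 192.168.33.100")
        b"\xff" * 6,  # target hardware address, as printed in the capture
        addr,         # target IP ("who-has 192.168.33.100")
    )


def is_gratuitous(arp: bytes) -> bool:
    """True when the sender IP (bytes 14-17) equals the target IP (bytes 24-27)."""
    return arp[14:18] == arp[24:28]


garp = build_gratuitous_arp_request("08:00:27:29:8f:da", "192.168.33.100")
print(len(garp), is_gratuitous(garp))
```

Hosts that receive this can update their entry for 192.168.33.100 from the sender hardware address, which is why it matters after a failover.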

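This changes depending on the vrrp_garp_master_refresh and vrrp_garp_interval values. I do not really understand this behavior.
With the following settings (everything else at the defaults), ARP requests are sent at intervals of 4 s, 6 s, 4 s, 5 s, 7 s, and then 6 s from then on.

```
vrrp_garp_interval 4         # decimal, seconds (resolution usecs). Default 0.
vrrp_garp_master_refresh 6  # secs, default 0 (no refreshing)
```
- Perhaps `vrrp_garp_interval` only affects the first one or two requests. I will check the manual later.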

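Both the advertisements and the ARP requests are sent by keepalived, and they stop when keepalived is stopped.
At this point, on here (the capturing host), the ARP table shows this MAC as holding vrrp.

```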
[root@SX0000030540-00 ~]# ping -c1 vrrp # ok
[root@SX0000030540-00 ~]# arp
Address HWtype HWaddress Flags Mask Iface
vrrp ether 08:00:27:29:8f:da C eth1
```

Start VRRP on nat2 so that both nat1 and nat2 are running

start
```
nat2$ systemctl start keepalived
```
- nat2 joins the multicast group as a VRRP candidate, but nat1 takes precedence, so nat2 does not take the IP.

[root@SX0000030540-00 ~]# tcpdump -v -nni eth1 -n src net 192.168.33.102

It joins the multicast group by sending `to_ex` to `.22`.

```
00:53:45.804168 IP (tos 0xc0, ttl 1, id 0, offset 0, flags [DF], proto IGMP (2), length 40, options (RA))
192.168.33.102 > 224.0.0.22: igmp v3 report, 1 group record(s) [gaddr 224.0.0.18 to_ex, 0 source(s)]
00:53:50.542786 IP (tos 0xc0, ttl 1, id 0, offset 0, flags [DF], proto IGMP (2), length 40, options (RA))
192.168.33.102 > 224.0.0.22: igmp v3 report, 1 group record(s) [gaddr 224.0.0.18 to_ex, 0 source(s)]
```
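- ~~to_ex might mean "I have become external!". It seems to check whether someone has already sent `to_in` before sending.~~
- I do not fully understand this part, but according to the RFC it is `Change to EXCLUDE (x,G) join`. Either way, it appears to be a join.
- Since nat2 does not take the IP, it cannot send GARP from vrrp.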

Stop nat1 and hand the IP over to nat2

stop

    nat1$ systemctl stop keepalived

nat2 detects (?) that the multicast advertisements have stopped and acquires the IP.
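
For reference, RFC 3768 (VRRPv2) defines how long a backup waits without advertisements before it declares the master down. The sketch below only evaluates those two formulas. The function names are mine, and whether keepalived follows these timers exactly is an assumption I have not verified here. The RFC also says a master that shuts down sends an advertisement with priority 0, in which case the backup waits only for Skew_Time.

```python
def skew_time(priority: int) -> float:
    """Skew_Time in seconds as defined in RFC 3768: (256 - Priority) / 256."""
    return (256 - priority) / 256


def master_down_interval(advert_int: float, priority: int) -> float:
    """Master_Down_Interval in seconds: 3 * Advertisement_Interval + Skew_Time."""
    return 3 * advert_int + skew_time(priority)


# (advert_int, priority) pairs taken from the two configurations in this article.
for advert_int, priority in [(5, 100), (7, 10)]:
    print(advert_int, priority, skew_time(priority), master_down_interval(advert_int, priority))
```

A higher priority gives a smaller skew, so the highest-priority backup times out first.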

A death notice from nat1? But it is `to_in`.

```
01:46:20.310057 IP (tos 0xc0, ttl 1, id 0, offset 0, flags [DF], proto IGMP (2), length 40, options (RA))
192.168.33.101 > 224.0.0.22: igmp v3 report, 1 group record(s) [gaddr 224.0.0.18 to_in, 0 source(s)]
01:46:21.424993 IP (tos 0xc0, ttl 1, id 0, offset 0, flags [DF], proto IGMP (2), length 40, options (RA))
192.168.33.101 > 224.0.0.22: igmp v3 report, 1 group record(s) [gaddr 224.0.0.18 to_in, 0 source(s)]
```
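- What is sent to the multicast IP .22 is `to_in`. Is this how both join and leave are expressed?
- Is it `to_in` on leave as well? Does the first message to .22 mean join and the next one mean leave? But it is sent several times.
- → For now, the RFC lists TO_IN under `Join/Leave`.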

After joining, health monitoring begins

Advertisements from nat2 start flowing

```
01:47:16.930439 IP (tos 0xc0, ttl 255, id 9, offset 0, flags [none], proto VRRP (112), length 40)
192.168.33.102 > 224.0.0.18: vrrp 192.168.33.102 > 224.0.0.18: VRRPv2, Advertisement, vrid 51, prio 100, authtype none, intvl 7s, length 20, addrs: 192.168.33.100
```

ARP requests from nat2 arrive from vrrp (I had set different intervals from nat1, so the gaps are 3 s, 4 s, 3 s, 4 s, and then 7 s from then on).

```
01:46:27.921383 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.33.100 (ff:ff:ff:ff:ff:ff) tell 192.168.33.100, length 46
01:46:30.923661 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.33.100 (ff:ff:ff:ff:ff:ff) tell 192.168.33.100, length 46
01:46:34.922249 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.33.100 (ff:ff:ff:ff:ff:ff) tell 192.168.33.100, length 46
01:46:37.923364 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.33.100 (ff:ff:ff:ff:ff:ff) tell 192.168.33.100, length 46
01:46:41.923788 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.33.100 (ff:ff:ff:ff:ff:ff) tell 192.168.33.100, length 46
01:46:48.925458 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.33.100 (ff:ff:ff:ff:ff:ff) tell 192.168.33.100, length 46
01:46:55.926839 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.33.100 (ff:ff:ff:ff:ff:ff) tell 192.168.33.100, length 46
01:47:02.927133 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.33.100 (ff:ff:ff:ff:ff:ff) tell 192.168.33.100, length 46
```
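
Reading the gaps off by eye is error-prone, so here is a small sketch that computes them from the timestamp column of the capture above. `arp_intervals` is a hypothetical helper. It assumes every line starts with a tcpdump-style `HH:MM:SS.ffffff` timestamp and that the capture does not cross midnight.

```python
from datetime import datetime


def arp_intervals(lines):
    """Return the gaps, rounded to whole seconds, between consecutive lines."""
    stamps = [
        datetime.strptime(line.split()[0], "%H:%M:%S.%f")
        for line in lines
        if line.strip()
    ]
    return [round((later - earlier).total_seconds())
            for earlier, later in zip(stamps, stamps[1:])]


# Timestamps of the eight ARP requests shown in the capture above.
capture_timestamps = [
    "01:46:27.921383",
    "01:46:30.923661",
    "01:46:34.922249",
    "01:46:37.923364",
    "01:46:41.923788",
    "01:46:48.925458",
    "01:46:55.926839",
    "01:47:02.927133",
]
print(arp_intervals(capture_timestamps))
```

Because only the first whitespace-separated token is parsed, full tcpdump lines can be passed in as well.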

The ARP entry was updated successfully.

```
[root@SX0000030540-00 ~]# arp
Address HWtype HWaddress Flags Mask Iface
vrrp ether 08:00:27:4a:4f:05 C eth1
```

Bring nat1 back. vrrp stays on nat2, and nat1 does not take the IP.

At this point nat1 sends a multicast membership report, but it is to_ex. Is that because it already knows nat2 is there?

```
01:57:04.199054 IP (tos 0xc0, ttl 1, id 0, offset 0, flags [DF], proto IGMP (2), length 40, options (RA))
192.168.33.101 > 224.0.0.22: igmp v3 report, 1 group record(s) [gaddr 224.0.0.18 to_ex, 0 source(s)]
01:57:04.815047 IP (tos 0xc0, ttl 1, id 0, offset 0, flags [DF], proto IGMP (2), length 40, options (RA))
192.168.33.101 > 224.0.0.22: igmp v3 report, 1 group record(s) [gaddr 224.0.0.18 to_ex, 0 source(s)]
```

After that, nat1 starts sending advertisements too. Presumably this is how they monitor each other.

```
01:57:07.188432 IP (tos 0xc0, ttl 255, id 1, offset 0, flags [none], proto VRRP (112), length 40)
192.168.33.101 > 224.0.0.18: vrrp 192.168.33.101 > 224.0.0.18: VRRPv2, Advertisement, vrid 51, prio 100, authtype none, intvl 5s, length 20, addrs: 192.168.33.100
```

Stop nat2 and move vrrp to nat1

nat2 sends a death notice, and again it is to_in.

```
01:58:47.308416 IP (tos 0xc0, ttl 1, id 0, offset 0, flags [DF], proto IGMP (2), length 40, options (RA))
192.168.33.102 > 224.0.0.22: igmp v3 report, 1 group record(s) [gaddr 224.0.0.18 to_in, 0 source(s)]
01:58:56.094253 IP (tos 0xc0, ttl 1, id 0, offset 0, flags [DF], proto IGMP (2), length 40, options (RA))
192.168.33.102 > 224.0.0.22: igmp v3 report, 1 group record(s) [gaddr 224.0.0.18 to_in, 0 source(s)]
```

ARP requests from vrrp now arrive every 10 s. They come from nat1. The interval seems different from before, but I will not worry about it.

```
01:58:52.307357 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.33.100 (ff:ff:ff:ff:ff:ff) tell 192.168.33.100, length 46
01:59:02.309689 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.33.100 (ff:ff:ff:ff:ff:ff) tell 192.168.33.100, length 46
01:59:12.310586 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.33.100 (ff:ff:ff:ff:ff:ff) tell 192.168.33.100, length 46
01:59:22.312478 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.33.100 (ff:ff:ff:ff:ff:ff) tell 192.168.33.100, length 46
```

The ARP entry on here was updated and now points to nat1.

```
[root@SX0000030540-00 ~]# arp
Address HWtype HWaddress Flags Mask Iface
vrrp ether 08:00:27:29:8f:da C eth1
```

Looking at join and leave in more detail

Starting with both stopped

```
$ tcpdump -vvvvXX -nni eth1 -n src net 192.168.33.101
```

nat1 start
- Join notification

```
02:06:09.336354 IP (tos 0xc0, ttl 1, id 0, offset 0, flags [DF], proto IGMP (2), length 40, options (RA))
192.168.33.101 > 224.0.0.22: igmp v3 report, 1 group record(s) [gaddr 224.0.0.18 to_ex { }]
0x0000: 0100 5e00 0016 0800 2729 8fda 0800 46c0 ..^.....')....F.
0x0010: 0028 0000 4000 0102 21ec c0a8 2165 e000 .(..@...!...!e..
0x0020: 0016 9404 0000 2200 f9eb 0000 0001 0400 ......".........
0x0030: 0000 e000 0012 0000 0000 0000 ............
```

nat1 stop
- Leave notification

```
02:07:09.449925 IP (tos 0xc0, ttl 1, id 0, offset 0, flags [DF], proto IGMP (2), length 40, options (RA))
192.168.33.101 > 224.0.0.22: igmp v3 report, 1 group record(s) [gaddr 224.0.0.18 to_in { }]
0x0000: 0100 5e00 0016 0800 2729 8fda 0800 46c0 ..^.....')....F.
0x0010: 0028 0000 4000 0102 21ec c0a8 2165 e000 .(..@...!...!e..
0x0020: 0016 9404 0000 2200 faeb 0000 0001 0300 ......".........
0x0030: 0000 e000 0012 0000 0000 0000 ............
```

diff
- to_in was leave, and to_ex was join. At first I could see to_in on join, but lately I only see joins with to_ex, even when starting with both stopped. Strange.
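
The two hex dumps can be decoded by hand to confirm this. The sketch below assumes the frame layout visible in the dumps: a 14-byte Ethernet header, an IP header whose length comes from the IHL field, then an IGMPv3 report as laid out in RFC 3376 with exactly one group record. The function and constant names are made up for illustration. The only differences between the two dumps are the record type byte (0x04 vs 0x03) and the checksum. In RFC 3376 terms, to_ex with an empty source list means "exclude nothing", so listen to every source (a join), and to_in with an empty source list means "include nothing" (a leave).

```python
import socket
import struct

# Record type names as tcpdump prints them.
RECORD_TYPES = {1: "is_in", 2: "is_ex", 3: "to_in", 4: "to_ex", 5: "allow", 6: "block"}

JOIN_DUMP = """
0100 5e00 0016 0800 2729 8fda 0800 46c0
0028 0000 4000 0102 21ec c0a8 2165 e000
0016 9404 0000 2200 f9eb 0000 0001 0400
0000 e000 0012 0000 0000 0000
"""

LEAVE_DUMP = """
0100 5e00 0016 0800 2729 8fda 0800 46c0
0028 0000 4000 0102 21ec c0a8 2165 e000
0016 9404 0000 2200 faeb 0000 0001 0300
0000 e000 0012 0000 0000 0000
"""


def decode_igmpv3_report(hex_dump: str) -> dict:
    """Decode the first group record of an IGMPv3 report from a tcpdump -XX dump."""
    frame = bytes.fromhex("".join(hex_dump.split()))
    ip = frame[14:]                             # skip the Ethernet header
    header_len = (ip[0] & 0x0F) * 4             # 0x46 -> 24 bytes (Router Alert option)
    total_len = struct.unpack("!H", ip[2:4])[0]
    igmp = ip[header_len:total_len]             # drops the Ethernet padding
    msg_type, _, _, _, num_records = struct.unpack("!BBHHH", igmp[:8])
    record_type, _, num_sources = struct.unpack("!BBH", igmp[8:12])
    return {
        "src": socket.inet_ntoa(ip[12:16]),
        "dst": socket.inet_ntoa(ip[16:20]),
        "msg_type": msg_type,                   # 0x22 = IGMPv3 membership report
        "num_records": num_records,
        "record": RECORD_TYPES.get(record_type, str(record_type)),
        "sources": num_sources,
        "group": socket.inet_ntoa(igmp[12:16]),
    }


print(decode_igmpv3_report(JOIN_DUMP))
print(decode_igmpv3_report(LEAVE_DUMP))
```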

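Start nat1/2 one after the other from both stopped (joining multicast)

Both were to_in. According to the RFC, TO_IN can be used for both join and leave, while to_ex can join but apparently cannot leave.

https://tools.ietf.org/html/rfc5790

I do not understand the difference between to_in and to_ex on join, but the behavior matches the RFC, so I will leave it at that.

It behaved more or less as the theory says.

When an ARP cache does not get cleared, this knowledge should be enough to troubleshoot it.

That is all for this topic.

Finally, here is the keepalived configuration for the slave side when you want failover. (This article was tested without failover, so using this will behave differently from what is described above.)
It is set to BACKUP with a lower priority. With this, failover works. (Even with BACKUP, failover did not happen when the priority was the same... so that is how it is...)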

! Configuration File for keepalived
global_defs {
   router_id LVS_DEVEL
   vrrp_skip_check_adv_addr
   vrrp_garp_master_refresh 7  # secs, default 0 (no refreshing)
   vrrp_garp_interval 3          # decimal, seconds (resolution usecs). Default 0.
}

vrrp_instance VI_1 {
    state BACKUP
    interface eth1
    virtual_router_id 51
    priority 10
    advert_int 7
    virtual_ipaddress {
        192.168.33.100/24
    }
}
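
The remark about equal priorities is at least consistent with the Backup-state rule in RFC 3768 (VRRPv2), sketched below. This is my reading of the RFC and not keepalived's source. The function name and the returned labels are made up, and whether keepalived implements exactly this rule is an assumption.

```python
def backup_action(local_priority: int, advert_priority: int, preempt_mode: bool = True) -> str:
    """What a router in Backup state does with a received advertisement (RFC 3768, 6.4.2)."""
    if advert_priority == 0:
        # The master is shutting down: wait only Skew_Time before taking over.
        return "set_timer_to_skew_time"
    if not preempt_mode or advert_priority >= local_priority:
        # The current master is accepted: restart the Master_Down_Timer.
        return "reset_master_down_timer"
    # A lower-priority master is ignored, so the timer runs out and this router takes over.
    return "discard_advertisement"


print(backup_action(local_priority=100, advert_priority=100))
print(backup_action(local_priority=100, advert_priority=10))
print(backup_action(local_priority=10, advert_priority=100))
```

With equal priorities the advertisement is accepted and the backup stays a backup. Only a strictly higher local priority lets it ignore the current master.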
