
Tips of Routing on the Host

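Introduction

These are notes on know-how and pitfalls that may be useful when running Routing on the Host (an L3-routed server) with FRR. The details depend heavily on your actual network environment and platform software, so treat them as a reference only.
The conclusions come first.

  • (must) Assign the server IP address to a dummy interface, not to lo
  • (must) Pay attention to the routes the server advertises; proper filtering is required
  • (option) Local Preference can be used to disable ECMP on the server
  • (option) When the server is attached to multiple networks, specify the source NIC in the routing daemon

The preparation is long, so it is best to start reading from Tips.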

Test environment

  • GNS3 2.2.25
  • Switches: Cumulus-VX 3.7.11
  • Server nodes: Ubuntu 18.04 cloud-img
  • Host routing daemon: FRR 7.2.1

The GNS3 topology is shown below. The NAT node can be removed once FRR has been installed on the Ubuntu nodes.
gns3.png

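Network configuration

Because this article deals with ECMP, the network is multipath. The switches' routing tables are split with VRFs, and each VRF uses subinterfaces to establish eBGP peering with its neighbors over IPv6 Unnumbered.

L1 diagram

l1.png

L3 diagram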

l3.png

Configuration

This is a test setup, so only the minimum necessary configuration is applied.

Spine

spine-1 parameters

spine-1
cat > env.sh <<EOF
export ASN=65000
export HOSTNAME=spine-1
export LOOPBACK=1.1.1.1
export DOWNLINKS=swp1-4
export PRIVATE_DOWNLINKS=swp1.101,swp2.101,swp3.101,swp4.101
export PUBLIC_DOWNLINKS=swp1.201,swp2.201,swp3.201,swp4.201
EOF
source env.sh
echo "source env.sh" >> ~/.bashrc

spine-2 parameters

spine-2
cat > env.sh <<EOF
export ASN=65000
export HOSTNAME=spine-2
export LOOPBACK=2.2.2.2
export DOWNLINKS=swp1-4
export PRIVATE_DOWNLINKS=swp1.101,swp2.101,swp3.101,swp4.101
export PUBLIC_DOWNLINKS=swp1.201,swp2.201,swp3.201,swp4.201
EOF
source env.sh
echo "source env.sh" >> ~/.bashrc
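
The subinterface lists above follow a fixed pattern (port number plus VLAN ID), so they can also be generated instead of typed by hand. The following is a minimal sketch; `subif_list` is a hypothetical helper name, not an NCLU command, and it assumes the ports are named `swp<N>` as in this lab.

```shell
# Build a comma-separated list of VLAN subinterfaces for swp<first>..swp<last>.
# subif_list is a hypothetical helper used only for illustration.
subif_list() (
  first=$1; last=$2; vlan=$3
  out=""
  i=$first
  while [ "$i" -le "$last" ]; do
    # Append "swp<i>.<vlan>", inserting a comma only between entries
    out="${out:+$out,}swp${i}.${vlan}"
    i=$((i + 1))
  done
  printf '%s\n' "$out"
)

# Example for a spine with four downlinks:
#   export PRIVATE_DOWNLINKS=$(subif_list 1 4 101)
#   export PUBLIC_DOWNLINKS=$(subif_list 1 4 201)
```

Generating both lists from the same port range keeps the PRIVATE (VLAN 101) and PUBLIC (VLAN 201) definitions in sync.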

spine configuration

NCLU-spine
# Hostname & Loopback
net add hostname $HOSTNAME
net add loopback lo ip address ${LOOPBACK}/32
net add interface ${DOWNLINKS} mtu 9216
net pending
net commit

# PRIVATE: BGP Unnumbered IF
net add vrf PRIVATE
net add interface ${PRIVATE_DOWNLINKS} ipv6 nd ra-interval 5
net add interface ${PRIVATE_DOWNLINKS} ipv6 nd ra-lifetime 15
net add interface ${PRIVATE_DOWNLINKS} mtu 9216
net add interface ${PRIVATE_DOWNLINKS} vrf PRIVATE
net pending
net commit

# PRIVATE: eBGP
net add bgp vrf PRIVATE autonomous-system $ASN
net add bgp vrf PRIVATE router-id $LOOPBACK
net add bgp vrf PRIVATE bestpath as-path multipath-relax
net add bgp vrf PRIVATE bestpath compare-routerid
net add bgp vrf PRIVATE neighbor PRIVATE_LEAF peer-group
net add bgp vrf PRIVATE neighbor PRIVATE_LEAF remote-as external
net add bgp vrf PRIVATE neighbor PRIVATE_LEAF soft-reconfiguration inbound
net add bgp vrf PRIVATE neighbor $PRIVATE_DOWNLINKS interface peer-group PRIVATE_LEAF
net pending
net commit

# PUBLIC: BGP Unnumbered IF
net add vrf PUBLIC
net add interface ${PUBLIC_DOWNLINKS} ipv6 nd ra-interval 5
net add interface ${PUBLIC_DOWNLINKS} ipv6 nd ra-lifetime 15
net add interface ${PUBLIC_DOWNLINKS} mtu 9216
net add interface ${PUBLIC_DOWNLINKS} vrf PUBLIC
net pending
net commit

# PUBLIC: eBGP
net add bgp vrf PUBLIC autonomous-system $ASN
net add bgp vrf PUBLIC router-id $LOOPBACK
net add bgp vrf PUBLIC bestpath as-path multipath-relax
net add bgp vrf PUBLIC bestpath compare-routerid
net add bgp vrf PUBLIC neighbor PUBLIC_LEAF peer-group
net add bgp vrf PUBLIC neighbor PUBLIC_LEAF remote-as external
net add bgp vrf PUBLIC neighbor PUBLIC_LEAF soft-reconfiguration inbound
net add bgp vrf PUBLIC neighbor $PUBLIC_DOWNLINKS interface peer-group PUBLIC_LEAF
net pending
net commit

Leaf

leaf-3 parameters

leaf-3
cat > env.sh <<EOF
export HOSTNAME=leaf-3
export LOOPBACK=3.3.3.3
export DOWNLINK=swp1
export UPLINKS=swp31-32
export PRIVATE_DOWNLINK=swp1.101
export PRIVATE_UPLINKS=swp31.101,swp32.101
export PUBLIC_DOWNLINK=swp1.201
export PUBLIC_UPLINKS=swp31.201,swp32.201
export ASN=65001
EOF
source env.sh
echo "source env.sh" >> ~/.bashrc

leaf-4 parameters

leaf-4
cat > env.sh <<EOF
export HOSTNAME=leaf-4
export LOOPBACK=4.4.4.4
export DOWNLINK=swp1
export UPLINKS=swp31-32
export PRIVATE_DOWNLINK=swp1.101
export PRIVATE_UPLINKS=swp31.101,swp32.101
export PUBLIC_DOWNLINK=swp1.201
export PUBLIC_UPLINKS=swp31.201,swp32.201
export ASN=65001
EOF
source env.sh
echo "source env.sh" >> ~/.bashrc

leaf-5 parameters

leaf-5
cat > env.sh <<EOF
export HOSTNAME=leaf-5
export LOOPBACK=5.5.5.5
export DOWNLINK=swp1
export UPLINKS=swp31-32
export PRIVATE_DOWNLINK=swp1.101
export PRIVATE_UPLINKS=swp31.101,swp32.101
export PUBLIC_DOWNLINK=swp1.201
export PUBLIC_UPLINKS=swp31.201,swp32.201
export ASN=65002
EOF
source env.sh
echo "source env.sh" >> ~/.bashrc

leaf-6 parameters

leaf-6
cat > env.sh <<EOF
export HOSTNAME=leaf-6
export LOOPBACK=6.6.6.6
export DOWNLINK=swp1
export UPLINKS=swp31-32
export PRIVATE_DOWNLINK=swp1.101
export PRIVATE_UPLINKS=swp31.101,swp32.101
export PUBLIC_DOWNLINK=swp1.201
export PUBLIC_UPLINKS=swp31.201,swp32.201
export ASN=65002
EOF
source env.sh
echo "source env.sh" >> ~/.bashrc

leaf configuration

NCLU-leaf
# Hostname & Loopback
net add hostname $HOSTNAME
net add loopback lo ip address ${LOOPBACK}/32
net add interface $UPLINKS mtu 9216
net add interface $DOWNLINK mtu 9216
net pending
net commit

# PRIVATE: BGP interfaces
net add interface ${PRIVATE_DOWNLINK} ipv6 nd ra-interval 5
net add interface ${PRIVATE_DOWNLINK} ipv6 nd ra-lifetime 15
net add interface ${PRIVATE_DOWNLINK} mtu 9216
net add interface ${PRIVATE_UPLINKS} ipv6 nd ra-interval 5
net add interface ${PRIVATE_UPLINKS} ipv6 nd ra-lifetime 15
net add interface ${PRIVATE_UPLINKS} mtu 9216
net add vrf PRIVATE
net add interface ${PRIVATE_DOWNLINK} vrf PRIVATE
net add interface ${PRIVATE_UPLINKS} vrf PRIVATE
net pending
net commit

# PRIVATE: eBGP
net add bgp vrf PRIVATE autonomous-system $ASN
net add bgp vrf PRIVATE router-id $LOOPBACK
net add bgp vrf PRIVATE bestpath as-path multipath-relax
net add bgp vrf PRIVATE bestpath compare-routerid
net add bgp vrf PRIVATE neighbor NODE peer-group
net add bgp vrf PRIVATE neighbor NODE remote-as external
net add bgp vrf PRIVATE neighbor NODE soft-reconfiguration inbound
net add bgp vrf PRIVATE neighbor $PRIVATE_DOWNLINK interface peer-group NODE
net add bgp vrf PRIVATE neighbor SPINE peer-group
net add bgp vrf PRIVATE neighbor SPINE remote-as external
net add bgp vrf PRIVATE neighbor SPINE soft-reconfiguration inbound
net add bgp vrf PRIVATE neighbor $PRIVATE_UPLINKS interface peer-group SPINE
net pending
net commit

# PUBLIC: BGP interfaces
net add interface ${PUBLIC_DOWNLINK} ipv6 nd ra-interval 5
net add interface ${PUBLIC_DOWNLINK} ipv6 nd ra-lifetime 15
net add interface ${PUBLIC_DOWNLINK} mtu 9216
net add interface ${PUBLIC_UPLINKS} ipv6 nd ra-interval 5
net add interface ${PUBLIC_UPLINKS} ipv6 nd ra-lifetime 15
net add interface ${PUBLIC_UPLINKS} mtu 9216
net add vrf PUBLIC
net add interface ${PUBLIC_DOWNLINK} vrf PUBLIC
net add interface ${PUBLIC_UPLINKS} vrf PUBLIC
net pending
net commit

# PUBLIC: eBGP
net add bgp vrf PUBLIC autonomous-system $ASN
net add bgp vrf PUBLIC router-id $LOOPBACK
net add bgp vrf PUBLIC bestpath as-path multipath-relax
net add bgp vrf PUBLIC bestpath compare-routerid
net add bgp vrf PUBLIC neighbor NODE peer-group
net add bgp vrf PUBLIC neighbor NODE remote-as external
net add bgp vrf PUBLIC neighbor NODE soft-reconfiguration inbound
net add bgp vrf PUBLIC neighbor $PUBLIC_DOWNLINK interface peer-group NODE
net add bgp vrf PUBLIC neighbor SPINE peer-group
net add bgp vrf PUBLIC neighbor SPINE remote-as external
net add bgp vrf PUBLIC neighbor SPINE soft-reconfiguration inbound
net add bgp vrf PUBLIC neighbor $PUBLIC_UPLINKS interface peer-group SPINE
net pending
net commit

# Reboot
sudo reboot

Node

All work on the servers is done as root.

node-7 parameters

node-7
sudo su -
cat > /root/env.sh <<EOF
export HOSTNAME=node-7
ASN=65003
LOOPBACK=172.16.7.7
BGP_NIC1=ens3
BGP_NIC2=ens4
DHCP_NIC=ens5
EOF
source /root/env.sh
echo "source /root/env.sh" >> ~/.bashrc

node-8 parameters

node-8
sudo su -
cat > /root/env.sh <<EOF
export HOSTNAME=node-8
ASN=65004
LOOPBACK=172.16.8.8
BGP_NIC1=ens3
BGP_NIC2=ens4
DHCP_NIC=ens5
EOF
source /root/env.sh
echo "source /root/env.sh" >> ~/.bashrc

node configuration

I wanted to configure the BGP interfaces flexibly with the ip command, so the network setup at boot is done by a custom script run from systemd.

Initial setup

node-configuration
# Hostname
hostnamectl set-hostname $HOSTNAME

# Boot speed
echo "network: {config: disabled}" > /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg

# DHCP for Source NAT
cat > /etc/netplan/config.yaml <<EOF
network:
  version: 2
  renderer: networkd
  ethernets:
    ${DHCP_NIC}:
      dhcp4: yes
      dhcp6: no
EOF
netplan generate
netplan apply

# network script
cat > /etc/roh-network-init.sh <<EOF
#!/bin/bash
# BGP IPv6 Unnumbered
sysctl -w net.ipv6.conf.all.disable_ipv6=0
sysctl -w net.ipv6.conf.default.disable_ipv6=0

# Set Virtual IP address
ip addr add dev lo ${LOOPBACK}/32

# Physical IF
ip link set ${BGP_NIC1} up
ip link set ${BGP_NIC1} mtu 9000
ip link set ${BGP_NIC2} up
ip link set ${BGP_NIC2} mtu 9000

# PRIVATE
ip link add link ${BGP_NIC1} name ${BGP_NIC1}.101 type vlan id 101
ip link set ${BGP_NIC1}.101 up
ip link add link ${BGP_NIC2} name ${BGP_NIC2}.101 type vlan id 101
ip link set ${BGP_NIC2}.101 up
EOF
chmod 755 /etc/roh-network-init.sh
cat /etc/roh-network-init.sh

# systemd service
cat > /etc/systemd/system/roh-network-init.service <<EOF
[Unit]
Description=roh-network-init
After=network.target
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/etc/roh-network-init.sh
[Install]
WantedBy=multi-user.target
EOF
systemctl enable roh-network-init
systemctl start roh-network-init
systemctl status roh-network-init

# LLDP (installed because networkctl lldp did not work)
apt install -y lldpd
systemctl start lldpd
systemctl status lldpd
lldpcli show neighbors

# Reboot
reboot

FRR configuration

frr-configuration
# Install FRR
curl -s https://deb.frrouting.org/frr/keys.asc | apt-key add -
echo "deb https://deb.frrouting.org/frr bionic frr-stable" > /etc/apt/sources.list.d/frr.list
apt update && apt install -y frr frr-pythontools

# FRR configuration
cat > /etc/frr/daemons <<EOF
bgpd=yes
vtysh_enable=yes
zebra_options="  -A 127.0.0.1 -s 90000000"
bgpd_options="   -A 127.0.0.1"
EOF
systemctl enable frr
systemctl start frr

# FRR config
cat > /etc/frr/frr.conf <<EOF
frr version 7.2.1
frr defaults traditional
hostname ${HOSTNAME}
log syslog informational
no ip forwarding
no ipv6 forwarding
service integrated-vtysh-config
!
router bgp ${ASN}
 bgp router-id ${LOOPBACK}
 bgp always-compare-med
 bgp bestpath as-path multipath-relax
 bgp bestpath compare-routerid
 neighbor LEAF-PRIVATE peer-group
 neighbor LEAF-PRIVATE remote-as external
 neighbor ${BGP_NIC1}.101 interface peer-group LEAF-PRIVATE
 neighbor ${BGP_NIC2}.101 interface peer-group LEAF-PRIVATE
 !
 address-family ipv4 unicast
  network ${LOOPBACK}/32
  neighbor LEAF-PRIVATE soft-reconfiguration inbound
 exit-address-family
!
line vty
!
EOF
systemctl restart frr

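Verification

A wiring mistake in GNS3 makes troubleshooting painful, so check the cabling carefully with LLDP. At this point, if the route to the opposite node is ECMP and ping succeeds, everything is OK.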

leaf-3
net show bgp vrf PRIVATE summary
# show bgp vrf PRIVATE ipv4 unicast summary
# =========================================
# BGP router identifier 3.3.3.3, local AS number 65001 vrf-id 59
# ...
# Neighbor           V         AS MsgRcvd MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd
# node-7(swp1.101)   4      65003    3834    3835        0    0    0 02:03:59            1
# spine-1(swp31.101) 4      65000    3836    3837        0    0    0 02:03:58            1
# spine-2(swp32.101) 4      65000    3838    3837        0    0    0 02:03:59            1
node-7
ip route
# 172.16.8.8 proto bgp metric 20
#        nexthop via 169.254.0.1 dev ens4.101 weight 1 onlink
#        nexthop via 169.254.0.1 dev ens3.101 weight 1 onlink

ping 172.16.8.8
# PING 172.16.8.8 (172.16.8.8) 56(84) bytes of data.
# 64 bytes from 172.16.8.8: icmp_seq=1 ttl=61 time=5.46 ms
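
The ECMP check above can also be scripted. The following is a minimal sketch that counts the next hops in `ip route show` output for one prefix; `count_nexthops` is a hypothetical helper name, and it assumes the iproute2 output layout shown above (one `nexthop` line per path for multipath routes, a single line for single-path routes).

```shell
# Count the next hops of a route.
# stdin: output of `ip route show <prefix>`
count_nexthops() {
  awk '
    $1 == "nexthop" { n++ }        # one line per path in a multipath route
    NF > 0          { lines++ }    # any non-empty line means the route exists
    END {
      if (n > 0)          print n  # multipath route
      else if (lines > 0) print 1  # single-path route printed on one line
      else                print 0  # no route at all
    }'
}

# Usage on a node:
#   ip route show 172.16.8.8 | count_nexthops
```

A result of 2 on node-7 corresponds to the two `nexthop` lines shown above, one per uplink.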

Tips

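This is where the main topic begins.

Assign the server IP address to a dummy interface

The configuration above put the server IP address on lo, but it turned out that some middleware ignores addresses configured on lo, so create a virtual interface instead. I learned how to use a dummy interface from this article.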

node
# Delete IP address from lo
ip addr del dev lo ${LOOPBACK}/32

# Add dummy IF
ip link add vni0 type dummy
ip link set vni0 up
ip link set vni0 mtu 9000
sleep 1 # the following command occasionally fails without a short pause
ip addr add dev vni0 ${LOOPBACK}/32
node-7
root@node-7:~# ip a show dev vni0
#8: vni0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 9000 qdisc noqueue state UNKNOWN group default qlen 1000
#    link/ether ca:99:f1:15:96:83 brd ff:ff:ff:ff:ff:ff
#    inet 172.16.7.7/32 scope global vni0
#       valid_lft forever preferred_lft forever
#    inet6 fe80::c899:f1ff:fe15:9683/64 scope link
#       valid_lft forever preferred_lft forever
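
To confirm that the address has really moved off lo, the interface holding it can be looked up from the one-line output of `ip -o -4 addr show`. This is a minimal sketch; `addr_ifname` is a hypothetical helper name, and it assumes the usual `-o` layout in which the second field is the interface name and the fourth is `address/prefix`.

```shell
# Print the interface that holds a given IPv4 address.
# stdin: output of `ip -o -4 addr show`; $1: address without prefix length
addr_ifname() {
  awk -v want="$1" '
    $3 == "inet" {
      split($4, a, "/")            # $4 looks like 172.16.7.7/32
      if (a[1] == want) print $2   # $2 is the interface name
    }'
}

# Usage on a node (expected to print vni0 after the change above):
#   ip -o -4 addr show | addr_ifname "${LOOPBACK}"
```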

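Pay attention to the routes the server advertises

Entering FRR and checking the routes advertised by the server shows that, in addition to its own IP address, it re-advertises the route it received from the switches (172.16.8.8) as is. With the AS numbering used here this causes no harm, because a switch discards any route whose AS path contains its own AS number to prevent loops. From a scalability standpoint, however, it is preferable that the server not advertise unnecessary routes.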

node-7
vtysh -c "show bgp ipv4 unicast neighbors ens3.101 advertised-routes"
# ...
#   Network          Next Hop            Metric LocPrf Weight Path
# *> 172.16.7.7/32    0.0.0.0                  0         32768 i
# *> 172.16.8.8/32    ::                                     0 65001 65000 65002 65004 i
# Total number of prefixes 2

There are various ways to deal with this; here a route-map is applied in the OUT direction.

node
cat > /etc/frr/frr.conf <<EOF
frr version 7.2.1
frr defaults traditional
hostname ${HOSTNAME}
log syslog informational
no ip forwarding
no ipv6 forwarding
service integrated-vtysh-config
!
router bgp ${ASN}
 bgp router-id ${LOOPBACK}
 bgp always-compare-med
 bgp bestpath as-path multipath-relax
 bgp bestpath compare-routerid
 neighbor LEAF-PRIVATE peer-group
 neighbor LEAF-PRIVATE remote-as external
 neighbor ${BGP_NIC1}.101 interface peer-group LEAF-PRIVATE
 neighbor ${BGP_NIC2}.101 interface peer-group LEAF-PRIVATE
 !
 address-family ipv4 unicast
  network ${LOOPBACK}/32
  neighbor LEAF-PRIVATE soft-reconfiguration inbound
  neighbor LEAF-PRIVATE route-map PRIVATE_SERVER_OUT out
 exit-address-family
!
bgp as-path access-list PATH_LOCAL_ORIGIN permit ^$
!
route-map PRIVATE_SERVER_OUT permit 10
 match as-path PATH_LOCAL_ORIGIN
!
line vty
!
EOF
systemctl restart frr
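
The filter above relies on the regular expression `^$`, which matches only an empty AS path, that is, a route originated by the server itself. The following sketch illustrates this with `grep -E`; it is an approximation for illustration only (FRR evaluates the expression against the route's AS_PATH internally), and `is_local_origin` is a hypothetical helper name.

```shell
# Return success if the given AS path string is empty, i.e. matches ^$.
# grep -E stands in for FRR's as-path regex matching here (an assumption).
is_local_origin() {
  printf '%s\n' "$1" | grep -Eq '^$'
}

# Usage:
#   is_local_origin ""                        && echo "advertised"
#   is_local_origin "65001 65000 65002 65004" || echo "filtered"
```

The locally originated 172.16.7.7/32 has an empty path and passes, while 172.16.8.8/32, learned with the path 65001 65000 65002 65004, does not.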

Confirm that only the server's own IP address is advertised.

node-7
vtysh -c "show bgp ipv4 unicast neighbors ens3.101 advertised-routes"
# ...
#   Network          Next Hop            Metric LocPrf Weight Path
# *> 172.16.7.7/32    0.0.0.0                  0         32768 i
# Total number of prefixes 1
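
This check can be automated by comparing the advertised prefixes against an allow list. The following is a minimal sketch; `unexpected_prefixes` is a hypothetical helper name, and it assumes the `advertised-routes` table layout shown above, where each advertised route starts with `*>` followed by the prefix.

```shell
# Print every advertised prefix that is not in the allow list.
# stdin: output of `show bgp ipv4 unicast neighbors <if> advertised-routes`
# args:  allowed prefixes, e.g. 172.16.7.7/32
unexpected_prefixes() (
  allowed=" $* "                   # pad with spaces for whole-word matching
  awk -v allowed="$allowed" '
    $1 == "*>" && $2 ~ /\// {
      if (index(allowed, " " $2 " ") == 0) print $2
    }'
)

# Usage on a node (no output means only the expected prefix is advertised):
#   vtysh -c "show bgp ipv4 unicast neighbors ens3.101 advertised-routes" \
#     | unexpected_prefixes "${LOOPBACK}/32"
```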

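How to disable ECMP on the server

Some Linux networking middleware cannot handle ECMP correctly, and sometimes you want to deliberately use only one path to isolate a problem while troubleshooting. AS-path prepending is one way to do this, but here I recommend Local Preference, which is self-contained within the server and does not affect the switch network.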

node
cat > /etc/frr/frr.conf <<EOF
frr version 7.2.1
frr defaults traditional
hostname ${HOSTNAME}
log syslog informational
no ip forwarding
no ipv6 forwarding
service integrated-vtysh-config
!
router bgp ${ASN}
 bgp router-id ${LOOPBACK}
 bgp always-compare-med
 bgp bestpath as-path multipath-relax
 bgp bestpath compare-routerid
 neighbor LEAF-PRIVATE peer-group
 neighbor LEAF-PRIVATE remote-as external
 neighbor ${BGP_NIC1}.101 interface peer-group LEAF-PRIVATE
 neighbor ${BGP_NIC2}.101 interface peer-group LEAF-PRIVATE
 !
 address-family ipv4 unicast
  network ${LOOPBACK}/32
  neighbor LEAF-PRIVATE soft-reconfiguration inbound
  neighbor LEAF-PRIVATE route-map PRIVATE_SERVER_IN in
  neighbor LEAF-PRIVATE route-map PRIVATE_SERVER_OUT out
 exit-address-family
!
bgp as-path access-list PATH_LOCAL_ORIGIN permit ^$
!
route-map PRIVATE_SERVER_IN permit 5
 match interface ${BGP_NIC1}.101
 set local-preference 200
!
route-map PRIVATE_SERVER_IN permit 10
!
route-map PRIVATE_SERVER_OUT permit 10
 match as-path PATH_LOCAL_ORIGIN
!
line vty
!
EOF
systemctl restart frr
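
The effect of the inbound route-map can be pictured with a deliberately simplified model of one step of BGP best-path selection: the path with the highest local preference wins, and a path without an explicit value uses the BGP default of 100. This sketch covers only that single step and none of the other tie-breakers; `best_by_local_pref` is a hypothetical helper name.

```shell
# Pick the path with the highest local preference.
# stdin: one path per line as "<interface> [local-pref]"
best_by_local_pref() {
  awk '
    NF > 0 {
      lp = (NF >= 2) ? $2 + 0 : 100          # unset local-pref defaults to 100
      if (best == "" || lp > best_lp) { best = $1; best_lp = lp }
    }
    END { if (best != "") print best, best_lp }'
}

# Usage: the two paths as left by the route-map above
#   printf '%s\n' 'ens3.101 200' 'ens4.101' | best_by_local_pref
```

Because only one path remains best, multipath no longer applies and a single next hop is installed.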

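Inbound packets still arrive on both interfaces because the switches use ECMP, so turn rp_filter off so that they can be received on both.

node
# Disable reverse-path filtering (runtime only; -w does not persist across reboots)
sysctl -w net.ipv4.conf.all.rp_filter=0
sysctl -w net.ipv4.conf.default.rp_filter=0
sysctl -w net.ipv4.conf.ens3/101.rp_filter=0
sysctl -w net.ipv4.conf.ens4/101.rp_filter=0
# Reload the settings in /etc/sysctl.conf
sysctl -p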
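
Note the `ens3/101` spelling: because sysctl uses `.` as the key separator, the dot inside a VLAN interface name is written as `/`. The following is a minimal sketch that builds such a key from an interface name; `sysctl_if_key` is a hypothetical helper name.

```shell
# Build the per-interface IPv4 sysctl key for an interface name.
# $1: interface name (e.g. ens3.101), $2: parameter (e.g. rp_filter)
sysctl_if_key() {
  # Dots inside the interface name become slashes in the sysctl key
  printf 'net.ipv4.conf.%s.%s\n' "$(printf '%s' "$1" | tr '.' '/')" "$2"
}

# Usage on a node:
#   sysctl -w "$(sysctl_if_key "${BGP_NIC1}.101" rp_filter)=0"
#   sysctl -w "$(sysctl_if_key "${BGP_NIC2}.101" rp_filter)=0"
```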

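Confirm that ECMP is off for packets the server sends. If you also want inbound traffic to use a single path, AS-path prepending on the switch side is the quickest way (omitted here).

node-7
# Check the kernel FIB (a single route)
ip route
# 172.16.8.8 via 169.254.0.1 dev ens3.101 proto bgp metric 20 onlink

# Confirm in the BGP table that the local-preference value has made the route single-path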
vtysh -c "show bgp ipv4"
# ...
#    Network          Next Hop            Metric LocPrf Weight Path
# *> 172.16.7.7/32    0.0.0.0                  0         32768 i
# *> 172.16.8.8/32    ens3.101                      200      0 65001 65000 65002 65004 i
# *                   ens4.101                               0 65001 65000 65002 65004 i

# Ping the opposite node
ping -c2 172.16.8.8
# PING 172.16.8.8 (172.16.8.8) 56(84) bytes of data.
# 64 bytes from 172.16.8.8: icmp_seq=1 ttl=61 time=4.61 ms

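Multi-network support

Next, make the L3-routed server handle multiple distinct networks. This is used, for example, when assigning a global IP address directly to the server.
l3 public.png

Applications running on a server are often not VRF-aware, so VRFs are not used inside the server. Instead, prepare subinterfaces for BGP peering with the switches' PUBLIC VRF and a virtual interface to hold the IP address to advertise.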

node
# node-7
export PUBLIC_VRF_IPADDR=10.0.7.7

# node-8
export PUBLIC_VRF_IPADDR=10.0.8.8
node
# BGP IF for PUBLIC
ip link add link ${BGP_NIC1} name ${BGP_NIC1}.201 type vlan id 201
ip link set ${BGP_NIC1}.201 up
ip link add link ${BGP_NIC2} name ${BGP_NIC2}.201 type vlan id 201
ip link set ${BGP_NIC2}.201 up

# Virtual IF for PUBLIC
ip link add vni1 type dummy
ip link set vni1 up
ip link set vni1 mtu 9000
sleep 1
ip addr add dev vni1 ${PUBLIC_VRF_IPADDR}/32

# Check
ip addr show vni1
ip addr show ${BGP_NIC1}.201
ip addr show ${BGP_NIC2}.201

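Next, update the FRR configuration.
Because the routing table is not split by VRF, route-maps are used to filter routes per peer VRF so that unnecessary routes do not leak. In addition, when several virtual interfaces exist, the kernel cannot tell which one to use as the source when it sends to a given destination. So the inbound route-map tags each route as it is received, and when the route is installed in the kernel the address of the matching virtual interface is set as the source.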

node
cat > /etc/frr/frr.conf <<EOF
frr version 7.2.1
frr defaults traditional
hostname ${HOSTNAME}
log syslog informational
no ip forwarding
no ipv6 forwarding
service integrated-vtysh-config
!
router bgp ${ASN}
 bgp router-id ${LOOPBACK}
 bgp always-compare-med
 bgp bestpath as-path multipath-relax
 bgp bestpath compare-routerid
 neighbor LEAF-PRIVATE peer-group
 neighbor LEAF-PRIVATE remote-as external
 neighbor LEAF-PUBLIC peer-group
 neighbor LEAF-PUBLIC remote-as external
 neighbor ${BGP_NIC1}.101 interface peer-group LEAF-PRIVATE
 neighbor ${BGP_NIC2}.101 interface peer-group LEAF-PRIVATE
 neighbor ${BGP_NIC1}.201 interface peer-group LEAF-PUBLIC
 neighbor ${BGP_NIC2}.201 interface peer-group LEAF-PUBLIC
 !
 address-family ipv4 unicast
  network ${LOOPBACK}/32
  network ${PUBLIC_VRF_IPADDR}/32
  neighbor LEAF-PRIVATE soft-reconfiguration inbound
  neighbor LEAF-PRIVATE route-map PRIVATE_SERVER_IN in
  neighbor LEAF-PRIVATE route-map PRIVATE_SERVER_OUT out
  neighbor LEAF-PUBLIC soft-reconfiguration inbound
  neighbor LEAF-PUBLIC route-map PUBLIC_SERVER_IN in
  neighbor LEAF-PUBLIC route-map PUBLIC_SERVER_OUT out
 exit-address-family
!
ip prefix-list FILTER_PRIVATE_OUT seq 10 permit 172.16.0.0/16 le 32
ip prefix-list FILTER_PUBLIC_OUT seq 10 permit 10.0.0.0/8 le 32
!
bgp as-path access-list PATH_LOCAL_ORIGIN permit ^$
!
route-map EXPORT_KERNEL permit 10
 match tag 101
 set src ${LOOPBACK}
!
route-map EXPORT_KERNEL permit 20
 match tag 201
 set src ${PUBLIC_VRF_IPADDR}
!
route-map PRIVATE_SERVER_IN permit 10
 set tag 101
!
route-map PRIVATE_SERVER_OUT permit 10
 match as-path PATH_LOCAL_ORIGIN
 match ip address prefix-list FILTER_PRIVATE_OUT
!
route-map PUBLIC_SERVER_IN permit 10
 set tag 201
!
route-map PUBLIC_SERVER_OUT permit 10
 match as-path PATH_LOCAL_ORIGIN
 match ip address prefix-list FILTER_PUBLIC_OUT
!
ip protocol bgp route-map EXPORT_KERNEL
!
line vty
!
EOF
systemctl restart frr

As a result, the routes in node-7's kernel routing table have src set.

node-7
ip route
# 10.0.8.8 proto bgp src 10.0.7.7 metric 20     # <--- src IP is 10.0.7.7 (vni1)
#       nexthop via 169.254.0.1 dev ens4.201 weight 1 onlink
#       nexthop via 169.254.0.1 dev ens3.201 weight 1 onlink
# 172.16.8.8 proto bgp src 172.16.7.7 metric 20 # <--- src IP is 172.16.7.7 (vni0)
#       nexthop via 169.254.0.1 dev ens4.101 weight 1 onlink
#       nexthop via 169.254.0.1 dev ens3.101 weight 1 onlink
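
The destination-to-source mapping can be extracted from the `ip route` output to check that each network uses the intended address. The following is a minimal sketch; `route_src_map` is a hypothetical helper name, and it assumes the iproute2 layout shown above, where `src <address>` appears on the line that starts with the destination.

```shell
# Print "<destination> <source address>" for every route that has src set.
# stdin: output of `ip route`
route_src_map() {
  awk '
    $1 != "nexthop" {
      # Look for the "src" keyword and print the address that follows it
      for (i = 2; i < NF; i++)
        if ($i == "src") { print $1, $(i + 1); break }
    }'
}

# Usage on a node:
#   ip route show proto bgp | route_src_map
```

For a single destination, `ip route get 10.0.8.8` should likewise report the source address the kernel would choose.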

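Just in case, confirm that no unnecessary routes are being sent to the VRFs of the neighboring switches.

node-7
# PRIVATE (only the server's own IP address is advertised)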
vtysh -c "show bgp ipv4 unicast neighbors ens3.101 advertised-routes"
# ...
#    Network          Next Hop            Metric LocPrf Weight Path
# *> 172.16.7.7/32    0.0.0.0                  0         32768 i
# Total number of prefixes 1

# PUBLIC (only the server's own IP address is advertised)
vtysh -c "show bgp ipv4 unicast neighbors ens4.201 advertised-routes"
# ...
#    Network          Next Hop            Metric LocPrf Weight Path
# *> 10.0.7.7/32      0.0.0.0                  0         32768 i
# Total number of prefixes 1

# Connectivity check (PRIVATE)
ping 172.16.8.8
# PING 172.16.8.8 (172.16.8.8) 56(84) bytes of data.
# 64 bytes from 172.16.8.8: icmp_seq=1 ttl=61 time=6.49 ms

# Connectivity check (PUBLIC)
ping 10.0.8.8
# PING 10.0.8.8 (10.0.8.8) 56(84) bytes of data.
# 64 bytes from 10.0.8.8: icmp_seq=1 ttl=61 time=6.87 ms

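Conclusion

This article summarized pitfalls and tips for Routing on the Host. Speaking only from my own experience, server-side knowledge alone was not enough to notice the problems or work out countermeasures, and I needed support from network engineers who know BGP well. Even if other problems come up, most of them look solvable with route-maps and BGP metrics. FRR's configuration is almost the same as Cisco's, so it is easy to get help from network engineers and there is plenty of information online. My thanks to my network engineer colleagues.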

tom7