Measuring 100 Gbps of bandwidth takes some technique.
From Fasterdata: Throughput Tool Comparison:
Based on our experience, we recommend the following:
・Use iperf2 for parallel streams, bidirectional, or MS Windows-based tests
・Use nuttcp for high-speed UDP testing
・Use iperf3 otherwise.
These tools each differ slightly in features and architecture, so don't assume any single tool has everything you need. It's best to be familiar with several of them and use the one that fits your particular use case.
One important difference is whether a tool is single-threaded or multi-threaded. To test parallel-stream performance, you need a multi-threaded tool such as iperf2.
Since iperf2 is multi-threaded, it should make it easier to reach high bandwidth than iperf3. So let's take two of the 100 Gbps NIC-equipped
Oracle Cloud Infrastructure (OCI) Compute BM.Standard.E5 instances built last time and measure.
■ Configuration Diagram
■ Installing iperf2
Note that older versions of iperf had bugs.
Newer versions also include many useful features, so be sure to use a recent release of iperf2.
● iperf2 Download
1) dnf search
Search the repositories for the package; if it is not found there, download it directly.
[root@bm-e5 ~]# dnf search iperf
This system is receiving updates from OSMS server.
Last metadata expiration check: 0:01:29 ago on Mon 09 Sep 2024 05:55:49 AM GMT.
=================================================================== Name Matched: iperf ===================================================================
iperf3.x86_64 : Measurement tool for TCP/UDP bandwidth performance
iperf3.i686 : Measurement tool for TCP/UDP bandwidth performance
2) iperf2 Download
iperf2 is packaged under the name iperf, so search for iperf and download that package.
On Oracle Linux, it can be downloaded from EPEL:
・Oracle Linux 9 (x86_64) EPEL
・Oracle Linux 8 (x86_64) EPEL
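If you would rather download the RPM first, here is a sketch using the OL9 EPEL URL that appears in the install step below:
[root@bm-e5 ~]# wget https://yum.oracle.com/repo/OracleLinux/OL9/developer/EPEL/x86_64/getPackage/iperf-2.1.6-2.el9.x86_64.rpm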
● iperf2 install
1) iperf2 install
Install with dnf install, pointing it either at the iperf URL or at the downloaded package file.
[root@bm-e5 ~]# dnf install https://yum.oracle.com/repo/OracleLinux/OL9/developer/EPEL/x86_64/getPackage/iperf-2.1.6-2.el9.x86_64.rpm
Last metadata expiration check: 3:15:28 ago on Sun 08 Sep 2024 09:09:02 PM GMT.
iperf-2.1.6-2.el9.x86_64.rpm 3.2 MB/s | 146 kB 00:00
Dependencies resolved.
========================================================================================================================
Package Architecture Version Repository Size
========================================================================================================================
Installing:
iperf x86_64 2.1.6-2.el9 @commandline 146 k
Transaction Summary
========================================================================================================================
Install 1 Package
Total size: 146 k
Installed size: 302 k
Is this ok [y/N]: y
Downloading Packages:
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
Preparing : 1/1
Installing : iperf-2.1.6-2.el9.x86_64 1/1
Running scriptlet: iperf-2.1.6-2.el9.x86_64 1/1
Verifying : iperf-2.1.6-2.el9.x86_64 1/1
Installed:
iperf-2.1.6-2.el9.x86_64
Complete!
2) Verify the iperf2 install
[root@bm-e5 ~]# iperf -v
iperf version 2.1.6 (10 December 2021) pthreads
[root@bm-e5 ~]# iperf -h
Usage: iperf [-s|-c host] [options]
iperf [-h|--help] [-v|--version]
Client/Server:
-b, --bandwidth #[kmgKMG | pps] bandwidth to read/send at in bits/sec or packets/sec
-e, --enhanced use enhanced reporting giving more tcp/udp and traffic information
-f, --format [kmgKMG] format to report: Kbits, Mbits, KBytes, MBytes
--hide-ips hide ip addresses and host names within outputs
-i, --interval # seconds between periodic bandwidth reports
-l, --len #[kmKM] length of buffer in bytes to read or write (Defaults: TCP=128K, v4 UDP=1470, v6 UDP=1450)
-m, --print_mss print TCP maximum segment size (MTU - TCP/IP header)
-o, --output <filename> output the report or error message to this specified file
-p, --port # client/server port to listen/send on and to connect
--permit-key permit key to be used to verify client and server (TCP only)
--sum-only output sum only reports
-u, --udp use UDP rather than TCP
-w, --window #[KM] TCP window size (socket buffer size)
-z, --realtime request realtime scheduler
-B, --bind <host>[:<port>][%<dev>] bind to <host>, ip addr (including multicast address) and optional port and device
-C, --compatibility for use with older versions does not sent extra msgs
-M, --mss # set TCP maximum segment size (MTU - 40 bytes)
-N, --nodelay set TCP no delay, disabling Nagle's Algorithm
-S, --tos # set the socket's IP_TOS (byte) field
-Z, --tcp-congestion <algo> set TCP congestion control algorithm (Linux only)
Server specific:
-p, --port #[-#] server port(s) to listen on/connect to
-s, --server run in server mode
-1, --singleclient run one server at a time
--histograms enable latency histograms
--permit-key-timeout set the timeout for a permit key in seconds
--tcp-rx-window-clamp set the TCP receive window clamp size in bytes
--tap-dev #[<dev>] use TAP device to receive at L2 layer
-t, --time # time in seconds to listen for new connections as well as to receive traffic (default not set)
--udp-histogram #,# enable UDP latency histogram(s) with bin width and count, e.g. 1,1000=1(ms),1000(bins)
-B, --bind <ip>[%<dev>] bind to multicast address and optional device
-U, --single_udp run in single threaded UDP mode
--sum-dstip sum traffic threads based upon destination ip address (default is src ip)
-D, --daemon run the server as a daemon
-V, --ipv6_domain Enable IPv6 reception by setting the domain and socket to AF_INET6 (Can receive on both IPv4 and IPv6)
Client specific:
-c, --client <host> run in client mode, connecting to <host>
--connect-only run a connect only test
--connect-retries # number of times to retry tcp connect
-d, --dualtest Do a bidirectional test simultaneously (multiple sockets)
--fq-rate #[kmgKMG] bandwidth to socket pacing
--full-duplex run full duplex test using same socket
--ipg set the the interpacket gap (milliseconds) for packets within an isochronous frame
--isochronous <frames-per-second>:<mean>,<stddev> send traffic in bursts (frames - emulate video traffic)
--incr-dstip Increment the destination ip with parallel (-P) traffic threads
--incr-dstport Increment the destination port with parallel (-P) traffic threads
--incr-srcip Increment the source ip with parallel (-P) traffic threads
--incr-srcport Increment the source port with parallel (-P) traffic threads
--local-only Set don't route on socket
--near-congestion=[w] Use a weighted write delay per the sampled TCP RTT (experimental)
--no-connect-sync No sychronization after connect when -P or parallel traffic threads
--no-udp-fin No final server to client stats at end of UDP test
-n, --num #[kmgKMG] number of bytes to transmit (instead of -t)
-r, --tradeoff Do a fullduplexectional test individually
--tcp-write-prefetch set the socket's TCP_NOTSENT_LOWAT value in bytes and use event based writes
-t, --time # time in seconds to transmit for (default 10 secs)
--trip-times enable end to end measurements (requires client and server clock sync)
--txdelay-time time in seconds to hold back after connect and before first write
--txstart-time unix epoch time to schedule first write and start traffic
-B, --bind [<ip> | <ip:port>] bind ip (and optional port) from which to source traffic
-F, --fileinput <name> input the data to be transmitted from a file
-H, --ssm-host <ip> set the SSM source, use with -B for (S,G)
-I, --stdin input the data to be transmitted from stdin
-L, --listenport # port to receive fullduplexectional tests back on
-P, --parallel # number of parallel client threads to run
-R, --reverse reverse the test (client receives, server sends)
-S, --tos IP DSCP or tos settings
-T, --ttl # time-to-live, for multicast (default 1)
-V, --ipv6_domain Set the domain to IPv6 (send packets over IPv6)
-X, --peer-detect perform server version detection and version exchange
Miscellaneous:
-x, --reportexclude [CDMSV] exclude C(connection) D(data) M(multicast) S(settings) V(server) reports
-y, --reportstyle C report as a Comma-Separated Values
-h, --help print this message and quit
-v, --version print version information and quit
[kmgKMG] Indicates options that support a k,m,g,K,M or G suffix
Lowercase format characters are 10^3 based and uppercase are 2^n based
(e.g. 1k = 1000, 1K = 1024, 1m = 1,000,000 and 1M = 1,048,576)
The TCP window size option can be set by the environment variable
TCP_WINDOW_SIZE. Most other options can be set by an environment variable
IPERF_<long option name>, such as IPERF_BANDWIDTH.
Source at <http://sourceforge.net/projects/iperf2/>
Report bugs to <iperf-users@lists.sourceforge.net>
■ Measuring Bandwidth with iperf2
● Firewall Configuration
iperf2 listens on port 5001 by default, so if a firewall is configured, open that port.
Here firewalld is in use, so run the following command:
[root@bm-e5 ~]# firewall-cmd --add-port=5001/tcp
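Note that firewall-cmd --add-port by itself changes only the runtime configuration. To keep the rule across a firewalld reload or a reboot, also add it to the permanent configuration and reload:
[root@bm-e5 ~]# firewall-cmd --permanent --add-port=5001/tcp
[root@bm-e5 ~]# firewall-cmd --reload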
● Kernel Network Settings
On both the client and the server, the TCP send/receive buffer sizes must be
tuned using the bandwidth-delay product so that the full bandwidth can be achieved.
1) Set Kernel Parameters
We run with the following values here; in a real validation, calculate the values from the bandwidth-delay product (see the worked example after the commands below).
# sudo sysctl -w net.ipv4.tcp_wmem="4096 16384 16777216"
# sudo sysctl -w net.ipv4.tcp_rmem="4096 131072 16777216"
# sudo sysctl -w net.core.wmem_max=16777216
# sudo sysctl -w net.core.rmem_max=16777216
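As a rough check, the bandwidth-delay product is bandwidth × round-trip time. Assuming an RTT of about 1 ms between the two instances (an assumed figure; measure with ping in your own environment), 100 Gbps × 1 ms = 100 × 10^9 bit/s × 0.001 s = 10^8 bits ≈ 12.5 MB, so the 16 MB (16777216-byte) maximum buffer above leaves some headroom.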
To make the settings persistent, add them to /etc/sysctl.conf and apply them with sysctl -p.
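For example, appending the same values to /etc/sysctl.conf looks like this:
net.ipv4.tcp_wmem = 4096 16384 16777216
net.ipv4.tcp_rmem = 4096 131072 16777216
net.core.wmem_max = 16777216
net.core.rmem_max = 16777216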
2) Verify the Settings
# sysctl -a | egrep "wmem|rmem"
net.core.rmem_default = 212992
net.core.rmem_max = 16777216
net.core.wmem_default = 212992
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 131072 16777216
net.ipv4.tcp_wmem = 4096 16384 16777216
● Running iperf2
iperf2 measures throughput by running the command on a server and a client that communicate with each other.
In the following example, the client specifies the send bandwidth and the number of parallel threads, and these are adjusted until maximum bandwidth is reached.
Server: iperf -s
Client: iperf -c <Server IP Address> -b <Bandwidth> -P <Parallel Threads>
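For per-second progress and richer statistics, the -e (enhanced reporting), -i (interval), and -t (duration) options from the help output above can be added; an illustrative invocation (values are examples, not tuned for this environment):
Client: iperf -c <Server IP Address> -b 9G -P 11 -e -i 1 -t 30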
1) Server Side: Run iperf2
[root@bm-e5-server ~]# iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 128 KByte (default)
------------------------------------------------------------
2) Client Side: Run iperf2
Tune -b (bandwidth per stream) and -P (parallel count) so that the total reaches 100 Gbps.
Here we run 11 parallel streams at 9 Gbps each (9 Gbps × 11 = 99 Gbps):
[root@bm-e5-client ~]$ iperf -c 10.10.0.22 -b 9G -P 11
[ 10] local 10.10.0.23 port 38784 connected with 10.10.0.22 port 5001
[ 7] local 10.10.0.23 port 38776 connected with 10.10.0.22 port 5001
[ 2] local 10.10.0.23 port 38750 connected with 10.10.0.22 port 5001
------------------------------------------------------------
Client connecting to 10.10.0.22, TCP port 5001
TCP window size: 325 KByte (default)
------------------------------------------------------------
[ 1] local 10.10.0.23 port 38732 connected with 10.10.0.22 port 5001
[ 5] local 10.10.0.23 port 38762 connected with 10.10.0.22 port 5001
[ 6] local 10.10.0.23 port 38728 connected with 10.10.0.22 port 5001
[ 8] local 10.10.0.23 port 38774 connected with 10.10.0.22 port 5001
[ 9] local 10.10.0.23 port 38786 connected with 10.10.0.22 port 5001
[ 4] local 10.10.0.23 port 38754 connected with 10.10.0.22 port 5001
[ 11] local 10.10.0.23 port 38798 connected with 10.10.0.22 port 5001
[ 3] local 10.10.0.23 port 38748 connected with 10.10.0.22 port 5001
[ ID] Interval Transfer Bandwidth
[ 2] 0.00-10.01 sec 11.1 GBytes 9.51 Gbits/sec
[ 7] 0.00-10.01 sec 11.2 GBytes 9.64 Gbits/sec
[ 5] 0.00-10.01 sec 9.20 GBytes 7.89 Gbits/sec
[ 8] 0.00-10.01 sec 10.8 GBytes 9.23 Gbits/sec
[ 1] 0.00-10.01 sec 10.7 GBytes 9.14 Gbits/sec
[ 3] 0.00-10.01 sec 11.2 GBytes 9.59 Gbits/sec
[ 10] 0.00-10.01 sec 10.8 GBytes 9.28 Gbits/sec
[ 9] 0.00-10.01 sec 10.4 GBytes 8.88 Gbits/sec
[ 4] 0.00-10.01 sec 10.7 GBytes 9.14 Gbits/sec
[ 11] 0.00-10.01 sec 8.66 GBytes 7.43 Gbits/sec
[ 6] 0.00-10.01 sec 10.2 GBytes 8.74 Gbits/sec
[SUM] 0.00-10.00 sec 115 GBytes 98.6 Gbits/sec
[ CT] final connect times (min/avg/max/stdev) = 0.219/0.319/0.443/0.067 ms (tot/err) = 11/0
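To also measure the opposite direction without swapping the server and client roles, the -R (reverse) option listed in the help above makes the server send and the client receive; an illustrative run (same addresses as above):
[root@bm-e5-client ~]$ iperf -c 10.10.0.22 -P 11 -R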
■ References
・Fasterdata: Throughput Tool Comparison
・iperf2 / iperf3
・iperf3 FAQ
・iPerf - The ultimate speed test tool for TCP, UDP and SCTP
・NUTTCP
・How to Calculate the Bandwidth-Delay Product