Speed comparison between the ndarray crate and NumPy

Posted at 2022-12-27 (last updated at 2022-12-27)

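This article is the Day 24 entry of Rust Advent Calendar 2022, Series 2. That said, I did not make it in time for Christmas Eve.
Leaving the calendar slot empty did not feel right either, so I am shamelessly posting it late.
The previous day's article is 『RustでGStreamer Pluginを書く』 ("Writing a GStreamer Plugin in Rust") by uzuna.
At the time of posting, the next day's article is scheduled to be written by Kenta11.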

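Motivation

Partly, this is my own rehabilitation in Rust development.
I was also curious how much implementations using ndarray and NumPy differ in speed, so I compared them.
I hope this is useful to people who use ndarray.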

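Details of the comparison method

Comparison targets

The targets are a subset of the APIs shown as examples in Module ndarray::doc::ndarray_for_numpy_users. I may also add operations that I personally use often.
The code is available in this GitHub repository.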

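Conditions

Number of measurements

Each operation was executed $1000$ times.

Array size

Let $N = 1024$.

For 1-D arrays, the length is $N^2$.
For 2-D arrays, the shape is $(N, N)$.

Size adjustment for `dot`

When applying dot between a 1-D array and a 2-D array, the first $1024$ elements (that is, $N$ elements) of the 1-D array were taken and used in the operation.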
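As a quick check of the sizes these conditions imply, the following sketch computes the element counts and the memory footprint of one array. The 8 bytes per element is an assumption (64-bit floats), and the variable names are illustrative only.

```python
N = 1024
BYTES_PER_F64 = 8  # assumption: every element is a 64-bit float

# A 1-D array has N^2 elements; a 2-D array of shape (N, N) has the same count.
n_1d = N ** 2
shape_2d = (N, N)
n_2d = shape_2d[0] * shape_2d[1]

# Memory footprint of one such array in MiB.
mib = n_1d * BYTES_PER_F64 / 2 ** 20

# For dot between a 1-D and a 2-D array, only the first N elements are used.
dot_vec_len = len(range(n_1d)[:N])

print(f"elements per array: {n_1d}")
print(f"size per array: {mib} MiB")
print(f"vector length used for dot: {dot_vec_len}")
```

Under that assumption each array occupies several MiB, so most of the measured operations work on data larger than a typical L1 or L2 cache.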

How processing time is calculated

Rust

I wrote the following macro. Reference: Rustで実行時間の計測 - Qiita

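use ndarray::{Array, Array1};
use std::time::Instant;

// Evaluates `$x` `$y` times and returns `[mean, std]` of the elapsed time in msec.
macro_rules! measure {
    ($x:expr, $y:tt) => {{
        let mut elapsed: Vec<f64> = Vec::new();
        for _ in 0..$y {
            let start = Instant::now();
            let _result = $x;
            let end = start.elapsed();
            // `subsec_micros` and `subsec_nanos` both describe the same fractional
            // second, so adding both to `as_secs` would count it twice.
            // `as_secs_f64` returns whole plus fractional seconds in one call.
            elapsed.push(end.as_secs_f64());
        }
        let elapsed: Array1<f64> = Array::from(elapsed);
        let mean_msec: f64 = elapsed.mean().unwrap() * 1000.;
        // `std(0.)` is the population standard deviation (ddof = 0).
        let std_msec: f64 = elapsed.std(0.) * 1000.;
        println!(
            "averaged process time over {} times: {:.6} +/- {:.6} msec.",
            $y, mean_msec, std_msec
        );
        vec![mean_msec, std_msec]
    }};
}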
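The macro turns each `Duration` into seconds as an `f64`. Since `subsec_micros()` and `subsec_nanos()` both express the same fractional part of a second, only one of them should be added to `as_secs()`. The Python sketch below checks that arithmetic for a hypothetical 1.5 ms duration; the function names are illustrative and do not belong to any library.

```python
# Model a Rust `Duration` of 1.5 ms (hypothetical value) as whole seconds
# plus a nanosecond remainder.
secs = 0
nanos = 1_500_000  # 1.5 ms in nanoseconds

subsec_micros = nanos // 1_000  # what Duration::subsec_micros() would report
subsec_nanos = nanos            # what Duration::subsec_nanos() would report


def seconds_single(secs, subsec_nanos):
    """Whole seconds plus the fractional part, counted once."""
    return secs + subsec_nanos * 1e-9


def seconds_double(secs, subsec_micros, subsec_nanos):
    """Whole seconds plus the fractional part counted twice (micros and nanos)."""
    return secs + subsec_micros * 1e-6 + subsec_nanos * 1e-9


single = seconds_single(secs, subsec_nanos)
double = seconds_double(secs, subsec_micros, subsec_nanos)
print(f"counted once:  {single * 1e3:.3f} msec")
print(f"counted twice: {double * 1e3:.3f} msec")
```

For durations under one second, adding both sub-second fields roughly doubles the reported time.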

Python

Most of the timings were measured with the following code. For some operations, a modified version based on this code was used.

import time
from typing import Callable

import numpy as np


def measure(func: Callable, ntimes: int, *args, **kwargs):
    """Measure the process time of a function.

    Returns [mean, std] of the elapsed time in msec over `ntimes` runs.
    """
    elapsed = []
    for _ in range(ntimes):
        st = time.time()
        _ = func(*args, **kwargs)
        elapsed.append(time.time() - st)
    mean_msec = np.mean(elapsed) * 1e3
    std_msec = np.std(elapsed) * 1e3  # population std (ddof = 0)
    print("averaged process time over {0} times: {1:.6f} +/- {2:.6f} msec.".format(
        ntimes, mean_msec, std_msec
    ))
    return [mean_msec, std_msec]
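For sub-millisecond operations, `time.perf_counter()` is generally a better fit than `time.time()`, because it is a monotonic clock intended for measuring short intervals. The variant below is a minimal sketch of the same measurement loop using only the standard library; `measure_perf_counter` is a hypothetical name and is not the code used for the results in this article.

```python
import statistics
import time
from typing import Callable, List


def measure_perf_counter(func: Callable, ntimes: int, *args, **kwargs) -> List[float]:
    """Return [mean, population std] of the elapsed time in msec."""
    elapsed = []
    for _ in range(ntimes):
        st = time.perf_counter()
        _ = func(*args, **kwargs)
        elapsed.append(time.perf_counter() - st)
    mean_msec = statistics.fmean(elapsed) * 1e3
    # pstdev is the population standard deviation (ddof = 0), like np.std.
    std_msec = statistics.pstdev(elapsed) * 1e3
    return [mean_msec, std_msec]


# Usage: time the built-in sum over a range, 100 runs.
result = measure_perf_counter(sum, 100, range(10_000))
print("mean = {0:.6f} msec, std = {1:.6f} msec".format(*result))
```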

Computing environment

I used GitHub Codespaces. The configuration is as follows.

Machine specification:

Item	Description / Version
Codespaces
Configuration	2-core, 4 GB RAM, 32 GB storage
OS	Ubuntu 20.04.5 LTS (Focal Fossa)
CPU	Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Rust
rustup	1.25.1
rustc	1.65.0
cargo	1.65.0
dependency	see Cargo.toml.
Python
Python	3.10.4
NumPy	1.23.5

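Results

The table below shows the measured processing times. Each value is the mean over the measurement runs. Overall, ndarray tends to take longer. As for type conversion, the extra time compared with Python seems to stem less from ndarray itself than from Rust, where converting between types requires an explicit extra step.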

NumPy	ndarray	NumPy time (msec)	ndarray-family time (msec)
Array creation
np.arange(0., 10., 10./N**2)	Array::range(0., 10., 10./N**2)	0.691	1.588
np.linspace(0., 10., N**2)	Array::linspace(0., 10., N**2)	1.540	1.194
np.ones((N, N))	Array::ones((N, N))	0.370	0.753
np.zeros((N, N))	Array::zeros((N, N))	0.350	0.658
np.full((N, N), 7.)	Array::from_elem((N, N), 7.)	0.363	0.754
np.eye(N)	Array::eye(N)	0.354	0.677
Randomize
np.random.normal	ndarray_rand::rand_distr::Normal	27.507	23.694
np.random.poisson	ndarray_rand::rand_distr::Poisson	106.423	194.021
np.random.uniform	ndarray_rand::rand_distr::Uniform	8.464	22.766
Mathematics
mat1.dot(mat2)	mat1.dot(&mat2)	35.549	131.120
mat.dot(vec)	mat.dot(&vec)	0.427	0.815
vec.dot(mat)	vec.dot(&mat)	0.377	6.439
vec1.dot(vec2)	vec1.dot(&vec2)	0.889	1.383
a + b	a * b, a + b, etc.	1.547	2.923
a**3	a.mapv(|a| a.powi(3))	25.734	1.609
np.sqrt(a)	a.mapv(f64::sqrt)	0.966	2.045
(a>0.5)	a.mapv(|a| a > 0.5)	0.398	1.257
a.sum()	a.sum()	0.383	0.681
a.sum(axis=2)	a.sum_axis(Axis(2))	0.370	0.683
a.mean()	a.mean().unwrap()	0.368	0.685
a.mean(axis=2)	a.mean_axis(Axis(2))	0.428	0.693
np.allclose(a, b, atol=1e-8)	a.abs_diff_eq(&b, 1e-8)	16.017	6.11E-05
np.diag(a)	a.diag()	2.99E-03	3.40E-05
Array manipulation
a[:] = 3.	a.fill(3.)	0.399	0.801
a[:] = b	a.assign(&b)	0.737	1.388
np.concatenate((a,b), axis=1)	concatenate![Axis(1), a, b] or concatenate(Axis(1), &[a.view(), b.view()])	2.499	19.819
np.stack((a,b), axis=1)	stack![Axis(1), a, b] or stack(Axis(1), vec![a.view(), b.view()])	2.453	5.302
np.expand_dims(a, axis=1)	a.insert_axis(Axis(1))	4.56E-03	6.52E-05
a.transpose()	a.reversed_axes()	3.48E-04	3.23E-05
a.flatten()	Array::from_iter(a.iter().cloned())	0.683	4.562
Type conversion
a.astype(np.float32)	a.mapv(|x| f32::from(x))	0.219	0.629
a.astype(np.int32)	a.mapv(|x| i32::from(x))	0.275	0.504
a.astype(np.uint8)	a.mapv(|x| u8::try_from(x).unwrap())	0.096	1.524
a.astype(np.int32)	a.mapv(|x| x as i32)	0.343	3.125
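One way to read the table as relative speed is to divide the ndarray time by the NumPy time, so that values above 1 mean ndarray was slower. The sketch below does this for three rows copied from the table above; the dictionary layout and names are illustrative only.

```python
# (NumPy msec, ndarray msec) copied from the table above.
rows = {
    "mat1.dot(mat2)": (35.549, 131.120),
    "np.linspace": (1.540, 1.194),
    "a**3": (25.734, 1.609),
}

# Ratio > 1: ndarray took longer; ratio < 1: ndarray was faster.
ratios = {name: nd_msec / np_msec for name, (np_msec, nd_msec) in rows.items()}

for name, ratio in ratios.items():
    verdict = "ndarray slower" if ratio > 1 else "ndarray faster"
    print(f"{name}: {ratio:.2f}x ({verdict})")
```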
