More than 5 years have passed since last update.

pybind11入門(3) NumPy連携その１

Last updated at 2019-03-06Posted at 2019-02-25

その1 https://qiita.com/lucidfrontier45/items/872c38b5eec08e3800d7
その2 https://qiita.com/lucidfrontier45/items/c7cc409a3af962d100a3

pybind11を使ってNumPyと連携する方法を試した。まずは直接ndarrayを触る方法。

`pybind11::array`型

pybind11にはpybind11::bufferとpybind11::arrayのに種類のクラスが定義されており、これらを介してndarrayを扱うことができる。前者はndarrayに限らない一般的なPythonのbufferプロトコルと連携できるものであり、ここではndarray専用のarray型を試してみる。なお、array_t<T>というtemplateも用意されており、実際にはこちらを用いる。いつもどおり、cmakeを用いる。

CMakeLists.txt

cmake_minimum_required(VERSION 3.2)
project(pybind_test VERSION 0.1.0)
set(CMAKE_POSITION_INDEPENDENT_CODE ON)
add_subdirectory(pybind11)
add_subdirectory(fmt)
pybind11_add_module(mymodule module.cpp)
target_link_libraries(mymodule PRIVATE fmt::fmt)

fmtという書式フォーマットのライブラリも使用する。
https://github.com/fmtlib/fmt

`ndarray`へのReadOnlyアクセス

２次元配列の各要素を表示するだけのプログラムで動作を確認してみる。

module.cpp

#include <iostream>
#include <pybind11/pybind11.h>
#include <pybind11/numpy.h>
#include <fmt/core.h>

namespace py = pybind11;

template <typename T>
void print_array(py::array_t<T> x)
{
    const auto &buff_info = x.request();
    const auto &shape = buff_info.shape;
    std::cout << "C++" << std::endl;
    for (auto i = 0; i < shape[0]; i++)
    {
        for (auto j = 0; j < shape[1]; j++)
        {
            auto v = *x.data(i, j);
            std::cout << fmt::format("x[{}, {}] = {}", i, j, v) << std::endl;
        }
    }
}

PYBIND11_MODULE(mymodule, m)
{
    m.doc() = "my test module";
    m.def("print_array", &print_array<int32_t>, "");
    m.def("print_array", &print_array<double>, "");
}

ndarrayを扱う場合にはpybind11/numpy.hをincludeする。requestメソッドでbuffer_info型の変数を得ることがでる。buffer_infoは以下のような構造になっており

struct buffer_info {
    void *ptr;
    ssize_t itemsize;
    std::string format;
    ssize_t ndim;
    std::vector<ssize_t> shape;
    std::vector<ssize_t> strides;
};

ndarrayでおなじみのshapeなどが得られるのでこれを用いでfor loopを回せます。データにアクセスする際にはx.data(i, j)のようにするとその要素へのポインタが得られる。なお、dataメソッドはReadonlyのアクセスなので得られるポインタにはconstがついている。

最後に、この関数はtemplateを使用しているのでpybind11に登録する際には実体化する必要がる。ここでは整数用のint32_tと実数用のdoubleの２種類を作る。

Pythonで実行して確認する。

import numpy as np
import mymodule

x = np.arange(2*3).reshape((2, 3)).astype(np.int32)
print("python")
for i in range(x.shape[0]):
    for j in range(x.shape[1]):
        print("x[{}, {}] = {}".format(i, j, x[i, j]))

mymodule.print_array(x)

>>>
python
x[0, 0] = 0
x[0, 1] = 1
x[0, 2] = 2
x[1, 0] = 3
x[1, 1] = 4
x[1, 2] = 5
C++
x[0, 0] = 0
x[0, 1] = 1
x[0, 2] = 2
x[1, 0] = 3
x[1, 1] = 4
x[1, 2] = 5

ちゃんと同じ内容が表示できた!

配列の変更

.data(i, j)の代わりに.mutable_data(i, j)を用いることでconstでないポインタが得られ、内容を書き換えることができる。以下のプログラムは全配列の要素に定数を足す例である。

template <typename T>
void modify_array_inplace(py::array_t<T> x, T a)
{
    const auto &buff_info = x.request();
    const auto &shape = buff_info.shape;
    for (auto i = 0; i < shape[0]; i++)
    {
        for (auto j = 0; j < shape[1]; j++)
        {
            *x.mutable_data(i, j) += a;
        }
    }
}

PYBIND11_MODULE(mymodule, m)
{
    m.doc() = "my test module";
    m.def("modify_array_inplace", &modify_array_inplace<int32_t>, "");
    m.def("modify_array_inplace", &modify_array_inplace<double>, "");
}

Pythonで試してみる。

x = np.random.randn(2, 3).astype(np.float64)
print("before")
for i in range(x.shape[0]):
    for j in range(x.shape[1]):
        print("x[{}, {}] = {:.6f}".format(i, j, x[i, j]))

mymodule.modify_array_inplace(x, 10.0)
print("after")
for i in range(x.shape[0]):
    for j in range(x.shape[1]):
        print("x[{}, {}] = {:.6f}".format(i, j, x[i, j]))

>> 
before
x[0, 0] = 1.307738
x[0, 1] = -1.625780
x[0, 2] = 0.196201
x[1, 0] = 0.614617
x[1, 1] = -0.175023
x[1, 2] = 1.490781
after
x[0, 0] = 11.307738
x[0, 1] = 8.374220
x[0, 2] = 10.196201
x[1, 0] = 10.614617
x[1, 1] = 9.824977
x[1, 2] = 11.490781

配列の各要素に一律に10.0が足されたのが分かる。

なお、

mymodule.modify_array_inplace(x, 10)

のようにするとxがdoubleの配列なのにintを足していてmodify_array_inplace<int32_t>のほうが呼ばれたらしく、xは型が異なるのでコピーが渡されて配列の中身は変更されなかった。渡す引数がintなのか、floatなのかは注意したほうが良さそう。

array_tを返す関数

array_t型の変数をを新規に作成し、戻り値にすることでPython側に新しいndarrayを返すことができる。

template <typename T>
auto modify_array(py::array_t<T> x, T a)
{   
    const auto &buff_info = x.request();
    const auto &shape = buff_info.shape;
    py::array_t<T> y{shape};
    for (auto i = 0; i < shape[0]; i++)
    {
        for (auto j = 0; j < shape[1]; j++)
        {
            *y.mutable_data(i, j) = *x.data(i, j) + a;
        }
    }
    return y;
}

array_t型のコンストラクタは色々あるが、shapeを渡すものが最も簡単に使用できると思う。Pythonで利用する場合は以下の通り。

x = np.arange(2*3).reshape((2, 3)).astype(np.int32)
x2 = mymodule.modify_array(x, 10)

direct access

余計なチェックを無くしてよりオーバーヘッドが少ないアクセスの仕方もある。@cython.boundscheck(False)的なものだと思う¹。

template <typename T>
void modify_array_direct(py::array_t<T> x, T a)
{
    auto r = x.template mutable_unchecked<2>();
    for (auto i = 0; i < r.shape(0); i++)
    {
        for (auto j = 0; j < r.shape(1); j++)
        {
            r(i, j) += a;
        }
    }
}

公式docではx.mutable_unchecked<2>();となっているがこれだとコンパイルエラー。githubのissueに同様のものがあってx.template mutable_unchecked<2>();とすれば良いらしい²。ちなみにtemplateを使用せず、

void modify_array_direct(py::array_t<double> x, double a){
    auto r = x.mutable_unchecked<2>();
    ...
}

のような場合は問題なかった。

参考

いつもの公式doc: https://pybind11.readthedocs.io/en/stable/advanced/pycpp/numpy.html
uncheckedとtemplateを組み合わせる黒魔術: https://github.com/pybind/pybind11/issues/1412

uncheckedを使用しない方法だとindexの範囲外にアクセスするとPythonのIndexErrorが出たが、uncheckedの場合は特にエラーが出ずにそのまま上書きできた。おそろしやー。 ↩
自分のC++力が低すぎてなぜそれで良いのかよくわからない。。。 ↩

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up