Building a pyarrow.Table Using References

Posted at 2025-09-15

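Background

These are notes on a workaround I found while writing an Ethernet receiver program in Python. On machines with little main memory, such as the Raspberry Pi Zero 2, the program could not run stably for long periods because repeated memory reallocation caused fragmentation.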

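Conclusion

Start from a C-compatible array (a ctypes array or multiprocessing.RawArray) and build a pyarrow.Buffer, a pyarrow.Array, and a pyarrow.Table from it, in that order. Overwriting the source array then changes the contents of the pyarrow.Table without any memory reallocation.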

Code example

Build a 4-row table with four columns: boolean, integer, floating-point, and string.

import ctypes
import multiprocessing  # RawArray is used so the buffers can be double-buffered later
import pyarrow

array_length = 4
bit_length = (array_length >> 3) + (1 if array_length & 0x7 else 0)

# Column 1: boolean
bna = multiprocessing.RawArray("B", bit_length)  # validity bitmap, 1 bit per value
bnb = pyarrow.py_buffer(bna)
boa = multiprocessing.RawArray("B", bit_length)  # values, 1 bit per value
bob = pyarrow.py_buffer(boa)
a0 = pyarrow.BooleanArray.from_buffers(pyarrow.bool_(), array_length, [bnb, bob])

# Column 2: int16
ba = multiprocessing.RawArray("B", bit_length)  # validity bitmap, 1 bit per value
bb = pyarrow.py_buffer(ba)
ia = multiprocessing.RawArray("h", array_length)  # values
ib = pyarrow.py_buffer(ia)
a1 = pyarrow.Int16Array.from_buffers(pyarrow.int16(), array_length, [bb, ib])

# Column 3: float32 (no validity bitmap, so every value is treated as valid)
fa = multiprocessing.RawArray("f", array_length)  # values
fb = pyarrow.py_buffer(fa)
a2 = pyarrow.FloatArray.from_buffers(pyarrow.float32(), array_length, [None, fb])

# Column 4: string
si = multiprocessing.RawArray("i", array_length + 1)  # int32 offsets; the first entry must be 0
sib = pyarrow.py_buffer(si)
sa = multiprocessing.RawArray("B", array_length * 16)  # UTF-8 data; capacity chosen arbitrarily
sab = pyarrow.py_buffer(sa)
a3 = pyarrow.StringArray.from_buffers(array_length, sib, sab)

# Build the Table from the arrays
t = pyarrow.Table.from_arrays([a0, a1, a2, a3], ["col0", "col1", "col2", "col3"])

# Overwrite the values after the table has been built
bna[0] = 0x3  # validity flags [True, True, False, False] (bit 0 = row 0)
boa[0] = 0x6  # values [False, True, True, False]

ctypes.memset(ba, 0xFF, bit_length)
for i in range(array_length):
    ia[i] = i
for i in range(array_length):
    fa[i] = i * -1.1

v = ["a", "bb", "ccc", "dddd"]
n = 0
si[0] = 0
for i, s in enumerate(v):
    b = s.encode()  # UTF-8
    lb = len(b)
    ctypes.memmove(ctypes.byref(sa, n), b, lb)  # write the bytes at offset n
    n += lb
    si[i + 1] = n  # end offset of row i

print(t)  # prints the whole table; use slice() to print a range of rows
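The string loop above writes into a data buffer whose capacity was chosen arbitrarily, so a longer input could run past the end of it. The following is a sketch of the same write with bounds checks, using only ctypes. The helper name write_strings is hypothetical and not part of the original program, and plain ctypes arrays stand in for the RawArray objects.

```python
import ctypes


def write_strings(offsets, data, strings):
    """Write UTF-8 strings into Arrow-style offsets/data arrays in place.

    Hypothetical helper: offsets is an int32 array with one more entry
    than there are rows, data is a byte array. Returns the bytes used.
    """
    if len(strings) + 1 > len(offsets):
        raise ValueError("offsets array is too short")
    encoded = [s.encode("utf-8") for s in strings]
    if sum(len(b) for b in encoded) > len(data):
        raise ValueError("data array is too short")

    pos = 0
    offsets[0] = 0  # the first offset must be 0
    for i, b in enumerate(encoded):
        ctypes.memmove(ctypes.byref(data, pos), b, len(b))
        pos += len(b)
        offsets[i + 1] = pos  # end offset of row i
    # Rows without a string become empty, keeping the offsets non-decreasing.
    for j in range(len(encoded) + 1, len(offsets)):
        offsets[j] = pos
    return pos


offsets = (ctypes.c_int32 * 5)()
data = (ctypes.c_uint8 * 64)()
used = write_strings(offsets, data, ["a", "bb", "ccc", "dddd"])
print(list(offsets), bytes(data[:used]))
```

Checking the total length before writing anything means a rejected input leaves the buffers, and therefore the table, unchanged.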

Output of print(t)

macOS 15.6.1 + Python 3.13.7

pyarrow.Table
col0: bool
col1: int16
col2: float
col3: string
----
col0: [[false,true,null,null]]
col1: [[0,1,2,3]]
col2: [[-0,-1.1,-2.2,-3.3]]
col3: [["a","bb","ccc","dddd"]]
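The null entries in col0 come from the validity bitmap 0x3: Arrow bitmaps are packed least-significant bit first, so bit 0 of byte 0 belongs to row 0. The sketch below shows how the two constants used for col0 can be derived from lists of flags. The helper names bitmap_nbytes and pack_bits are hypothetical and written only for this illustration.

```python
def bitmap_nbytes(n):
    """Number of bytes needed to hold n bits (same result as bit_length)."""
    return (n + 7) // 8


def pack_bits(flags):
    """Pack booleans into bytes, least-significant bit first."""
    out = bytearray(bitmap_nbytes(len(flags)))
    for i, flag in enumerate(flags):
        if flag:
            out[i >> 3] |= 1 << (i & 7)
    return bytes(out)


validity = pack_bits([True, True, False, False])  # rows 0 and 1 are valid
values = pack_bits([False, True, True, False])    # rows 1 and 2 are True
print(validity.hex(), values.hex())
```

The packed bytes can then be copied into the existing bitmap array with ctypes.memmove, in the same way the string bytes are written, so no new buffer is allocated.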