
Getting Started with Secure Computation 2019

Posted at 2019-11-17

This is a cheat sheet for Pyfhel (PYthon For Homomorphic Encryption Libraries), a fully homomorphic encryption library.

With Pyfhel, you can perform computations directly on encrypted data without decrypting it.
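To make "computing on encrypted data" concrete, here is a minimal pure-Python sketch of the idea. It is not what Pyfhel does internally: it uses a toy Paillier cryptosystem, which is only additively homomorphic, the primes are far too small to be secure, and the helper names (`keygen`, `encrypt`, `decrypt`, `add_encrypted`) are made up for this illustration.

```python
import math
import secrets

def keygen(p=293, q=433):
    """Return (public key n, secret key) for a toy Paillier scheme. Insecure: tiny primes."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse (Python 3.8+); valid because g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    """Encrypt an integer 0 <= m < n using only the public key n."""
    n2 = n * n
    r = secrets.randbelow(n - 1) + 1       # random 1 <= r < n
    while math.gcd(r, n) != 1:             # r must be invertible mod n
        r = secrets.randbelow(n - 1) + 1
    return pow(n + 1, m, n2) * pow(r, n, n2) % n2

def decrypt(n, secret_key, c):
    """Recover the plaintext; this is the only step that needs the secret key."""
    lam, mu = secret_key
    return (pow(c, lam, n * n) - 1) // n * mu % n

def add_encrypted(n, c1, c2):
    """Multiplying two ciphertexts adds the underlying plaintexts."""
    return c1 * c2 % (n * n)

n, secret_key = keygen()
enc1, enc2 = encrypt(n, 11), encrypt(n, 13)
enc_sum = add_encrypted(n, enc1, enc2)     # no decryption happens here
print(decrypt(n, secret_key, enc_sum))     # 24
```

The party that calls `add_encrypted` only ever sees ciphertexts and the public value `n`, which is the property the rest of this article relies on.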

About Pyfhel

Source code
- ibarrond/Pyfhel: PYthon For Homomorphic Encryption Libraries, perform encrypted computations such as sum, mult, scalar product or matrix multiplication in Python, with NumPy compatibility. Uses SEAL/HElib/PALISADE as backends, implemented using Cython.
Documentation
- Welcome to Pyfhel’s documentation! — Pyfhel 18/11/2018 documentation

Installation

pip3 install pyfhel

from Pyfhel import Pyfhel, PyCtxt, PyPtxt

Name	Memo
PyPtxt	Object that holds plaintext data
PyCtxt	Object that holds a ciphertext
Pyfhel	Object that provides, among other things, the functions¹ that operate on PyCtxt

The required Python version and toolchain are as follows:

Python (3.4+) & Cython on top of C++17. (REQUIRED: Python must have been compiled with C++17: g++>=6 | clang++>=5.0, Visual Studio 2017.).

Note that installation takes quite a while: about 5 minutes on Google Colaboratory.

Key generation

he = Pyfhel()
he.contextGen(p=65537, m=4096)
he.keyGen()

p: Plaintext modulus. A large prime; all operations are performed mod p.
m: Coefficient modulus. A value such as m=pow(2,12) works well.
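Because all operations are performed mod p, results that reach p wrap around. The plain-Python sketch below (no Pyfhel involved) shows what that means for p=65537. It illustrates modular arithmetic itself, not Pyfhel's encoders, which may overflow differently; `P`, `add_mod`, and `mul_mod` are names made up for this illustration.

```python
P = 65537  # same plaintext modulus as in the contextGen call above

def add_mod(a, b, p=P):
    """Addition in the integers mod p."""
    return (a + b) % p

def mul_mod(a, b, p=P):
    """Multiplication in the integers mod p."""
    return (a * b) % p

print(mul_mod(11, 13))    # 143: small enough to be unaffected
print(mul_mod(300, 300))  # 24463: 90000 wraps around (90000 - 65537)
print(add_mod(65536, 1))  # 0: the sum equals p exactly
```

In practice this means p has to be chosen larger than any intermediate value you expect to compute.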

Saving and loading keys

he.saveContext('context.txt') # save the context to a file
he.savepublicKey('pub.key')  # save the public key to a file
he.savesecretKey('secret.key') # save the secret key to a file

he.restoreContext('context.txt') # load the context from a file
he.restorepublicKey('pub.key')  # load the public key from a file
he.restoresecretKey('secret.key') # load the secret key from a file

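It is not entirely clear what the context contains, but it presumably holds information about the p and m parameters.
For encryption only, restoring the context and the public key is enough.
For decryption only, restoring the context and the secret key is enough.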

Encryption and decryption

# Encrypting and decrypting an int
enc_13 = he.encryptInt(13)
dec_13 = he.decryptInt(enc_13)

# Encrypting and decrypting a float
enc_11dot13 = he.encryptFrac(11.13)
dec_11dot13 = he.decryptFrac(enc_11dot13)

# Encrypting and decrypting an array (int elements)
he = Pyfhel()
he.contextGen(p=65537, m=4096, flagBatching=True)  # set flagBatching to True
he.keyGen()  # the new Pyfhel object needs its own keys
enc_1113 = he.encryptBatch([1,1,1,3])
dec_1113 = he.decryptBatch(enc_1113)
print(enc_1113, dec_1113, len(dec_1113))

# Encrypting and decrypting an array (float elements)
enc = [he.encryptFrac(d) for d in [0.1, 0.1, 0.1, 0.3]]
dec = [he.decryptFrac(d) for d in enc]

The data returned when decrypting an array is an m-dimensional vector. To get back exactly the original input, slice it as follows.

l = [1,1,1,3]
enc_1113 = he.encryptBatch(l)
dec_1113 = he.decryptBatch(enc_1113)
print(enc_1113, dec_1113[:len(l)],len(dec_1113))
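The slicing above can be pictured with a plain-Python sketch: the input is padded up to a fixed number of slots, and the caller keeps only the first len(l) entries. `N_SLOTS`, `pad_to_slots`, and the zero padding are assumptions made for illustration, not a description of Pyfhel's internals.

```python
N_SLOTS = 4096  # assumed slot count, matching m=4096 above

def pad_to_slots(values, n_slots=N_SLOTS):
    """Pad a short list with zeros up to the fixed slot count."""
    if len(values) > n_slots:
        raise ValueError("too many values for one ciphertext")
    return list(values) + [0] * (n_slots - len(values))

l = [1, 1, 1, 3]
slots = pad_to_slots(l)             # stands in for the decryptBatch output
print(slots[:len(l)], len(slots))   # [1, 1, 1, 3] 4096
```

Since the original length is not recoverable from the padded vector, it has to be tracked separately, as the `len(l)` in the slice does.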

Also note that encrypting and decrypting floats introduces a small error in the output.

enc = [he.encrypt(d) for d in [0.1, 0.1, 0.1, 0.3]]
# [<Pyfhel.PyCtxt.PyCtxt object at 0x7fb7e17e6870>, <Pyfhel.PyCtxt.PyCtxt object at 0x7fb7e04d7ca8>, <Pyfhel.PyCtxt.PyCtxt object at 0x7fb7e043f8b8>, <Pyfhel.PyCtxt.PyCtxt object at 0x7fb7e9082b40>]
dec = [he.decryptFrac(d) for d in enc]
# [0.10000000149011612, 0.10000000149011612, 0.10000000149011612, 0.30000001192092896]
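The decrypted values above are exactly what 0.1 and 0.3 become when rounded to 32-bit floats. The sketch below reproduces them with the standard library alone. Whether Pyfhel actually passes the values through a single-precision float somewhere is an assumption that is not verified here; `to_float32` is a helper name made up for this illustration.

```python
import struct

def to_float32(x):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack('f', struct.pack('f', x))[0]

print([to_float32(d) for d in [0.1, 0.1, 0.1, 0.3]])
# [0.10000000149011612, 0.10000000149011612, 0.10000000149011612, 0.30000001192092896]
```

If this is indeed the source of the error, results should be compared with a tolerance of roughly 1e-7 relative error rather than with `==`.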

Arithmetic operations

Data encrypted with the same public key can be computed on while it remains encrypted.

enc1 = he.encryptInt(11)
enc2 = he.encryptInt(13)

# Using the operators directly (decrypt to inspect the result)
print(he.decryptInt(enc1+enc2)) # 24
print(he.decryptInt(enc1-enc2)) # -2
print(he.decryptInt(enc1*enc2)) # 143

# Using the methods of the Pyfhel class
print(he.decryptInt(he.add(enc1, enc2, in_new_ctxt=True))) # 24
print(he.decryptInt(he.sub(enc1, enc2, in_new_ctxt=True))) # -2
print(he.decryptInt(he.multiply(enc1, enc2, in_new_ctxt=True))) # 143

# Square
print(he.decryptInt(he.square(enc1, True))) # 121

# Power
he.relinKeyGen(30, 5)
print(he.decryptInt(he.power(enc1, 2, True))) # 121

# Negation
print(he.decryptInt(he.negate(enc1, in_new_ctxt=True))) # -11

Note that with in_new_ctxt=False the value of enc1 is overwritten, as in enc1 = enc1 + enc2.

Checking the settings

print(he.getp()) # value of p
print(he.getm()) # value of m
print(he.getflagBatch()) # whether batch mode is enabled

print(he.getbase()) # base used for the encryption
print(he.getnSlots()) # length of the output data in batch mode
print(he.getintDigits()) # range of the int type? 64 for int64
print(he.getfracDigits()) # range of the float type?
print(he.getsec()) # equivalent AES security level

It is not clear what exactly getintDigits represents.

Checking the noise level

enc1 = he.encryptInt(1)
print(he.noiseLevel(enc1)) # 82

Noise accumulates as computations are performed on ciphertexts, and once the noise level reaches 0 the results are no longer correct.

%%time
import pandas as pd

he = Pyfhel()           # create an empty Pyfhel object
he.contextGen(p=65537, m=2**13, flagBatching=True)  # initialize the context (m=8192)
he.keyGen()             # generate the keys

enc1, enc2 = he.encryptInt(1), he.encryptInt(2)
enc3, enc4 = he.encryptInt(1), he.encryptInt(2)
plain1, plain2 = 1, 2
plain3, plain4 = 1, 2

add_columns = ['noiseLevel(add_Int)', 'result(add_Int)', 'ground truth(add_Int)']
mul_columns = ['noiseLevel(mul_Int)', 'result(mul_Int)', 'ground truth(mul_Int)']
add_rows, mul_rows = [], []

for i in range(10):
    he.add(enc1, enc2)       # addition, updates enc1 in place
    he.multiply(enc3, enc4)  # multiplication, updates enc3 in place
    plain1 = plain1 + plain2
    plain3 = plain3 * plain4
    add_rows.append([he.noiseLevel(enc1), he.decryptInt(enc1), plain1])
    mul_rows.append([he.noiseLevel(enc3), he.decryptInt(enc3), plain3])

# Build the result tables from the collected rows
df_add = pd.DataFrame(add_rows, columns=add_columns)
df_mul = pd.DataFrame(mul_rows, columns=mul_columns)
display(df_add)
display(df_add.plot(subplots=True, figsize=(9,9)))
display(df_mul)
display(df_mul.plot(subplots=True, figsize=(9,9)))

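Table	Graph
Noise accumulation from Int addition	Visualization of the table on the left
Noise accumulation from Int multiplication	Visualization of the table on the left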

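As shown, multiplication lowers the noise level far more sharply than addition, and the computed results also drift far from the correct values.

There are two possible countermeasures.

- Decrypt and re-encrypt before the noise level reaches 0
- Increase the value of the parameter m (larger values accumulate noise more slowly but make computation take longer)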
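The first countermeasure can be sketched as a small helper that checks the noise level and re-encrypts when it drops below a threshold. The helper uses only methods that appear in this article (noiseLevel, decryptInt, encryptInt). The name `refresh_if_needed`, the threshold, and the `FakeHE` stand-in with its made-up cost of 30 per multiplication are hypothetical and exist only so that the snippet runs without Pyfhel installed.

```python
def refresh_if_needed(he, ctxt, threshold=10):
    """Decrypt and re-encrypt ctxt when its noise level falls below threshold.

    `he` is expected to offer noiseLevel, decryptInt and encryptInt,
    as the Pyfhel object in this article does. Requires the secret key.
    """
    if he.noiseLevel(ctxt) < threshold:
        return he.encryptInt(he.decryptInt(ctxt))
    return ctxt


class FakeHE:
    """Hypothetical stand-in that only tracks a made-up noise level (no encryption)."""
    FRESH = 82  # noise level of a fresh ciphertext in the example above

    def encryptInt(self, value):
        return {"value": value, "noise": self.FRESH}

    def decryptInt(self, ctxt):
        return ctxt["value"]

    def noiseLevel(self, ctxt):
        return ctxt["noise"]

    def multiply(self, a, b):
        # Made-up cost of 30 per multiplication; real costs depend on the parameters.
        a["value"] *= b["value"]
        a["noise"] = max(0, min(a["noise"], b["noise"]) - 30)
        return a


he = FakeHE()
acc, two = he.encryptInt(1), he.encryptInt(2)
for _ in range(10):
    he.multiply(acc, two)
    acc = refresh_if_needed(he, acc, threshold=30)
print(he.decryptInt(acc), he.noiseLevel(acc))  # 1024 82
```

Because the refresh needs the secret key, it can only be performed by a party that is allowed to see the plaintext, which limits where this countermeasure applies.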

Incidentally, encrypting the values as floating-point numbers gives almost the same results.

Encrypting a Pandas DataFrame

# df is the plaintext DataFrame to encrypt (its definition is not shown here)
enc_df = pd.DataFrame(data=[],columns=df.columns)
for colname in df:
    enc_df[colname] = df[colname].apply(lambda x: he.encryptFrac(x))

Pandas arithmetic methods can also be run on the encrypted data.

# Compute the mean while the data stays encrypted
meanlist = []
for colname in enc_df:
    meanlist.append(enc_df[colname].sum()*he.encryptFrac(1/len(df[colname])))

print(df.mean())
print([he.decryptFrac(x) for x in meanlist])

Output

sepal length (cm)    5.843333
sepal width (cm)     3.057333
petal length (cm)    3.758000
petal width (cm)     1.199333
dtype: float64
[5.843333087628707, 3.0573331359773874, 3.757999788969755, 1.1993331399280578]
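The mean above is computed as sum × (1/n) rather than sum / n, presumably because multiplication is available on ciphertexts while division is not. The same restructuring on plain floats looks like this; the sample values and the helper name `mean_without_division` are made up for illustration.

```python
def mean_without_division(values):
    """Mean using only additions and one multiplication by a precomputed 1/n."""
    inv_n = 1 / len(values)   # computed in the clear; n is not secret
    total = 0.0
    for v in values:
        total += v            # on ciphertexts this would be a homomorphic addition
    return total * inv_n      # and this a homomorphic multiplication

sample = [5.1, 4.9, 4.7, 4.6]  # made-up values
print(mean_without_division(sample))
```

Only one multiplication is applied per column, so the noise cost stays small compared with the repeated multiplications in the noise experiment.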

References

The notebook for this article is available on GitHub.

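Cryptography
- クラウドを支えるこれからの暗号技術 (Emerging Cryptographic Technologies That Support the Cloud)
- 秘密計算のシステムとその原理 (Secure Computation Systems and Their Principles)
Sample implementations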
- HE-Hackathon / he-hackathon-2019 · GitLab

¹ Performs encryption, decryption, arithmetic operations, and so on.
