2
6

More than 3 years have passed since last update.

SQLAlchemyのトランザクション処理

Posted at

はじめに

SQLAlchemy を使って insert 処理を実行した際にトランザクションの扱い方で処理時間に結構な差がでました。(※注)
以下、サンプルコードと自身の環境での検証結果ではありますが知見を共有したく思います。

(※注)
本記事のサンプルコードでは insert 処理のみを扱いますが、update 処理でもトランザクションの扱いによって実行時間に差が出ます。

環境

環境 バージョン 備考
macOS Catalina v10.15.7
Docker Desktop for Mac v3.3.3
Docker v20.10.6, build 370c289 $ docker --version
Docker Compose v1.29.1, build c34c88b2 $ docker-compose --version
Docker MySQL v8.0.x Dockerfile で指定
Docker Python v3.9.x 同上
SQLAlchemy v1.4.17 requirement.txt で指定

検証

以下の設定でそれぞれ 10,000 行の insertupdate 処理で検証。

  • autocommit ありでループ内で flush()
  • autocommit ありでループ外で flush()
  • autocommit なしでループ外で commit()

結果から

insert

トランザクション flush() / commit() の場所 結果 備考
1 autocommit あり ループ内で flush() 10,000件 insert するのに約3分 毎行 commit が実行され、下の 2 つに比べて大幅に時間がかかる
2 autocommit あり ループ外で flush() 10,000件 insert するのに約8秒 commit は全行に対して実行され、下記 3 と同じクエリが発行された
3 autocommit なし ループ外で commit() 10,000件 insert するのに約8秒 commit は全行に対して実行され、上記 2 と同じクエリが発行された

コードサンプルとログ

autocommit ありでループ内で flush()

コードサンプル

# セッションの設定(autocommit あり)
Session = sessionmaker(
    autocommit=True,
    class_=MasterSlaveSession
)
# ループ内で flush()
def insert():
    rdb_session = Session()
    for i in range(10000):
        work_num = i + 1
        work_time = jst_time(time.time())
        employee = Employee()
        employee.name = 'employee' + str(work_num).zfill(6)
        employee.phone = str(work_num).zfill(6)
        employee.created_at = work_time
        employee.updated_at = work_time
        rdb_session.add(employee)

        # autocommit ありでここで flush() すると毎行トランザクションが発生するので遅い
        rdb_session.flush()

    # autocommit ありの場合は flush(), なしの場合は commit() するとトランザクションは処理対象行全体にかかるので早い
    # rdb_session.flush()
    # rdb_session.commit()

ログ(実行結果)

##
# 10,000 件処理したときの時間
#  * autocommit あり
#  * loop 内で flush
#
2021-06-08 15:36:16.736761+09:00 process start
process had got 171.57751655578613sec to done.         # 10,000件 insert するのに約3分
2021-06-08 15:39:08.314277+09:00 process had finished

##
# ログ( これは 10件処理したときのログから2件抜粋 )
# 1件ずつトランザクションがかかっている事がわかる
#
2021-06-08 15:46:56,357 INFO sqlalchemy.engine.Engine BEGIN (implicit) ##トランザクション開始
2021-06-08 15:46:56,360 INFO sqlalchemy.engine.Engine INSERT INTO employee (name, phone, created_at, updated_at) VALUES (%(name)s, %(phone)s, %(created_at)s, %(updated_at)s)
2021-06-08 15:46:56,360 INFO sqlalchemy.engine.Engine [generated in 0.00036s] {'name': 'employee000001', 'phone': '000001', 'created_at': datetime.datetime(2021, 6, 8, 15, 46, 56, 341841, tzinfo=datetime.timezone(datetime.timedelta(seconds=32400), 'JST')), 'updated_at': datetime.datetime(2021, 6, 8, 15, 46, 56, 341841, tzinfo=datetime.timezone(datetime.timedelta(seconds=32400), 'JST'))}
2021-06-08 15:46:56,365 INFO sqlalchemy.engine.Engine COMMIT           ##トランザクション終了
2021-06-08 15:46:56,377 INFO sqlalchemy.engine.Engine BEGIN (implicit) ##トランザクション開始
2021-06-08 15:46:56,377 INFO sqlalchemy.engine.Engine INSERT INTO employee (name, phone, created_at, updated_at) VALUES (%(name)s, %(phone)s, %(created_at)s, %(updated_at)s)
2021-06-08 15:46:56,377 INFO sqlalchemy.engine.Engine [cached since 0.01766s ago] {'name': 'employee000002', 'phone': '000002', 'created_at': datetime.datetime(2021, 6, 8, 15, 46, 56, 377034, tzinfo=datetime.timezone(datetime.timedelta(seconds=32400), 'JST')), 'updated_at': datetime.datetime(2021, 6, 8, 15, 46, 56, 377034, tzinfo=datetime.timezone(datetime.timedelta(seconds=32400), 'JST'))}
2021-06-08 15:46:56,379 INFO sqlalchemy.engine.Engine COMMIT           ##トランザクション終了

いちレコード単位でトランザクションが発生されているのがわかる。遅い。

autocommit ありでループ外で flush()

コードサンプル

# セッションの設定(autocommit あり)
Session = sessionmaker(
    autocommit=True,
    class_=MasterSlaveSession
)
# ループ外で flush()
def insert():
    rdb_session = Session()
    for i in range(10000):
        work_num = i + 1
        work_time = jst_time(time.time())
        employee = Employee()
        employee.name = 'employee' + str(work_num).zfill(6)
        employee.phone = str(work_num).zfill(6)
        employee.created_at = work_time
        employee.updated_at = work_time
        rdb_session.add(employee)

        # autocommit ありでここで flush() すると毎行トランザクションが発生するので遅い
        # rdb_session.flush()

    # autocommit ありの場合は flush(), なしの場合は commit() するとトランザクションは処理対象行全体にかかるので早い
    rdb_session.flush()
    # rdb_session.commit()

ログ(実行結果)

##
# 10,000 件処理したときの時間
#  * autocommit あり
#  * loop 外で flush
#
% python sqlalchemy_work.py
2021-06-08 15:34:45.603209+09:00 process start
process had got 8.878259658813477sec to done.         # 10,000件 insert するのに約8秒
2021-06-08 15:34:54.481468+09:00 process had finished

##
# ログ( これは 10件処理したときのログ )
# 10件に対してまとめてトランザクションがかかっている事がわかる
2021-06-08 15:49:04,296 INFO sqlalchemy.engine.Engine BEGIN (implicit) ##トランザクション開始
2021-06-08 15:49:04,298 INFO sqlalchemy.engine.Engine INSERT INTO employee (name, phone, created_at, updated_at) VALUES (%(name)s, %(phone)s, %(created_at)s, %(updated_at)s)
2021-06-08 15:49:04,298 INFO sqlalchemy.engine.Engine [generated in 0.00040s] {'name': 'employee000001', 'phone': '000001', 'created_at': datetime.datetime(2021, 6, 8, 15, 49, 4, 277344, tzinfo=datetime.timezone(datetime.timedelta(seconds=32400), 'JST')), 'updated_at': datetime.datetime(2021, 6, 8, 15, 49, 4, 277344, tzinfo=datetime.timezone(datetime.timedelta(seconds=32400), 'JST'))}
2021-06-08 15:49:04,302 INFO sqlalchemy.engine.Engine INSERT INTO employee (name, phone, created_at, updated_at) VALUES (%(name)s, %(phone)s, %(created_at)s, %(updated_at)s)
2021-06-08 15:49:04,302 INFO sqlalchemy.engine.Engine [cached since 0.004541s ago] {'name': 'employee000002', 'phone': '000002', 'created_at': datetime.datetime(2021, 6, 8, 15, 49, 4, 279356, tzinfo=datetime.timezone(datetime.timedelta(seconds=32400), 'JST')), 'updated_at': datetime.datetime(2021, 6, 8, 15, 49, 4, 279356, tzinfo=datetime.timezone(datetime.timedelta(seconds=32400), 'JST'))}
2021-06-08 15:49:04,305 INFO sqlalchemy.engine.Engine INSERT INTO employee (name, phone, created_at, updated_at) VALUES (%(name)s, %(phone)s, %(created_at)s, %(updated_at)s)
2021-06-08 15:49:04,306 INFO sqlalchemy.engine.Engine [cached since 0.00756s ago] {'name': 'employee000003', 'phone': '000003', 'created_at': datetime.datetime(2021, 6, 8, 15, 49, 4, 279418, tzinfo=datetime.timezone(datetime.timedelta(seconds=32400), 'JST')), 'updated_at': datetime.datetime(2021, 6, 8, 15, 49, 4, 279418, tzinfo=datetime.timezone(datetime.timedelta(seconds=32400), 'JST'))}
2021-06-08 15:49:04,309 INFO sqlalchemy.engine.Engine INSERT INTO employee (name, phone, created_at, updated_at) VALUES (%(name)s, %(phone)s, %(created_at)s, %(updated_at)s)
2021-06-08 15:49:04,309 INFO sqlalchemy.engine.Engine [cached since 0.01123s ago] {'name': 'employee000004', 'phone': '000004', 'created_at': datetime.datetime(2021, 6, 8, 15, 49, 4, 279465, tzinfo=datetime.timezone(datetime.timedelta(seconds=32400), 'JST')), 'updated_at': datetime.datetime(2021, 6, 8, 15, 49, 4, 279465, tzinfo=datetime.timezone(datetime.timedelta(seconds=32400), 'JST'))}
2021-06-08 15:49:04,312 INFO sqlalchemy.engine.Engine INSERT INTO employee (name, phone, created_at, updated_at) VALUES (%(name)s, %(phone)s, %(created_at)s, %(updated_at)s)
2021-06-08 15:49:04,312 INFO sqlalchemy.engine.Engine [cached since 0.01392s ago] {'name': 'employee000005', 'phone': '000005', 'created_at': datetime.datetime(2021, 6, 8, 15, 49, 4, 279509, tzinfo=datetime.timezone(datetime.timedelta(seconds=32400), 'JST')), 'updated_at': datetime.datetime(2021, 6, 8, 15, 49, 4, 279509, tzinfo=datetime.timezone(datetime.timedelta(seconds=32400), 'JST'))}
2021-06-08 15:49:04,314 INFO sqlalchemy.engine.Engine INSERT INTO employee (name, phone, created_at, updated_at) VALUES (%(name)s, %(phone)s, %(created_at)s, %(updated_at)s)
2021-06-08 15:49:04,314 INFO sqlalchemy.engine.Engine [cached since 0.01604s ago] {'name': 'employee000006', 'phone': '000006', 'created_at': datetime.datetime(2021, 6, 8, 15, 49, 4, 279603, tzinfo=datetime.timezone(datetime.timedelta(seconds=32400), 'JST')), 'updated_at': datetime.datetime(2021, 6, 8, 15, 49, 4, 279603, tzinfo=datetime.timezone(datetime.timedelta(seconds=32400), 'JST'))}
2021-06-08 15:49:04,315 INFO sqlalchemy.engine.Engine INSERT INTO employee (name, phone, created_at, updated_at) VALUES (%(name)s, %(phone)s, %(created_at)s, %(updated_at)s)
2021-06-08 15:49:04,316 INFO sqlalchemy.engine.Engine [cached since 0.01768s ago] {'name': 'employee000007', 'phone': '000007', 'created_at': datetime.datetime(2021, 6, 8, 15, 49, 4, 279643, tzinfo=datetime.timezone(datetime.timedelta(seconds=32400), 'JST')), 'updated_at': datetime.datetime(2021, 6, 8, 15, 49, 4, 279643, tzinfo=datetime.timezone(datetime.timedelta(seconds=32400), 'JST'))}
2021-06-08 15:49:04,318 INFO sqlalchemy.engine.Engine INSERT INTO employee (name, phone, created_at, updated_at) VALUES (%(name)s, %(phone)s, %(created_at)s, %(updated_at)s)
2021-06-08 15:49:04,318 INFO sqlalchemy.engine.Engine [cached since 0.01985s ago] {'name': 'employee000008', 'phone': '000008', 'created_at': datetime.datetime(2021, 6, 8, 15, 49, 4, 279680, tzinfo=datetime.timezone(datetime.timedelta(seconds=32400), 'JST')), 'updated_at': datetime.datetime(2021, 6, 8, 15, 49, 4, 279680, tzinfo=datetime.timezone(datetime.timedelta(seconds=32400), 'JST'))}
2021-06-08 15:49:04,319 INFO sqlalchemy.engine.Engine INSERT INTO employee (name, phone, created_at, updated_at) VALUES (%(name)s, %(phone)s, %(created_at)s, %(updated_at)s)
2021-06-08 15:49:04,319 INFO sqlalchemy.engine.Engine [cached since 0.02132s ago] {'name': 'employee000009', 'phone': '000009', 'created_at': datetime.datetime(2021, 6, 8, 15, 49, 4, 279720, tzinfo=datetime.timezone(datetime.timedelta(seconds=32400), 'JST')), 'updated_at': datetime.datetime(2021, 6, 8, 15, 49, 4, 279720, tzinfo=datetime.timezone(datetime.timedelta(seconds=32400), 'JST'))}
2021-06-08 15:49:04,320 INFO sqlalchemy.engine.Engine INSERT INTO employee (name, phone, created_at, updated_at) VALUES (%(name)s, %(phone)s, %(created_at)s, %(updated_at)s)
2021-06-08 15:49:04,321 INFO sqlalchemy.engine.Engine [cached since 0.02268s ago] {'name': 'employee000010', 'phone': '000010', 'created_at': datetime.datetime(2021, 6, 8, 15, 49, 4, 279759, tzinfo=datetime.timezone(datetime.timedelta(seconds=32400), 'JST')), 'updated_at': datetime.datetime(2021, 6, 8, 15, 49, 4, 279759, tzinfo=datetime.timezone(datetime.timedelta(seconds=32400), 'JST'))}
2021-06-08 15:49:04,323 INFO sqlalchemy.engine.Engine COMMIT           ##トランザクション終了

全レコードに対してトランザクションが発生していることがわかる。早い。

autocommit なしでループ外で commit()

コードサンプル

# セッションの設定(autocommit なし)
Session = sessionmaker(
    # autocommit=True,
    class_=MasterSlaveSession
)
# ループ外で commit()
def insert():
    rdb_session = Session()
    for i in range(10000):
        work_num = i + 1
        work_time = jst_time(time.time())
        employee = Employee()
        employee.name = 'employee' + str(work_num).zfill(6)
        employee.phone = str(work_num).zfill(6)
        employee.created_at = work_time
        employee.updated_at = work_time
        rdb_session.add(employee)

        # autocommit ありでここで flush() すると毎行トランザクションが発生するので遅い
        # rdb_session.flush()

    # autocommit ありの場合は flush(), なしの場合は commit() するとトランザクションは処理対象行全体にかかるので早い
    # rdb_session.flush()
    rdb_session.commit()

ログ(実行結果)

##
# 10,000 件処理したときの時間
#  * autocommit なし
#  * loop 外で commit
#
% python sqlalchemy_work.py
2021-06-08 15:31:14.633875+09:00 process start
process had got 8.680613994598389sec to done.         # 10,000件 insert するのに約8秒
2021-06-08 15:31:23.314489+09:00 process had finished

##
# ログ( これは 10件処理したときのログ )
# 10件に対してまとめてトランザクションがかかっている事がわかる
2021-06-08 15:52:46,799 INFO sqlalchemy.engine.Engine BEGIN (implicit) ##トランザクション開始
2021-06-08 15:52:46,801 INFO sqlalchemy.engine.Engine INSERT INTO employee (name, phone, created_at, updated_at) VALUES (%(name)s, %(phone)s, %(created_at)s, %(updated_at)s)
2021-06-08 15:52:46,801 INFO sqlalchemy.engine.Engine [generated in 0.00026s] {'name': 'employee000001', 'phone': '000001', 'created_at': datetime.datetime(2021, 6, 8, 15, 52, 46, 780246, tzinfo=datetime.timezone(datetime.timedelta(seconds=32400), 'JST')), 'updated_at': datetime.datetime(2021, 6, 8, 15, 52, 46, 780246, tzinfo=datetime.timezone(datetime.timedelta(seconds=32400), 'JST'))}
2021-06-08 15:52:46,804 INFO sqlalchemy.engine.Engine INSERT INTO employee (name, phone, created_at, updated_at) VALUES (%(name)s, %(phone)s, %(created_at)s, %(updated_at)s)
2021-06-08 15:52:46,804 INFO sqlalchemy.engine.Engine [cached since 0.003321s ago] {'name': 'employee000002', 'phone': '000002', 'created_at': datetime.datetime(2021, 6, 8, 15, 52, 46, 782529, tzinfo=datetime.timezone(datetime.timedelta(seconds=32400), 'JST')), 'updated_at': datetime.datetime(2021, 6, 8, 15, 52, 46, 782529, tzinfo=datetime.timezone(datetime.timedelta(seconds=32400), 'JST'))}
2021-06-08 15:52:46,806 INFO sqlalchemy.engine.Engine INSERT INTO employee (name, phone, created_at, updated_at) VALUES (%(name)s, %(phone)s, %(created_at)s, %(updated_at)s)
2021-06-08 15:52:46,806 INFO sqlalchemy.engine.Engine [cached since 0.00499s ago] {'name': 'employee000003', 'phone': '000003', 'created_at': datetime.datetime(2021, 6, 8, 15, 52, 46, 782593, tzinfo=datetime.timezone(datetime.timedelta(seconds=32400), 'JST')), 'updated_at': datetime.datetime(2021, 6, 8, 15, 52, 46, 782593, tzinfo=datetime.timezone(datetime.timedelta(seconds=32400), 'JST'))}
2021-06-08 15:52:46,807 INFO sqlalchemy.engine.Engine INSERT INTO employee (name, phone, created_at, updated_at) VALUES (%(name)s, %(phone)s, %(created_at)s, %(updated_at)s)
2021-06-08 15:52:46,807 INFO sqlalchemy.engine.Engine [cached since 0.006349s ago] {'name': 'employee000004', 'phone': '000004', 'created_at': datetime.datetime(2021, 6, 8, 15, 52, 46, 782633, tzinfo=datetime.timezone(datetime.timedelta(seconds=32400), 'JST')), 'updated_at': datetime.datetime(2021, 6, 8, 15, 52, 46, 782633, tzinfo=datetime.timezone(datetime.timedelta(seconds=32400), 'JST'))}
2021-06-08 15:52:46,809 INFO sqlalchemy.engine.Engine INSERT INTO employee (name, phone, created_at, updated_at) VALUES (%(name)s, %(phone)s, %(created_at)s, %(updated_at)s)
2021-06-08 15:52:46,809 INFO sqlalchemy.engine.Engine [cached since 0.008276s ago] {'name': 'employee000005', 'phone': '000005', 'created_at': datetime.datetime(2021, 6, 8, 15, 52, 46, 782673, tzinfo=datetime.timezone(datetime.timedelta(seconds=32400), 'JST')), 'updated_at': datetime.datetime(2021, 6, 8, 15, 52, 46, 782673, tzinfo=datetime.timezone(datetime.timedelta(seconds=32400), 'JST'))}
2021-06-08 15:52:46,812 INFO sqlalchemy.engine.Engine INSERT INTO employee (name, phone, created_at, updated_at) VALUES (%(name)s, %(phone)s, %(created_at)s, %(updated_at)s)
2021-06-08 15:52:46,813 INFO sqlalchemy.engine.Engine [cached since 0.01162s ago] {'name': 'employee000006', 'phone': '000006', 'created_at': datetime.datetime(2021, 6, 8, 15, 52, 46, 782711, tzinfo=datetime.timezone(datetime.timedelta(seconds=32400), 'JST')), 'updated_at': datetime.datetime(2021, 6, 8, 15, 52, 46, 782711, tzinfo=datetime.timezone(datetime.timedelta(seconds=32400), 'JST'))}
2021-06-08 15:52:46,814 INFO sqlalchemy.engine.Engine INSERT INTO employee (name, phone, created_at, updated_at) VALUES (%(name)s, %(phone)s, %(created_at)s, %(updated_at)s)
2021-06-08 15:52:46,814 INFO sqlalchemy.engine.Engine [cached since 0.01341s ago] {'name': 'employee000007', 'phone': '000007', 'created_at': datetime.datetime(2021, 6, 8, 15, 52, 46, 782786, tzinfo=datetime.timezone(datetime.timedelta(seconds=32400), 'JST')), 'updated_at': datetime.datetime(2021, 6, 8, 15, 52, 46, 782786, tzinfo=datetime.timezone(datetime.timedelta(seconds=32400), 'JST'))}
2021-06-08 15:52:46,816 INFO sqlalchemy.engine.Engine INSERT INTO employee (name, phone, created_at, updated_at) VALUES (%(name)s, %(phone)s, %(created_at)s, %(updated_at)s)
2021-06-08 15:52:46,816 INFO sqlalchemy.engine.Engine [cached since 0.01495s ago] {'name': 'employee000008', 'phone': '000008', 'created_at': datetime.datetime(2021, 6, 8, 15, 52, 46, 782848, tzinfo=datetime.timezone(datetime.timedelta(seconds=32400), 'JST')), 'updated_at': datetime.datetime(2021, 6, 8, 15, 52, 46, 782848, tzinfo=datetime.timezone(datetime.timedelta(seconds=32400), 'JST'))}
2021-06-08 15:52:46,817 INFO sqlalchemy.engine.Engine INSERT INTO employee (name, phone, created_at, updated_at) VALUES (%(name)s, %(phone)s, %(created_at)s, %(updated_at)s)
2021-06-08 15:52:46,818 INFO sqlalchemy.engine.Engine [cached since 0.01665s ago] {'name': 'employee000009', 'phone': '000009', 'created_at': datetime.datetime(2021, 6, 8, 15, 52, 46, 782887, tzinfo=datetime.timezone(datetime.timedelta(seconds=32400), 'JST')), 'updated_at': datetime.datetime(2021, 6, 8, 15, 52, 46, 782887, tzinfo=datetime.timezone(datetime.timedelta(seconds=32400), 'JST'))}
2021-06-08 15:52:46,819 INFO sqlalchemy.engine.Engine INSERT INTO employee (name, phone, created_at, updated_at) VALUES (%(name)s, %(phone)s, %(created_at)s, %(updated_at)s)
2021-06-08 15:52:46,820 INFO sqlalchemy.engine.Engine [cached since 0.01851s ago] {'name': 'employee000010', 'phone': '000010', 'created_at': datetime.datetime(2021, 6, 8, 15, 52, 46, 782928, tzinfo=datetime.timezone(datetime.timedelta(seconds=32400), 'JST')), 'updated_at': datetime.datetime(2021, 6, 8, 15, 52, 46, 782928, tzinfo=datetime.timezone(datetime.timedelta(seconds=32400), 'JST'))}
2021-06-08 15:52:46,823 INFO sqlalchemy.engine.Engine COMMIT           ##トランザクション終了

こちらも同じ。
全レコードに対してトランザクションが発生していることがわかる。早い。

補足

autocommit モードは非推奨 -> 廃止に

autocommit モードは v1.4.x で非推奨、v2.0.x からは廃止になるそうです。

Deprecated since version 1.4: “autocommit” mode is a legacy mode of use and should not be considered for new projects. The feature will be deprecated in SQLAlchemy 1.4 and removed in version 2.0; both versions provide a more refined “autobegin” approach that allows the Session.begin() method to be used normally. If autocommit mode is used, it is strongly advised that the application at least ensure that transaction scope is made present via the Session.begin() method, rather than using the session in pure autocommit mode.

バルク処理

今回はトランザクションによって処理時間の短縮に努めましたが、

といった並列処理もあります。

これらの学習を進め、今後機会があればはそちらも積極的に使っていきたいと思います。

参考

ソースコード

本記事であげたコードは こちら にあげております。
ご興味あればご覧ください。
( 上記には冒頭で触れた update 処理のサンプルコードもあります )

2
6
1

Register as a new user and use Qiita more conveniently

  1. You get articles that match your needs
  2. You can efficiently read back useful information
  3. You can use dark theme
What you can do with signing up
2
6