はじめに
何番煎じ感ありますが、一行ずつInsertする場合とまとめてInsertを比較してみた結果
前提
データベース
DBMS Aurora
バージョン 5.7.12
InstanceType db.r5.2xlarge
vCPU 8
メモリ 64GB
クライアント
InstanceType r5a.2xlarge
Python 2.7.14
事前準備
データ作成
$ perl -e 'for my $num (1..40000){print "${num},name\n";}' > input.csv
$ head input.csv
1,name
2,name
3,name
4,name
5,name
mysqlに接続してテーブル作成
mysql -u admin -h ******-cluster.cluster-*********.ap-northeast-1.rds.amazonaws.com -p
use mydb
create table emp(
-> id INT(5),
-> name VARCHAR(10),
-> PRIMARY KEY (id));
接続先の指定
$ cat rds_config.py
db_username = "<user>" #ユーザ
db_password = "<password>" #パスワード
db_name = "<mydb>" #作成したDB名を記載
1行づつInsert
import sys
import rds_config
import pymysql
import csv
rds_host = "******-cluster.cluster-*********.ap-northeast-1.rds.amazonaws.com" #Auoraの接続先エンドポイント
name = rds_config.db_username
password = rds_config.db_password
db_name = rds_config.db_name
conn = pymysql.connect(rds_host, user=name, passwd=password, db=db_name, connect_timeout=5)
def main():
f = open ("input.csv", "r")
reader = csv.reader(f)
cur = conn.cursor()
for line in reader:
QUERY = "INSERT INTO emp (id, name) VALUES(" + line[0] + ", " + '"' + line[1] + '"' + ")"
cur.execute(QUERY)
conn.commit()
cur.close()
main()
結果
$ time python insert.py
real 0m11.503s
user 0m2.746s
sys 0m0.329s
$ time python insert.py
real 0m11.618s
user 0m2.614s
sys 0m0.511s
$ time python insert.py
real 0m11.229s
user 0m2.888s
sys 0m0.470s
まとめてInsert
$ cat insert_multi.py
import sys
import rds_config
import pymysql
import csv
# rds settings
rds_host = "******-cluster.cluster-*********.ap-northeast-1.rds.amazonaws.com" #Auoraの接続先エンドポイント
name = rds_config.db_username
password = rds_config.db_password
db_name = rds_config.db_name
conn = pymysql.connect(rds_host, user=name, passwd=password, db=db_name, connect_timeout=5)
rows = []
def main():
f = open ("input.csv", "r")
reader = csv.reader(f)
cur = conn.cursor()
for line in reader:
EmpID = line[0]
Name = line[1]
rows.append((EmpID, Name))
cur.executemany("INSERT INTO emp (id,name) VALUES (%s,%s)", rows)
conn.commit()
cur.close()
main()
結果
$ time python insert_multi.py
real 0m1.028s
user 0m0.726s
sys 0m0.016s
$ time python insert_multi.py
real 0m1.046s
user 0m0.741s
sys 0m0.028s
$ time python insert_multi.py
real 0m1.049s
user 0m0.733s
sys 0m0.020s
結論
$ python
Python 2.7.14 (default, Jul 26 2018, 19:59:38)
[GCC 7.3.1 20180303 (Red Hat 7.3.1-5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> (1.028+1.046+1.049)/3
1.041
>>> (11.503+11.618+11.229)/3
11.450000000000001
>>> 11.450000000000001/1.041
10.999039385206533