LoginSignup
1
0

More than 3 years have passed since last update.

MySQLでInsert処理をはやくする

Last updated at Posted at 2019-06-10

はじめに

何番煎じ感ありますが、一行ずつInsertする場合とまとめてInsertを比較してみた結果

前提

データベース

DBMS Aurora
バージョン 5.7.12
InstanceType db.r5.2xlarge
vCPU 8
メモリ 64GB

クライアント

InstanceType r5a.2xlarge
Python 2.7.14

事前準備

データ作成

$ perl -e 'for my $num (1..40000){print "${num},name\n";}' > input.csv
$ head input.csv 
1,name
2,name
3,name
4,name
5,name

mysqlに接続してテーブル作成

mysql -u admin -h ******-cluster.cluster-*********.ap-northeast-1.rds.amazonaws.com -p
use mydb
create table emp(
    -> id INT(5),
    -> name VARCHAR(10),
    -> PRIMARY KEY (id));

接続先の指定

$ cat rds_config.py
db_username = "<user>" #ユーザ
db_password = "<password>" #パスワード
db_name = "<mydb>" #作成したDB名を記載

1行づつInsert

import sys
import rds_config
import pymysql
import csv

rds_host  = "******-cluster.cluster-*********.ap-northeast-1.rds.amazonaws.com" #Auoraの接続先エンドポイント
name = rds_config.db_username
password = rds_config.db_password
db_name = rds_config.db_name

conn = pymysql.connect(rds_host, user=name, passwd=password, db=db_name, connect_timeout=5)

def main():
    f = open ("input.csv", "r")
    reader = csv.reader(f)
    cur = conn.cursor()
    for line in reader:
        QUERY = "INSERT INTO emp (id, name) VALUES(" + line[0] + ", " + '"' + line[1] + '"' + ")"
        cur.execute(QUERY)
    conn.commit() 
    cur.close() 
main()

結果

$ time python insert.py 

real    0m11.503s
user    0m2.746s
sys 0m0.329s
$ time python insert.py 

real    0m11.618s
user    0m2.614s
sys 0m0.511s
$ time python insert.py 

real    0m11.229s
user    0m2.888s
sys 0m0.470s

まとめてInsert

$ cat insert_multi.py 
import sys
import rds_config
import pymysql
import csv
#rds settings
rds_host  = "******-cluster.cluster-*********.ap-northeast-1.rds.amazonaws.com" #Auoraの接続先エンドポイント
name = rds_config.db_username
password = rds_config.db_password
db_name = rds_config.db_name

conn = pymysql.connect(rds_host, user=name, passwd=password, db=db_name, connect_timeout=5)
rows = []

def main():
    f = open ("input.csv", "r")
    reader = csv.reader(f)
    cur = conn.cursor()
    for line in reader:
        EmpID = line[0]
        Name  = line[1]
        rows.append((EmpID, Name))

    cur.executemany("INSERT INTO emp (id,name) VALUES (%s,%s)", rows)
    conn.commit() 
    cur.close() 

main()

結果

$ time python insert_multi.py 

real    0m1.028s
user    0m0.726s
sys 0m0.016s
$ time python insert_multi.py 

real    0m1.046s
user    0m0.741s
sys 0m0.028s
$ time python insert_multi.py 

real    0m1.049s
user    0m0.733s
sys 0m0.020s

結論

$ python
Python 2.7.14 (default, Jul 26 2018, 19:59:38) 
[GCC 7.3.1 20180303 (Red Hat 7.3.1-5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> (1.028+1.046+1.049)/3
1.041
>>> (11.503+11.618+11.229)/3
11.450000000000001
>>> 11.450000000000001/1.041
10.999039385206533

その他

前半2つが一行ずつ
後半3つが一括
image.png

image.png

投稿内容は私個人の意見であり、所属企業・部門見解を代表するものではありません。

1
0
0

Register as a new user and use Qiita more conveniently

  1. You get articles that match your needs
  2. You can efficiently read back useful information
  3. You can use dark theme
What you can do with signing up
1
0