Qiita Teams that are logged in
You are not logged in to any team

Log in to Qiita Team
Community
OrganizationAdvent CalendarQiitadon (β)
Service
Qiita JobsQiita ZineQiita Blog
Help us understand the problem. What is going on with this article?

MySQLでInsert処理をはやくする

More than 1 year has passed since last update.

はじめに

何番煎じ感ありますが、一行ずつInsertする場合とまとめてInsertを比較してみた結果

前提

データベース

DBMS Aurora
バージョン 5.7.12
InstanceType db.r5.2xlarge
vCPU 8
メモリ 64GB

クライアント

InstanceType r5a.2xlarge
Python 2.7.14

事前準備

データ作成

$ perl -e 'for my $num (1..40000){print "${num},name\n";}' > input.csv
$ head input.csv 
1,name
2,name
3,name
4,name
5,name

mysqlに接続してテーブル作成

mysql -u admin -h ******-cluster.cluster-*********.ap-northeast-1.rds.amazonaws.com -p
use mydb
create table emp(
    -> id INT(5),
    -> name VARCHAR(10),
    -> PRIMARY KEY (id));

接続先の指定

$ cat rds_config.py
db_username = "<user>" #ユーザ
db_password = "<password>" #パスワード
db_name = "<mydb>" #作成したDB名を記載

1行づつInsert

import sys
import rds_config
import pymysql
import csv

rds_host  = "******-cluster.cluster-*********.ap-northeast-1.rds.amazonaws.com" #Auoraの接続先エンドポイント
name = rds_config.db_username
password = rds_config.db_password
db_name = rds_config.db_name

conn = pymysql.connect(rds_host, user=name, passwd=password, db=db_name, connect_timeout=5)

def main():
    f = open ("input.csv", "r")
    reader = csv.reader(f)
    cur = conn.cursor()
    for line in reader:
        QUERY = "INSERT INTO emp (id, name) VALUES(" + line[0] + ", " + '"' + line[1] + '"' + ")"
        cur.execute(QUERY)
    conn.commit() 
    cur.close() 
main()

結果

$ time python insert.py 

real    0m11.503s
user    0m2.746s
sys 0m0.329s
$ time python insert.py 

real    0m11.618s
user    0m2.614s
sys 0m0.511s
$ time python insert.py 

real    0m11.229s
user    0m2.888s
sys 0m0.470s

まとめてInsert

$ cat insert_multi.py 
import sys
import rds_config
import pymysql
import csv
#rds settings
rds_host  = "******-cluster.cluster-*********.ap-northeast-1.rds.amazonaws.com" #Auoraの接続先エンドポイント
name = rds_config.db_username
password = rds_config.db_password
db_name = rds_config.db_name

conn = pymysql.connect(rds_host, user=name, passwd=password, db=db_name, connect_timeout=5)
rows = []

def main():
    f = open ("input.csv", "r")
    reader = csv.reader(f)
    cur = conn.cursor()
    for line in reader:
        EmpID = line[0]
        Name  = line[1]
        rows.append((EmpID, Name))

    cur.executemany("INSERT INTO emp (id,name) VALUES (%s,%s)", rows)
    conn.commit() 
    cur.close() 

main()

結果

$ time python insert_multi.py 

real    0m1.028s
user    0m0.726s
sys 0m0.016s
$ time python insert_multi.py 

real    0m1.046s
user    0m0.741s
sys 0m0.028s
$ time python insert_multi.py 

real    0m1.049s
user    0m0.733s
sys 0m0.020s

結論

$ python
Python 2.7.14 (default, Jul 26 2018, 19:59:38) 
[GCC 7.3.1 20180303 (Red Hat 7.3.1-5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> (1.028+1.046+1.049)/3
1.041
>>> (11.503+11.618+11.229)/3
11.450000000000001
>>> 11.450000000000001/1.041
10.999039385206533

その他

前半2つが一行ずつ
後半3つが一括
image.png

image.png

投稿内容は私個人の意見であり、所属企業・部門見解を代表するものではありません。

atsumjp
Cloud Consultant, Google Cloud Professional Services. All views and opinions are my own. GCPサービスについて試してみた内容について書いていきたいと思います。 GCP<-AWS<-NetApp<-SES
Why not register and get more from Qiita?
  1. We will deliver articles that match you
    By following users and tags, you can catch up information on technical fields that you are interested in as a whole
  2. you can read useful information later efficiently
    By "stocking" the articles you like, you can search right away