search
LoginSignup
22

More than 5 years have passed since last update.

EmbulkでMySQLに大量データを投入してみる - その2

EmbulkでMySQLに大量データを投入してみる - その1 の続き

エラー調査

エラーをみる感じ、一時テーブル(bulk_tbl_*_bl_tmp*)へのデータ投入完了後、実テーブル(bulk_tbl)へマージする途中、なんらかの理由で落ちたっぽい。
そして一時テーブルをDROPしようとしてコネクションがなくなってるのでエラーになっているっぽい。

2015-08-05 08:35:37.456 +0900 [INFO] (transaction): Connecting to jdbc:mysql://localhost:3306/bulk_db options {user=bulk_user, tcpKeepAlive=true, useCompression=true, rewriteBatchedStatements=true, connectTimeout=300000, socketTimeout=2700000}
2015-08-05 08:35:37.672 +0900 [INFO] (transaction): SQL: INSERT INTO `bulk_tbl` (`col01`, `col02`, `col03`, `col04`, `col05`, `col06`, `col07`, `col08`, `col09`, `col10`, `col11`, `col12`, `col13`, `col14`, `col15`, `col16`, `col17`, `col18`, `col19`, `col20`) SELECT `col01`, `col02`, `col03`, `col04`, `col05`, `col06`, `col07`, `col08`, `col09`, `col10`, `col11`, `col12`, `col13`, `col14`, `col15`, `col16`, `col17`, `col18`, `col19`, `col20` FROM `bulk_tbl_0000000055c138a4090f5600_bl_tmp000` UNION ALL SELECT `col01`, `col02`, `col03`, `col04`, `col05`, `col06`, `col07`, `col08`, `col09`, `col10`, `col11`, `col12`, `col13`, `col14`, `col15`, `col16`, `col17`, `col18`, `col19`, `col20` FROM `bulk_tbl_0000000055c138a4090f5600_bl_tmp001` UNION ALL SELECT `col01`, `col02`, `col03`, `col04`, `col05`, `col06`, `col07`, `col08`, `col09`, `col10`, `col11`, `col12`, `col13`, `col14`, `col15`, `col16`, `col17`, `col18`, `col19`, `col20` FROM `bulk_tbl_0000000055c138a4090f5600_bl_tmp002` UNION ALL SELECT `col01`, `col02`, `col03`, `col04`, `col05`, `col06`, `col07`, `col08`, `col09`, `col10`, `col11`, `col12`, `col13`, `col14`, `col15`, `col16`, `col17`, `col18`, `col19`, `col20` FROM `bulk_tbl_0000000055c138a4090f5600_bl_tmp003` UNION ALL SELECT `col01`, `col02`, `col03`, `col04`, `col05`, `col06`, `col07`, `col08`, `col09`, `col10`, `col11`, `col12`, `col13`, `col14`, `col15`, `col16`, `col17`, `col18`, `col19`, `col20` FROM `bulk_tbl_0000000055c138a4090f5600_bl_tmp004` UNION ALL SELECT `col01`, `col02`, `col03`, `col04`, `col05`, `col06`, `col07`, `col08`, `col09`, `col10`, `col11`, `col12`, `col13`, `col14`, `col15`, `col16`, `col17`, `col18`, `col19`, `col20` FROM `bulk_tbl_0000000055c138a4090f5600_bl_tmp005` UNION ALL SELECT `col01`, `col02`, `col03`, `col04`, `col05`, `col06`, `col07`, `col08`, `col09`, `col10`, `col11`, `col12`, `col13`, `col14`, `col15`, `col16`, `col17`, `col18`, `col19`, `col20` FROM `bulk_tbl_0000000055c138a4090f5600_bl_tmp006` UNION ALL SELECT `col01`, `col02`, `col03`, `col04`, `col05`, `col06`, `col07`, `col08`, `col09`, `col10`, `col11`, `col12`, `col13`, `col14`, `col15`, `col16`, `col17`, `col18`, `col19`, `col20` FROM `bulk_tbl_0000000055c138a4090f5600_bl_tmp007` UNION ALL SELECT `col01`, `col02`, `col03`, `col04`, `col05`, `col06`, `col07`, `col08`, `col09`, `col10`, `col11`, `col12`, `col13`, `col14`, `col15`, `col16`, `col17`, `col18`, `col19`, `col20` FROM `bulk_tbl_0000000055c138a4090f5600_bl_tmp008` UNION ALL SELECT `col01`, `col02`, `col03`, `col04`, `col05`, `col06`, `col07`, `col08`, `col09`, `col10`, `col11`, `col12`, `col13`, `col14`, `col15`, `col16`, `col17`, `col18`, `col19`, `col20` FROM `bulk_tbl_0000000055c138a4090f5600_bl_tmp009` UNION ALL SELECT `col01`, `col02`, `col03`, `col04`, `col05`, `col06`, `col07`, `col08`, `col09`, `col10`, `col11`, `col12`, `col13`, `col14`, `col15`, `col16`, `col17`, `col18`, `col19`, `col20` FROM `bulk_tbl_0000000055c138a4090f5600_bl_tmp010` UNION ALL SELECT `col01`, `col02`, `col03`, `col04`, `col05`, `col06`, `col07`, `col08`, `col09`, `col10`, `col11`, `col12`, `col13`, `col14`, `col15`, `col16`, `col17`, `col18`, `col19`, `col20` FROM `bulk_tbl_0000000055c138a4090f5600_bl_tmp011` UNION ALL SELECT `col01`, `col02`, `col03`, `col04`, `col05`, `col06`, `col07`, `col08`, `col09`, `col10`, `col11`, `col12`, `col13`, `col14`, `col15`, `col16`, `col17`, `col18`, `col19`, `col20` FROM `bulk_tbl_0000000055c138a4090f5600_bl_tmp012` UNION ALL SELECT `col01`, `col02`, `col03`, `col04`, `col05`, `col06`, `col07`, `col08`, `col09`, `col10`, `col11`, `col12`, `col13`, `col14`, `col15`, `col16`, `col17`, `col18`, `col19`, `col20` FROM `bulk_tbl_0000000055c138a4090f5600_bl_tmp013` UNION ALL SELECT `col01`, `col02`, `col03`, `col04`, `col05`, `col06`, `col07`, `col08`, `col09`, `col10`, `col11`, `col12`, `col13`, `col14`, `col15`, `col16`, `col17`, `col18`, `col19`, `col20` FROM `bulk_tbl_0000000055c138a4090f5600_bl_tmp014` UNION ALL SELECT `col01`, `col02`, `col03`, `col04`, `col05`, `col06`, `col07`, `col08`, `col09`, `col10`, `col11`, `col12`, `col13`, `col14`, `col15`, `col16`, `col17`, `col18`, `col19`, `col20` FROM `bulk_tbl_0000000055c138a4090f5600_bl_tmp015` UNION ALL SELECT `col01`, `col02`, `col03`, `col04`, `col05`, `col06`, `col07`, `col08`, `col09`, `col10`, `col11`, `col12`, `col13`, `col14`, `col15`, `col16`, `col17`, `col18`, `col19`, `col20` FROM `bulk_tbl_0000000055c138a4090f5600_bl_tmp016` UNION ALL SELECT `col01`, `col02`, `col03`, `col04`, `col05`, `col06`, `col07`, `col08`, `col09`, `col10`, `col11`, `col12`, `col13`, `col14`, `col15`, `col16`, `col17`, `col18`, `col19`, `col20` FROM `bulk_tbl_0000000055c138a4090f5600_bl_tmp017` UNION ALL SELECT `col01`, `col02`, `col03`, `col04`, `col05`, `col06`, `col07`, `col08`, `col09`, `col10`, `col11`, `col12`, `col13`, `col14`, `col15`, `col16`, `col17`, `col18`, `col19`, `col20` FROM `bulk_tbl_0000000055c138a4090f5600_bl_tmp018` UNION ALL SELECT `col01`, `col02`, `col03`, `col04`, `col05`, `col06`, `col07`, `col08`, `col09`, `col10`, `col11`, `col12`, `col13`, `col14`, `col15`, `col16`, `col17`, `col18`, `col19`, `col20` FROM `bulk_tbl_0000000055c138a4090f5600_bl_tmp019` UNION ALL SELECT `col01`, `col02`, `col03`, `col04`, `col05`, `col06`, `col07`, `col08`, `col09`, `col10`, `col11`, `col12`, `col13`, `col14`, `col15`, `col16`, `col17`, `col18`, `col19`, `col20` FROM `bulk_tbl_0000000055c138a4090f5600_bl_tmp020` UNION ALL SELECT `col01`, `col02`, `col03`, `col04`, `col05`, `col06`, `col07`, `col08`, `col09`, `col10`, `col11`, `col12`, `col13`, `col14`, `col15`, `col16`, `col17`, `col18`, `col19`, `col20` FROM `bulk_tbl_0000000055c138a4090f5600_bl_tmp021` UNION ALL SELECT `col01`, `col02`, `col03`, `col04`, `col05`, `col06`, `col07`, `col08`, `col09`, `col10`, `col11`, `col12`, `col13`, `col14`, `col15`, `col16`, `col17`, `col18`, `col19`, `col20` FROM `bulk_tbl_0000000055c138a4090f5600_bl_tmp022` UNION ALL SELECT `col01`, `col02`, `col03`, `col04`, `col05`, `col06`, `col07`, `col08`, `col09`, `col10`, `col11`, `col12`, `col13`, `col14`, `col15`, `col16`, `col17`, `col18`, `col19`, `col20` FROM `bulk_tbl_0000000055c138a4090f5600_bl_tmp023` UNION ALL SELECT `col01`, `col02`, `col03`, `col04`, `col05`, `col06`, `col07`, `col08`, `col09`, `col10`, `col11`, `col12`, `col13`, `col14`, `col15`, `col16`, `col17`, `col18`, `col19`, `col20` FROM `bulk_tbl_0000000055c138a4090f5600_bl_tmp024` UNION ALL SELECT `col01`, `col02`, `col03`, `col04`, `col05`, `col06`, `col07`, `col08`, `col09`, `col10`, `col11`, `col12`, `col13`, `col14`, `col15`, `col16`, `col17`, `col18`, `col19`, `col20` FROM `bulk_tbl_0000000055c138a4090f5600_bl_tmp025` UNION ALL SELECT `col01`, `col02`, `col03`, `col04`, `col05`, `col06`, `col07`, `col08`, `col09`, `col10`, `col11`, `col12`, `col13`, `col14`, `col15`, `col16`, `col17`, `col18`, `col19`, `col20` FROM `bulk_tbl_0000000055c138a4090f5600_bl_tmp026` UNION ALL SELECT `col01`, `col02`, `col03`, `col04`, `col05`, `col06`, `col07`, `col08`, `col09`, `col10`, `col11`, `col12`, `col13`, `col14`, `col15`, `col16`, `col17`, `col18`, `col19`, `col20` FROM `bulk_tbl_0000000055c138a4090f5600_bl_tmp027` UNION ALL SELECT `col01`, `col02`, `col03`, `col04`, `col05`, `col06`, `col07`, `col08`, `col09`, `col10`, `col11`, `col12`, `col13`, `col14`, `col15`, `col16`, `col17`, `col18`, `col19`, `col20` FROM `bulk_tbl_0000000055c138a4090f5600_bl_tmp028` UNION ALL SELECT `col01`, `col02`, `col03`, `col04`, `col05`, `col06`, `col07`, `col08`, `col09`, `col10`, `col11`, `col12`, `col13`, `col14`, `col15`, `col16`, `col17`, `col18`, `col19`, `col20` FROM `bulk_tbl_0000000055c138a4090f5600_bl_tmp029`
2015-08-05 09:23:03.963 +0900 [ERROR] (transaction): Operation failed (0:08S01)
2015-08-05 09:23:43.490 +0900 [INFO] (main): Transaction partially failed. Cleaning up the intermediate data. Use -r option to make it resumable.
2015-08-05 09:25:03.234 +0900 [INFO] (cleanup): Connecting to jdbc:mysql://localhost:3306/bulk_db options {user=bulk_user, tcpKeepAlive=true, useCompression=true, rewriteBatchedStatements=true, connectTimeout=300000, socketTimeout=1800000}
2015-08-05 09:25:16.514 +0900 [INFO] (cleanup): SQL: DROP TABLE IF EXISTS `bulk_tbl_0000000055c138a4090f5600_bl_tmp000`
2015-08-05 09:55:37.335 +0900 [ERROR] (cleanup): Operation failed (0:08S01)
org.embulk.exec.PartialExecutionException: java.lang.RuntimeException: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure

実際一時テーブルはDROPされず残ったままになってる。

mysql> show tables;
+---------------------------------------------+
| Tables_in_bulk_db                           |
+---------------------------------------------+
| bulk_tbl                                    |
| bulk_tbl_0000000055c138a4090f5600_bl_tmp001 |
| bulk_tbl_0000000055c138a4090f5600_bl_tmp002 |
| bulk_tbl_0000000055c138a4090f5600_bl_tmp003 |
| bulk_tbl_0000000055c138a4090f5600_bl_tmp004 |
| bulk_tbl_0000000055c138a4090f5600_bl_tmp005 |
| bulk_tbl_0000000055c138a4090f5600_bl_tmp006 |
| bulk_tbl_0000000055c138a4090f5600_bl_tmp007 |
| bulk_tbl_0000000055c138a4090f5600_bl_tmp008 |
| bulk_tbl_0000000055c138a4090f5600_bl_tmp009 |
| bulk_tbl_0000000055c138a4090f5600_bl_tmp010 |
| bulk_tbl_0000000055c138a4090f5600_bl_tmp011 |
| bulk_tbl_0000000055c138a4090f5600_bl_tmp012 |
| bulk_tbl_0000000055c138a4090f5600_bl_tmp013 |
| bulk_tbl_0000000055c138a4090f5600_bl_tmp014 |
| bulk_tbl_0000000055c138a4090f5600_bl_tmp015 |
| bulk_tbl_0000000055c138a4090f5600_bl_tmp016 |
| bulk_tbl_0000000055c138a4090f5600_bl_tmp017 |
| bulk_tbl_0000000055c138a4090f5600_bl_tmp018 |
| bulk_tbl_0000000055c138a4090f5600_bl_tmp019 |
| bulk_tbl_0000000055c138a4090f5600_bl_tmp020 |
| bulk_tbl_0000000055c138a4090f5600_bl_tmp021 |
| bulk_tbl_0000000055c138a4090f5600_bl_tmp022 |
| bulk_tbl_0000000055c138a4090f5600_bl_tmp023 |
| bulk_tbl_0000000055c138a4090f5600_bl_tmp024 |
| bulk_tbl_0000000055c138a4090f5600_bl_tmp025 |
| bulk_tbl_0000000055c138a4090f5600_bl_tmp026 |
| bulk_tbl_0000000055c138a4090f5600_bl_tmp027 |
| bulk_tbl_0000000055c138a4090f5600_bl_tmp028 |
| bulk_tbl_0000000055c138a4090f5600_bl_tmp029 |
+---------------------------------------------+
30 rows in set (0.00 sec)

コネクションタイムアウトを疑ったが、wait_connectionはデフォルトのまま8時間(28800 msec)となっているので、それが原因ではなさそう。

mysql> show global variables like '%timeout';
+----------------------------+----------+
| Variable_name              | Value    |
+----------------------------+----------+
| connect_timeout            | 10       |
| delayed_insert_timeout     | 300      |
| innodb_lock_wait_timeout   | 50       |
| innodb_rollback_on_timeout | OFF      |
| interactive_timeout        | 28800    |
| lock_wait_timeout          | 31536000 |
| net_read_timeout           | 30       |
| net_write_timeout          | 60       |
| slave_net_timeout          | 3600     |
| wait_timeout               | 28800    |
+----------------------------+----------+
10 rows in set (0.00 sec)

もう一度ログを見てみると、jdbcのコネクションタイムアウトの設定が300000ミリ秒になっている。
この状態だと、5分でコネクションが切れるのかも。

2015-08-05 08:35:37.456 +0900 [INFO] (transaction): Connecting to jdbc:mysql://localhost:3306/bulk_db options {user=bulk_user, tcpKeepAlive=true, useCompression=true, rewriteBatchedStatements=true, connectTimeout=300000, socketTimeout=2700000}

Embulkのソースを確認したところ、確かに300000ミリ秒を明示的に指定している。

org.embulk.output.MySQLOutputPlugin.java
props.setProperty("connectTimeout", "300000"); // milliseconds

そういえばEmbulk定義ファイルで option として設定できたので、値を7200000ミリ秒(120分)に設定してみる。

out:
   options: {connectTimeout: 7200000}

再度実行

$ embulk run ./sample/config.yml

でもエラーは変わらず。。。

もう一度さっきのログを見てみると、jdbcのソケットタイムアウトの設定が2700000ミリ秒(45分)になっている。

2015-08-05 08:35:37.456 +0900 [INFO] (transaction): Connecting to jdbc:mysql://localhost:3306/bulk_db options {user=bulk_user, tcpKeepAlive=true, useCompression=true, rewriteBatchedStatements=true, connectTimeout=300000, socketTimeout=2700000}

そういえば2700000ミリ秒ぐらいでコネクションが切れてた気がする。

The last packet successfully received from the server was 2,839,153 milliseconds ago.  The last packet sent successfully to the server was 2,839,151 milliseconds ago.
    at org.embulk.exec.BulkLoader$LoaderState.buildPartialExecuteException(org/embulk/exec/BulkLoader.java:331)

Embulkのソースを確認したところ、確かに2700000ミリ秒を明示的に指定している。

org.embulk.output.MySQLOutputPlugin.java
props.setProperty("socketTimeout", "2700000");   // milliseconds;

こちらもEmbulk定義ファイルでoptionとして7200000ミリ秒(120分)に設定してみる。

out:
   options: {socketTimeout: 7200000}

再度実行

$ embulk run ./sample/config.yml

うまくいったっぽい!!

mysql> select count(col01) from bulk_tbl;
+--------------+
| count(col01) |
+--------------+
|     30000000 |
+--------------+
1 row in set (5 min 30.40 sec)

Register as a new user and use Qiita more conveniently

  1. You get articles that match your needs
  2. You can efficiently read back useful information
What you can do with signing up
22