5
5

Delete article

Deleted articles cannot be recovered.

Draft of this article would be also deleted.

Are you sure you want to delete this article?

More than 5 years have passed since last update.

AWS Aurora から EMRへSqoopでデータを抜いた時のメモ

Last updated at Posted at 2016-03-09

注意

  • 一時的な検証目的なのでセキュリティなど考慮しない設定になっている部分があるのでご注意を。

環境概要

EMR

  • m3.xlarge * 3(master 1, slave 2)

AuroraDB

  • db.r3.large
  • テストテーブル
CREATE DATABASE sqoop_test;

GRANT ALL PRIVILEGES ON sqoop_test.* TO sqoopuser IDENTIFIED BY 'sqoopuser999' WITH GRANT OPTION;
FLUSH PRIVILEGES;

CREATE TABLE `sqoop_test_utf8` (
  `key1` char(4) NOT NULL DEFAULT '',
  `code1` varchar(10) DEFAULT NULL,   
  `item1` varchar(10) DEFAULT NULL,   
  PRIMARY KEY (`key1`) ) ENGINE=InnoDB  DEFAULT CHARSET=utf8;
  
insert into sqoop_test_utf8 values('A001', 'shibuya_001', 'book');
 insert into sqoop_test_utf8 values('A002', 'shibuya_002', 'DVD');
 insert into sqoop_test_utf8 values('B003', 'shinagawa_001', 'stationery');
 insert into sqoop_test_utf8 values('C004', 'ebisu_001', 'dictionary');
 insert into sqoop_test_utf8 values('D005', 'meguro_001', 'book');

通信設定

  • Aurora <- EMR(3台とも):3306

Sqoopする準備(EMRのMasterノードで実施)

Sqoopのインストール

$ cd /tmp
$ curl -OL http://ftp.tsukuba.wide.ad.jp/software/apache/sqoop/1.4.6/sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
$ tar zxvf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz

mariadb jdbcの配置

$ cd /tmp
$ curl -OL https://downloads.mariadb.com/enterprise/rs6a-06j2/connectors/java/connector-java-1.3.6/mariadb-java-client-1.3.6.jar
$ cp mariadb-java-client-1.3.6.jar /tmp/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/lib/

mysql jdbcの配置

$ cd /tmp
$ curl -LO http://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.38.tar.gz
$ tar zxvf mysql-connector-java-5.1.38.tar.gz
$ cd mysql-connector-java-5.1.38
$ cp mysql-connector-java-5.1.38-bin.jar /tmp/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/lib/

Sqoop実行

Using mysql driver

$ ./sqoop import --connect jdbc:mysql://<Auroraのクラスターエンドポイント>/sqoop_test --username sqoopuser --password sqoopuser999 --table sqoop_test_utf8

Using mariadb driver

$ ./sqoop import --connect jdbc:mysql://<Auroraのクラスターエンドポイント>/sqoop_test --username sqoopuser --password sqoopuser999 --table sqoop_test_utf8 --driver org.mariadb.jdbc.Driver

hive import

$ ./sqoop import --connect jdbc:mysql://test-cluster.cluster-cxkmqunzhbcv.us-east-1.rds.amazonaws.com/sqoop_test --username sqoopuser --password sqoopuser999 --table sqoop_test_utf8 --driver org.mariadb.jdbc.Driver --hive-import --hive-table sqoop_test
  • データ確認
hive> select * from sqoop_test;
OK
A001	shibuya_00	book
A002	shibuya_00	DVD
B003	shinagawa_	stationery
C004	ebisu_001	dictionary
D005	meguro_001	book
Time taken: 0.786 seconds, Fetched: 5 row(s)

その他

テーブルの文字コードとsqoop側の文字コードの違いでエラー?

  • テーブルの文字コードをlatin1にしていると以下のエラーが出てジョブが失敗した。なので、上記で「utf-8」にしている。

  • やってはいないが、Sqoop側で対応する場合、「--connection-param-file」を使って「characterEncoding」で指定してやれば行けるのかな。

  • エラーログ

16/03/09 06:41:21 INFO mapreduce.Job: Task Id : attempt_1457497192768_0001_m_000000_2, Status : FAILED
Error: java.io.IOException: SQLException in nextKeyValue
   at org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:277)
   at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:565)
   at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
   at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:152)
   at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:796)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:172)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:166)
Caused by: java.sql.SQLException: Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation '<'
   at org.mariadb.jdbc.internal.SQLExceptionMapper.get(SQLExceptionMapper.java:149)
   at org.mariadb.jdbc.internal.SQLExceptionMapper.throwException(SQLExceptionMapper.java:106)
   at org.mariadb.jdbc.MySQLStatement.executeQueryEpilog(MySQLStatement.java:252)
   at org.mariadb.jdbc.MySQLStatement.execute(MySQLStatement.java:278)
   at org.mariadb.jdbc.MySQLStatement.executeQuery(MySQLStatement.java:333)
   at org.mariadb.jdbc.MySQLPreparedStatement.executeQuery(MySQLPreparedStatement.java:99)
   at org.apache.sqoop.mapreduce.db.DBRecordReader.executeQuery(DBRecordReader.java:111)
   at org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:235)
   ... 12 more

参考:実行結果

  • mysql driver
Warning: /mnt/tmp/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/bin/../../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /mnt/tmp/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/bin/../../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
16/03/09 06:53:02 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
16/03/09 06:53:02 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/03/09 06:53:02 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
16/03/09 06:53:02 INFO tool.CodeGenTool: Beginning code generation
16/03/09 06:53:02 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `sqoop_test_utf8` AS t LIMIT 1
16/03/09 06:53:02 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `sqoop_test_utf8` AS t LIMIT 1
16/03/09 06:53:02 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
注意:/tmp/sqoop-hadoop/compile/f5749f4d128f954b02ac58ff6e31a933/sqoop_test_utf8.javaは非推奨のAPIを使用またはオーバーライドしています。
注意:詳細は、-Xlint:deprecationオプションを指定して再コンパイルしてください。
16/03/09 06:53:04 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/f5749f4d128f954b02ac58ff6e31a933/sqoop_test_utf8.jar
16/03/09 06:53:04 WARN manager.MySQLManager: It looks like you are importing from mysql.
16/03/09 06:53:04 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
16/03/09 06:53:04 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
16/03/09 06:53:04 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
16/03/09 06:53:04 INFO mapreduce.ImportJobBase: Beginning import of sqoop_test_utf8
16/03/09 06:53:05 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
16/03/09 06:53:05 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
16/03/09 06:53:05 INFO client.RMProxy: Connecting to ResourceManager at ip-172-31-55-2.ec2.internal/172.31.55.2:8032
16/03/09 06:53:06 INFO metrics.MetricsSaver: MetricsConfigRecord disabledInCluster: false instanceEngineCycleSec: 60 clusterEngineCycleSec: 60 disableClusterEngine: false maxMemoryMb: 3072 maxInstanceCount: 500 lastModified: 1457497199168
16/03/09 06:53:06 INFO metrics.MetricsSaver: Created MetricsSaver j-3UO8LO9Y9U00I:i-81fac878:Sqoop:05010 period:60 /mnt/var/em/raw/i-81fac878_20160309_Sqoop_05010_raw.bin
16/03/09 06:53:07 INFO db.DBInputFormat: Using read commited transaction isolation
16/03/09 06:53:07 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`key1`), MAX(`key1`) FROM `sqoop_test_utf8`
16/03/09 06:53:07 WARN db.TextSplitter: Generating splits for a textual index column.
16/03/09 06:53:07 WARN db.TextSplitter: If your database sorts in a case-insensitive order, this may result in a partial import or duplicate records.
16/03/09 06:53:07 WARN db.TextSplitter: You are strongly encouraged to choose an integral split column.
16/03/09 06:53:07 INFO mapreduce.JobSubmitter: number of splits:4
16/03/09 06:53:07 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1457497192768_0002
16/03/09 06:53:08 INFO impl.YarnClientImpl: Submitted application application_1457497192768_0002
16/03/09 06:53:08 INFO mapreduce.Job: The url to track the job: http://ip-172-31-55-2.ec2.internal:20888/proxy/application_1457497192768_0002/
16/03/09 06:53:08 INFO mapreduce.Job: Running job: job_1457497192768_0002
16/03/09 06:53:15 INFO mapreduce.Job: Job job_1457497192768_0002 running in uber mode : false
16/03/09 06:53:15 INFO mapreduce.Job:  map 0% reduce 0%
16/03/09 06:53:23 INFO mapreduce.Job:  map 50% reduce 0%
16/03/09 06:53:24 INFO mapreduce.Job:  map 100% reduce 0%
16/03/09 06:53:24 INFO mapreduce.Job: Job job_1457497192768_0002 completed successfully
16/03/09 06:53:24 INFO mapreduce.Job: Counters: 30
   File System Counters
   	FILE: Number of bytes read=0
   	FILE: Number of bytes written=565716
   	FILE: Number of read operations=0
   	FILE: Number of large read operations=0
   	FILE: Number of write operations=0
   	HDFS: Number of bytes read=461
   	HDFS: Number of bytes written=115
   	HDFS: Number of read operations=16
   	HDFS: Number of large read operations=0
   	HDFS: Number of write operations=8
   Job Counters
   	Launched map tasks=4
   	Other local map tasks=4
   	Total time spent by all maps in occupied slots (ms)=1013580
   	Total time spent by all reduces in occupied slots (ms)=0
   	Total time spent by all map tasks (ms)=22524
   	Total vcore-seconds taken by all map tasks=22524
   	Total megabyte-seconds taken by all map tasks=32434560
   Map-Reduce Framework
   	Map input records=5
   	Map output records=5
   	Input split bytes=461
   	Spilled Records=0
   	Failed Shuffles=0
   	Merged Map outputs=0
   	GC time elapsed (ms)=286
   	CPU time spent (ms)=4590
   	Physical memory (bytes) snapshot=869433344
   	Virtual memory (bytes) snapshot=8212455424
   	Total committed heap usage (bytes)=1195376640
   File Input Format Counters
   	Bytes Read=0
   File Output Format Counters
   	Bytes Written=115
16/03/09 06:53:24 INFO mapreduce.ImportJobBase: Transferred 115 bytes in 18.7289 seconds (6.1403 bytes/sec)
16/03/09 06:53:24 INFO mapreduce.ImportJobBase: Retrieved 5 records.
  • mariadb driver
Warning: /mnt/tmp/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/bin/../../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /mnt/tmp/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/bin/../../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
16/03/09 06:58:22 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
16/03/09 06:58:22 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/03/09 06:58:22 WARN sqoop.ConnFactory: Parameter --driver is set to an explicit driver however appropriate connection manager is not being set (via --connection-manager). Sqoop is going to fall back to org.apache.sqoop.manager.GenericJdbcManager. Please specify explicitly which connection manager should be used next time.
16/03/09 06:58:22 INFO manager.SqlManager: Using default fetchSize of 1000
16/03/09 06:58:22 INFO tool.CodeGenTool: Beginning code generation
16/03/09 06:58:22 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM sqoop_test_utf8 AS t WHERE 1=0
16/03/09 06:58:22 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM sqoop_test_utf8 AS t WHERE 1=0
16/03/09 06:58:22 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
注意:/tmp/sqoop-hadoop/compile/131e4ef66f3fe8206ed85367ddc6aa87/sqoop_test_utf8.javaは非推奨のAPIを使用またはオーバーライドしています。
注意:詳細は、-Xlint:deprecationオプションを指定して再コンパイルしてください。
16/03/09 06:58:24 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/131e4ef66f3fe8206ed85367ddc6aa87/sqoop_test_utf8.jar
16/03/09 06:58:24 INFO mapreduce.ImportJobBase: Beginning import of sqoop_test_utf8
16/03/09 06:58:25 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
16/03/09 06:58:25 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM sqoop_test_utf8 AS t WHERE 1=0
16/03/09 06:58:25 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
16/03/09 06:58:26 INFO client.RMProxy: Connecting to ResourceManager at ip-172-31-55-2.ec2.internal/172.31.55.2:8032
16/03/09 06:58:26 INFO metrics.MetricsSaver: MetricsConfigRecord disabledInCluster: false instanceEngineCycleSec: 60 clusterEngineCycleSec: 60 disableClusterEngine: false maxMemoryMb: 3072 maxInstanceCount: 500 lastModified: 1457497199168
16/03/09 06:58:26 INFO metrics.MetricsSaver: Created MetricsSaver j-3UO8LO9Y9U00I:i-81fac878:Sqoop:07626 period:60 /mnt/var/em/raw/i-81fac878_20160309_Sqoop_07626_raw.bin
16/03/09 06:58:27 INFO db.DBInputFormat: Using read commited transaction isolation
16/03/09 06:58:27 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(key1), MAX(key1) FROM sqoop_test_utf8
16/03/09 06:58:27 WARN db.TextSplitter: Generating splits for a textual index column.
16/03/09 06:58:27 WARN db.TextSplitter: If your database sorts in a case-insensitive order, this may result in a partial import or duplicate records.
16/03/09 06:58:27 WARN db.TextSplitter: You are strongly encouraged to choose an integral split column.
16/03/09 06:58:27 INFO mapreduce.JobSubmitter: number of splits:4
16/03/09 06:58:28 INFO metrics.MetricsSaver: 1 aggregated HDFSWriteDelay 306 raw values into 2 aggregated values, total 2
16/03/09 06:58:28 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1457497192768_0003
16/03/09 06:58:28 INFO impl.YarnClientImpl: Submitted application application_1457497192768_0003
16/03/09 06:58:28 INFO mapreduce.Job: The url to track the job: http://ip-172-31-55-2.ec2.internal:20888/proxy/application_1457497192768_0003/
16/03/09 06:58:28 INFO mapreduce.Job: Running job: job_1457497192768_0003
16/03/09 06:58:35 INFO mapreduce.Job: Job job_1457497192768_0003 running in uber mode : false
16/03/09 06:58:35 INFO mapreduce.Job:  map 0% reduce 0%
16/03/09 06:58:43 INFO mapreduce.Job:  map 25% reduce 0%
16/03/09 06:58:44 INFO mapreduce.Job:  map 75% reduce 0%
16/03/09 06:58:45 INFO mapreduce.Job:  map 100% reduce 0%
16/03/09 06:58:45 INFO mapreduce.Job: Job job_1457497192768_0003 completed successfully
16/03/09 06:58:45 INFO mapreduce.Job: Counters: 30
   File System Counters
   	FILE: Number of bytes read=0
   	FILE: Number of bytes written=565504
   	FILE: Number of read operations=0
   	FILE: Number of large read operations=0
   	FILE: Number of write operations=0
   	HDFS: Number of bytes read=445
   	HDFS: Number of bytes written=115
   	HDFS: Number of read operations=16
   	HDFS: Number of large read operations=0
   	HDFS: Number of write operations=8
   Job Counters
   	Launched map tasks=4
   	Other local map tasks=4
   	Total time spent by all maps in occupied slots (ms)=1054530
   	Total time spent by all reduces in occupied slots (ms)=0
   	Total time spent by all map tasks (ms)=23434
   	Total vcore-seconds taken by all map tasks=23434
   	Total megabyte-seconds taken by all map tasks=33744960
   Map-Reduce Framework
   	Map input records=5
   	Map output records=5
   	Input split bytes=445
   	Spilled Records=0
   	Failed Shuffles=0
   	Merged Map outputs=0
   	GC time elapsed (ms)=292
   	CPU time spent (ms)=5630
   	Physical memory (bytes) snapshot=992595968
   	Virtual memory (bytes) snapshot=8235737088
   	Total committed heap usage (bytes)=1195376640
   File Input Format Counters
   	Bytes Read=0
   File Output Format Counters
   	Bytes Written=115
16/03/09 06:58:45 INFO mapreduce.ImportJobBase: Transferred 115 bytes in 19.782 seconds (5.8134 bytes/sec)
16/03/09 06:58:45 INFO mapreduce.ImportJobBase: Retrieved 5 records.
5
5
0

Register as a new user and use Qiita more conveniently

  1. You get articles that match your needs
  2. You can efficiently read back useful information
  3. You can use dark theme
What you can do with signing up
5
5

Delete article

Deleted articles cannot be recovered.

Draft of this article would be also deleted.

Are you sure you want to delete this article?