注意
- 一時的な検証目的なのでセキュリティなど考慮しない設定になっている部分があるのでご注意を。
環境概要
EMR
- m3.xlarge * 3(master 1, slave 2)
AuroraDB
- db.r3.large
- テストテーブル
CREATE DATABASE sqoop_test;
GRANT ALL PRIVILEGES ON sqoop_test.* TO sqoopuser IDENTIFIED BY 'sqoopuser999' WITH GRANT OPTION;
FLUSH PRIVILEGES;
CREATE TABLE `sqoop_test_utf8` (
`key1` char(4) NOT NULL DEFAULT '',
`code1` varchar(10) DEFAULT NULL,
`item1` varchar(10) DEFAULT NULL,
PRIMARY KEY (`key1`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
insert into sqoop_test_utf8 values('A001', 'shibuya_001', 'book');
insert into sqoop_test_utf8 values('A002', 'shibuya_002', 'DVD');
insert into sqoop_test_utf8 values('B003', 'shinagawa_001', 'stationery');
insert into sqoop_test_utf8 values('C004', 'ebisu_001', 'dictionary');
insert into sqoop_test_utf8 values('D005', 'meguro_001', 'book');
通信設定
- Aurora <- EMR(3台とも):3306
Sqoopする準備(EMRのMasterノードで実施)
Sqoopのインストール
$ cd /tmp
$ curl -OL http://ftp.tsukuba.wide.ad.jp/software/apache/sqoop/1.4.6/sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
$ tar zxvf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
mariadb jdbcの配置
$ cd /tmp
$ curl -OL https://downloads.mariadb.com/enterprise/rs6a-06j2/connectors/java/connector-java-1.3.6/mariadb-java-client-1.3.6.jar
$ cp mariadb-java-client-1.3.6.jar /tmp/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/lib/
mysql jdbcの配置
$ cd /tmp
$ curl -LO http://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.38.tar.gz
$ tar zxvf mysql-connector-java-5.1.38.tar.gz
$ cd mysql-connector-java-5.1.38
$ cp mysql-connector-java-5.1.38-bin.jar /tmp/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/lib/
Sqoop実行
Using mysql driver
$ ./sqoop import --connect jdbc:mysql://<Auroraのクラスターエンドポイント>/sqoop_test --username sqoopuser --password sqoopuser999 --table sqoop_test_utf8
Using mariadb driver
$ ./sqoop import --connect jdbc:mysql://<Auroraのクラスターエンドポイント>/sqoop_test --username sqoopuser --password sqoopuser999 --table sqoop_test_utf8 --driver org.mariadb.jdbc.Driver
hive import
$ ./sqoop import --connect jdbc:mysql://test-cluster.cluster-cxkmqunzhbcv.us-east-1.rds.amazonaws.com/sqoop_test --username sqoopuser --password sqoopuser999 --table sqoop_test_utf8 --driver org.mariadb.jdbc.Driver --hive-import --hive-table sqoop_test
- データ確認
hive> select * from sqoop_test;
OK
A001 shibuya_00 book
A002 shibuya_00 DVD
B003 shinagawa_ stationery
C004 ebisu_001 dictionary
D005 meguro_001 book
Time taken: 0.786 seconds, Fetched: 5 row(s)
その他
テーブルの文字コードとsqoop側の文字コードの違いでエラー?
-
テーブルの文字コードをlatin1にしていると以下のエラーが出てジョブが失敗した。なので、上記で「utf-8」にしている。
-
やってはいないが、Sqoop側で対応する場合、「--connection-param-file」を使って「characterEncoding」で指定してやれば行けるのかな。
-
エラーログ
16/03/09 06:41:21 INFO mapreduce.Job: Task Id : attempt_1457497192768_0001_m_000000_2, Status : FAILED
Error: java.io.IOException: SQLException in nextKeyValue
at org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:277)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:565)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:152)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:796)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:172)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:166)
Caused by: java.sql.SQLException: Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation '<'
at org.mariadb.jdbc.internal.SQLExceptionMapper.get(SQLExceptionMapper.java:149)
at org.mariadb.jdbc.internal.SQLExceptionMapper.throwException(SQLExceptionMapper.java:106)
at org.mariadb.jdbc.MySQLStatement.executeQueryEpilog(MySQLStatement.java:252)
at org.mariadb.jdbc.MySQLStatement.execute(MySQLStatement.java:278)
at org.mariadb.jdbc.MySQLStatement.executeQuery(MySQLStatement.java:333)
at org.mariadb.jdbc.MySQLPreparedStatement.executeQuery(MySQLPreparedStatement.java:99)
at org.apache.sqoop.mapreduce.db.DBRecordReader.executeQuery(DBRecordReader.java:111)
at org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:235)
... 12 more
参考:実行結果
- mysql driver
Warning: /mnt/tmp/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/bin/../../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /mnt/tmp/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/bin/../../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
16/03/09 06:53:02 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
16/03/09 06:53:02 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/03/09 06:53:02 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
16/03/09 06:53:02 INFO tool.CodeGenTool: Beginning code generation
16/03/09 06:53:02 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `sqoop_test_utf8` AS t LIMIT 1
16/03/09 06:53:02 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `sqoop_test_utf8` AS t LIMIT 1
16/03/09 06:53:02 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
注意:/tmp/sqoop-hadoop/compile/f5749f4d128f954b02ac58ff6e31a933/sqoop_test_utf8.javaは非推奨のAPIを使用またはオーバーライドしています。
注意:詳細は、-Xlint:deprecationオプションを指定して再コンパイルしてください。
16/03/09 06:53:04 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/f5749f4d128f954b02ac58ff6e31a933/sqoop_test_utf8.jar
16/03/09 06:53:04 WARN manager.MySQLManager: It looks like you are importing from mysql.
16/03/09 06:53:04 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
16/03/09 06:53:04 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
16/03/09 06:53:04 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
16/03/09 06:53:04 INFO mapreduce.ImportJobBase: Beginning import of sqoop_test_utf8
16/03/09 06:53:05 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
16/03/09 06:53:05 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
16/03/09 06:53:05 INFO client.RMProxy: Connecting to ResourceManager at ip-172-31-55-2.ec2.internal/172.31.55.2:8032
16/03/09 06:53:06 INFO metrics.MetricsSaver: MetricsConfigRecord disabledInCluster: false instanceEngineCycleSec: 60 clusterEngineCycleSec: 60 disableClusterEngine: false maxMemoryMb: 3072 maxInstanceCount: 500 lastModified: 1457497199168
16/03/09 06:53:06 INFO metrics.MetricsSaver: Created MetricsSaver j-3UO8LO9Y9U00I:i-81fac878:Sqoop:05010 period:60 /mnt/var/em/raw/i-81fac878_20160309_Sqoop_05010_raw.bin
16/03/09 06:53:07 INFO db.DBInputFormat: Using read commited transaction isolation
16/03/09 06:53:07 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`key1`), MAX(`key1`) FROM `sqoop_test_utf8`
16/03/09 06:53:07 WARN db.TextSplitter: Generating splits for a textual index column.
16/03/09 06:53:07 WARN db.TextSplitter: If your database sorts in a case-insensitive order, this may result in a partial import or duplicate records.
16/03/09 06:53:07 WARN db.TextSplitter: You are strongly encouraged to choose an integral split column.
16/03/09 06:53:07 INFO mapreduce.JobSubmitter: number of splits:4
16/03/09 06:53:07 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1457497192768_0002
16/03/09 06:53:08 INFO impl.YarnClientImpl: Submitted application application_1457497192768_0002
16/03/09 06:53:08 INFO mapreduce.Job: The url to track the job: http://ip-172-31-55-2.ec2.internal:20888/proxy/application_1457497192768_0002/
16/03/09 06:53:08 INFO mapreduce.Job: Running job: job_1457497192768_0002
16/03/09 06:53:15 INFO mapreduce.Job: Job job_1457497192768_0002 running in uber mode : false
16/03/09 06:53:15 INFO mapreduce.Job: map 0% reduce 0%
16/03/09 06:53:23 INFO mapreduce.Job: map 50% reduce 0%
16/03/09 06:53:24 INFO mapreduce.Job: map 100% reduce 0%
16/03/09 06:53:24 INFO mapreduce.Job: Job job_1457497192768_0002 completed successfully
16/03/09 06:53:24 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=565716
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=461
HDFS: Number of bytes written=115
HDFS: Number of read operations=16
HDFS: Number of large read operations=0
HDFS: Number of write operations=8
Job Counters
Launched map tasks=4
Other local map tasks=4
Total time spent by all maps in occupied slots (ms)=1013580
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=22524
Total vcore-seconds taken by all map tasks=22524
Total megabyte-seconds taken by all map tasks=32434560
Map-Reduce Framework
Map input records=5
Map output records=5
Input split bytes=461
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=286
CPU time spent (ms)=4590
Physical memory (bytes) snapshot=869433344
Virtual memory (bytes) snapshot=8212455424
Total committed heap usage (bytes)=1195376640
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=115
16/03/09 06:53:24 INFO mapreduce.ImportJobBase: Transferred 115 bytes in 18.7289 seconds (6.1403 bytes/sec)
16/03/09 06:53:24 INFO mapreduce.ImportJobBase: Retrieved 5 records.
- mariadb driver
Warning: /mnt/tmp/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/bin/../../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /mnt/tmp/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/bin/../../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
16/03/09 06:58:22 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
16/03/09 06:58:22 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/03/09 06:58:22 WARN sqoop.ConnFactory: Parameter --driver is set to an explicit driver however appropriate connection manager is not being set (via --connection-manager). Sqoop is going to fall back to org.apache.sqoop.manager.GenericJdbcManager. Please specify explicitly which connection manager should be used next time.
16/03/09 06:58:22 INFO manager.SqlManager: Using default fetchSize of 1000
16/03/09 06:58:22 INFO tool.CodeGenTool: Beginning code generation
16/03/09 06:58:22 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM sqoop_test_utf8 AS t WHERE 1=0
16/03/09 06:58:22 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM sqoop_test_utf8 AS t WHERE 1=0
16/03/09 06:58:22 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
注意:/tmp/sqoop-hadoop/compile/131e4ef66f3fe8206ed85367ddc6aa87/sqoop_test_utf8.javaは非推奨のAPIを使用またはオーバーライドしています。
注意:詳細は、-Xlint:deprecationオプションを指定して再コンパイルしてください。
16/03/09 06:58:24 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/131e4ef66f3fe8206ed85367ddc6aa87/sqoop_test_utf8.jar
16/03/09 06:58:24 INFO mapreduce.ImportJobBase: Beginning import of sqoop_test_utf8
16/03/09 06:58:25 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
16/03/09 06:58:25 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM sqoop_test_utf8 AS t WHERE 1=0
16/03/09 06:58:25 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
16/03/09 06:58:26 INFO client.RMProxy: Connecting to ResourceManager at ip-172-31-55-2.ec2.internal/172.31.55.2:8032
16/03/09 06:58:26 INFO metrics.MetricsSaver: MetricsConfigRecord disabledInCluster: false instanceEngineCycleSec: 60 clusterEngineCycleSec: 60 disableClusterEngine: false maxMemoryMb: 3072 maxInstanceCount: 500 lastModified: 1457497199168
16/03/09 06:58:26 INFO metrics.MetricsSaver: Created MetricsSaver j-3UO8LO9Y9U00I:i-81fac878:Sqoop:07626 period:60 /mnt/var/em/raw/i-81fac878_20160309_Sqoop_07626_raw.bin
16/03/09 06:58:27 INFO db.DBInputFormat: Using read commited transaction isolation
16/03/09 06:58:27 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(key1), MAX(key1) FROM sqoop_test_utf8
16/03/09 06:58:27 WARN db.TextSplitter: Generating splits for a textual index column.
16/03/09 06:58:27 WARN db.TextSplitter: If your database sorts in a case-insensitive order, this may result in a partial import or duplicate records.
16/03/09 06:58:27 WARN db.TextSplitter: You are strongly encouraged to choose an integral split column.
16/03/09 06:58:27 INFO mapreduce.JobSubmitter: number of splits:4
16/03/09 06:58:28 INFO metrics.MetricsSaver: 1 aggregated HDFSWriteDelay 306 raw values into 2 aggregated values, total 2
16/03/09 06:58:28 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1457497192768_0003
16/03/09 06:58:28 INFO impl.YarnClientImpl: Submitted application application_1457497192768_0003
16/03/09 06:58:28 INFO mapreduce.Job: The url to track the job: http://ip-172-31-55-2.ec2.internal:20888/proxy/application_1457497192768_0003/
16/03/09 06:58:28 INFO mapreduce.Job: Running job: job_1457497192768_0003
16/03/09 06:58:35 INFO mapreduce.Job: Job job_1457497192768_0003 running in uber mode : false
16/03/09 06:58:35 INFO mapreduce.Job: map 0% reduce 0%
16/03/09 06:58:43 INFO mapreduce.Job: map 25% reduce 0%
16/03/09 06:58:44 INFO mapreduce.Job: map 75% reduce 0%
16/03/09 06:58:45 INFO mapreduce.Job: map 100% reduce 0%
16/03/09 06:58:45 INFO mapreduce.Job: Job job_1457497192768_0003 completed successfully
16/03/09 06:58:45 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=565504
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=445
HDFS: Number of bytes written=115
HDFS: Number of read operations=16
HDFS: Number of large read operations=0
HDFS: Number of write operations=8
Job Counters
Launched map tasks=4
Other local map tasks=4
Total time spent by all maps in occupied slots (ms)=1054530
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=23434
Total vcore-seconds taken by all map tasks=23434
Total megabyte-seconds taken by all map tasks=33744960
Map-Reduce Framework
Map input records=5
Map output records=5
Input split bytes=445
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=292
CPU time spent (ms)=5630
Physical memory (bytes) snapshot=992595968
Virtual memory (bytes) snapshot=8235737088
Total committed heap usage (bytes)=1195376640
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=115
16/03/09 06:58:45 INFO mapreduce.ImportJobBase: Transferred 115 bytes in 19.782 seconds (5.8134 bytes/sec)
16/03/09 06:58:45 INFO mapreduce.ImportJobBase: Retrieved 5 records.