
Trying out spark-sql with Spark installed on GCE. On Ubuntu, that is.

Running spark-sql

  • I say GCE in the title, but nothing here actually requires GCE.
  • Both the client and the machine it connects to are Ubuntu, though.
  • Spark itself was installed in my previous post, so I'll skip the installation here.

Starting spark-sql

1. Connect to the instance with gcloud.
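The connection command itself isn't shown here; assuming the instance is named instance-1, as the hostnames in the logs below suggest, something like this gets you a shell on it (gcloud should prompt for the zone if you haven't set a default):

$ gcloud compute ssh instance-1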

$ cd $SPARK_HOME
$ sudo ./bin/spark-sql
Spark assembly has been built with Hive, including Datanucleus jars on classpath
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Unable to initialize logging using hive-log4j.properties, not found on CLASSPATH!
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/07/07 15:32:08 INFO SecurityManager: Changing view acls to: root,
15/07/07 15:32:08 INFO SecurityManager: Changing modify acls to: root,
15/07/07 15:32:08 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root, ); users with modify permissions: Set(root, )
15/07/07 15:32:08 INFO Slf4jLogger: Slf4jLogger started
15/07/07 15:32:08 INFO Remoting: Starting remoting
15/07/07 15:32:08 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@instance-1.c.custom-unison-00000.internal:58022]
15/07/07 15:32:08 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver@instance-1.c.custom-unison-00000.internal:58022]
15/07/07 15:32:08 INFO Utils: Successfully started service 'sparkDriver' on port 58022.
15/07/07 15:32:08 INFO SparkEnv: Registering MapOutputTracker
15/07/07 15:32:08 INFO SparkEnv: Registering BlockManagerMaster
15/07/07 15:32:08 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20150707153208-86b9
15/07/07 15:32:08 INFO Utils: Successfully started service 'Connection manager for block manager' on port 39311.
15/07/07 15:32:08 INFO ConnectionManager: Bound socket to port 39311 with id = ConnectionManagerId(instance-1.c.custom-unison-00000.internal,00000)
15/07/07 15:32:08 INFO MemoryStore: MemoryStore started with capacity 265.1 MB
15/07/07 15:32:08 INFO BlockManagerMaster: Trying to register BlockManager
15/07/07 15:32:08 INFO BlockManagerMasterActor: Registering block manager instance-1.c.custom-unison-00000.internal:39311 with 265.1 MB RAM
15/07/07 15:32:08 INFO BlockManagerMaster: Registered BlockManager
15/07/07 15:32:08 INFO HttpFileServer: HTTP File server directory is /tmp/spark-a8e3eb25-7a87-4138-8ec5-f387b76c21b1
15/07/07 15:32:08 INFO HttpServer: Starting HTTP Server
15/07/07 15:32:09 INFO Utils: Successfully started service 'HTTP file server' on port 45607.
15/07/07 15:32:09 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/07/07 15:32:09 INFO SparkUI: Started SparkUI at http://instance-1.c.custom-unison-00000.internal:4040
15/07/07 15:32:09 INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@instance-1.c.custom-unison-00000.internal:58022/user/HeartbeatReceiver
spark-sql> 

spark-sql starts up.

2. Create a table.
Using the example from the Spark SQL programming guide as-is, create a table.

spark-sql> CREATE TABLE IF NOT EXISTS src (key INT, value STRING);
15/07/07 15:38:48 INFO ParseDriver: Parsing command: CREATE TABLE IF NOT EXISTS src (key INT, value STRING)
15/07/07 15:38:48 INFO ParseDriver: Parse Completed
15/07/07 15:38:48 INFO deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
15/07/07 15:38:48 INFO Driver: <PERFLOG method=Driver.run>
15/07/07 15:38:48 INFO Driver: <PERFLOG method=TimeToSubmit>
15/07/07 15:38:48 INFO Driver: <PERFLOG method=compile>
15/07/07 15:38:48 INFO Driver: <PERFLOG method=parse>
15/07/07 15:38:48 INFO ParseDriver: Parsing command: CREATE TABLE IF NOT EXISTS src (key INT, value STRING)
15/07/07 15:38:48 INFO ParseDriver: Parse Completed
15/07/07 15:38:48 INFO Driver: </PERFLOG method=parse start=1436283528297 end=1436283528297 duration=0>
15/07/07 15:38:48 INFO Driver: <PERFLOG method=semanticAnalyze>
15/07/07 15:38:48 INFO SemanticAnalyzer: Starting Semantic Analysis
15/07/07 15:38:48 INFO SemanticAnalyzer: Creating table src position=27
15/07/07 15:38:48 INFO HiveMetaStore: 0: get_table : db=default tbl=src
15/07/07 15:38:48 INFO audit: ugi=root  ip=unknown-ip-addr  cmd=get_table : db=default tbl=src  
15/07/07 15:38:48 INFO Driver: Semantic Analysis Completed
15/07/07 15:38:48 INFO Driver: </PERFLOG method=semanticAnalyze start=1436283528298 end=1436283528347 duration=49>
15/07/07 15:38:48 INFO Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null)
15/07/07 15:38:48 INFO Driver: </PERFLOG method=compile start=1436283528296 end=1436283528348 duration=52>
15/07/07 15:38:48 INFO Driver: <PERFLOG method=Driver.execute>
15/07/07 15:38:48 INFO Driver: Starting command: CREATE TABLE IF NOT EXISTS src (key INT, value STRING)
15/07/07 15:38:48 INFO Driver: </PERFLOG method=TimeToSubmit start=1436283528296 end=1436283528349 duration=53>
15/07/07 15:38:48 INFO Driver: <PERFLOG method=runTasks>
15/07/07 15:38:48 INFO Driver: <PERFLOG method=task.DDL.Stage-0>
15/07/07 15:38:48 INFO DDLTask: Default to LazySimpleSerDe for table src
15/07/07 15:38:48 INFO HiveMetaStore: 0: create_table: Table(tableName:src, dbName:default, owner:root, createTime:1436283528, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:key, type:int, comment:null), FieldSchema(name:value, type:string, comment:null)], location:null, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE, privileges:PrincipalPrivilegeSet(userPrivileges:null, groupPrivileges:null, rolePrivileges:null))
15/07/07 15:38:48 INFO audit: ugi=root  ip=unknown-ip-addr  cmd=create_table: Table(tableName:src, dbName:default, owner:root, createTime:1436283528, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:key, type:int, comment:null), FieldSchema(name:value, type:string, comment:null)], location:null, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE, privileges:PrincipalPrivilegeSet(userPrivileges:null, groupPrivileges:null, rolePrivileges:null))   
15/07/07 15:38:48 INFO Driver: </PERFLOG method=task.DDL.Stage-0 start=1436283528349 end=1436283528460 duration=111>
15/07/07 15:38:48 INFO Driver: </PERFLOG method=runTasks start=1436283528349 end=1436283528460 duration=111>
15/07/07 15:38:48 INFO Driver: </PERFLOG method=Driver.execute start=1436283528348 end=1436283528460 duration=112>
OK
15/07/07 15:38:48 INFO Driver: OK
15/07/07 15:38:48 INFO Driver: <PERFLOG method=releaseLocks>
15/07/07 15:38:48 INFO Driver: </PERFLOG method=releaseLocks start=1436283528460 end=1436283528461 duration=1>
15/07/07 15:38:48 INFO Driver: </PERFLOG method=Driver.run start=1436283528296 end=1436283528461 duration=165>
15/07/07 15:38:48 INFO Driver: <PERFLOG method=releaseLocks>
15/07/07 15:38:48 INFO Driver: </PERFLOG method=releaseLocks start=1436283528461 end=1436283528461 duration=0>
Time taken: 0.207 seconds
15/07/07 15:38:48 INFO CliDriver: Time taken: 0.207 seconds
15/07/07 15:38:48 INFO Driver: <PERFLOG method=releaseLocks>
15/07/07 15:38:48 INFO Driver: </PERFLOG method=releaseLocks start=1436283528465 end=1436283528465 duration=0>
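
Incidentally, the familiar Hive DDL statements should work in this shell too. For example, the new table's column definitions can be checked with DESCRIBE (output omitted, and just as interleaved with INFO logs as everything else):

spark-sql> DESCRIBE src;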

Let's check that the table was actually created.

spark-sql> show tables;
15/07/07 15:40:32 INFO ParseDriver: Parsing command: show tables
15/07/07 15:40:32 INFO ParseDriver: Parse Completed
15/07/07 15:40:32 INFO deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
15/07/07 15:40:32 INFO Driver: <PERFLOG method=Driver.run>
15/07/07 15:40:32 INFO Driver: <PERFLOG method=TimeToSubmit>
15/07/07 15:40:32 INFO Driver: <PERFLOG method=compile>
15/07/07 15:40:32 INFO Driver: <PERFLOG method=parse>
15/07/07 15:40:32 INFO ParseDriver: Parsing command: show tables
15/07/07 15:40:32 INFO ParseDriver: Parse Completed
15/07/07 15:40:32 INFO Driver: </PERFLOG method=parse start=1436283632094 end=1436283632095 duration=1>
15/07/07 15:40:32 INFO Driver: <PERFLOG method=semanticAnalyze>
15/07/07 15:40:32 INFO Driver: Semantic Analysis Completed
15/07/07 15:40:32 INFO Driver: </PERFLOG method=semanticAnalyze start=1436283632095 end=1436283632103 duration=8>
15/07/07 15:40:32 INFO ListSinkOperator: Initializing Self 0 OP
15/07/07 15:40:32 INFO ListSinkOperator: Operator 0 OP initialized
15/07/07 15:40:32 INFO ListSinkOperator: Initialization Done 0 OP
15/07/07 15:40:32 INFO Driver: Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:tab_name, type:string, comment:from deserializer)], properties:null)
15/07/07 15:40:32 INFO Driver: </PERFLOG method=compile start=1436283632094 end=1436283632104 duration=10>
15/07/07 15:40:32 INFO Driver: <PERFLOG method=Driver.execute>
15/07/07 15:40:32 INFO Driver: Starting command: show tables
15/07/07 15:40:32 INFO Driver: </PERFLOG method=TimeToSubmit start=1436283632093 end=1436283632105 duration=12>
15/07/07 15:40:32 INFO Driver: <PERFLOG method=runTasks>
15/07/07 15:40:32 INFO Driver: <PERFLOG method=task.DDL.Stage-0>
15/07/07 15:40:32 INFO HiveMetaStore: 0: get_database: default
15/07/07 15:40:32 INFO audit: ugi=root  ip=unknown-ip-addr  cmd=get_database: default   
15/07/07 15:40:32 INFO HiveMetaStore: 0: get_tables: db=default pat=.*
15/07/07 15:40:32 INFO audit: ugi=root  ip=unknown-ip-addr  cmd=get_tables: db=default pat=.*   
15/07/07 15:40:32 INFO Driver: </PERFLOG method=task.DDL.Stage-0 start=1436283632105 end=1436283632119 duration=14>
15/07/07 15:40:32 INFO Driver: </PERFLOG method=runTasks start=1436283632105 end=1436283632120 duration=15>
15/07/07 15:40:32 INFO Driver: </PERFLOG method=Driver.execute start=1436283632104 end=1436283632120 duration=16>
OK
15/07/07 15:40:32 INFO Driver: OK
15/07/07 15:40:32 INFO Driver: <PERFLOG method=releaseLocks>
15/07/07 15:40:32 INFO Driver: </PERFLOG method=releaseLocks start=1436283632120 end=1436283632120 duration=0>
15/07/07 15:40:32 INFO Driver: </PERFLOG method=Driver.run start=1436283632093 end=1436283632120 duration=27>
15/07/07 15:40:32 INFO deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
15/07/07 15:40:32 INFO FileInputFormat: Total input paths to process : 1
15/07/07 15:40:32 INFO Driver: <PERFLOG method=releaseLocks>
15/07/07 15:40:32 INFO Driver: </PERFLOG method=releaseLocks start=1436283632161 end=1436283632161 duration=0>
src
Time taken: 0.106 seconds
15/07/07 15:40:32 INFO CliDriver: Time taken: 0.106 seconds
15/07/07 15:40:32 INFO Driver: <PERFLOG method=releaseLocks>
15/07/07 15:40:32 INFO Driver: </PERFLOG method=releaseLocks start=1436283632169 end=1436283632170 duration=1>

Sure enough, [src] is there.
Now let's load some data into it.

spark-sql> LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' INTO TABLE src;
15/07/07 15:42:17 INFO ParseDriver: Parsing command: LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' INTO TABLE src
15/07/07 15:42:17 INFO ParseDriver: Parse Completed
15/07/07 15:42:17 INFO deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
15/07/07 15:42:17 INFO Driver: <PERFLOG method=Driver.run>
15/07/07 15:42:17 INFO Driver: <PERFLOG method=TimeToSubmit>
15/07/07 15:42:17 INFO Driver: <PERFLOG method=compile>
15/07/07 15:42:17 INFO Driver: <PERFLOG method=parse>
15/07/07 15:42:17 INFO ParseDriver: Parsing command: LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' INTO TABLE src
15/07/07 15:42:17 INFO ParseDriver: Parse Completed
15/07/07 15:42:17 INFO Driver: </PERFLOG method=parse start=1436283737848 end=1436283737849 duration=1>
15/07/07 15:42:17 INFO Driver: <PERFLOG method=semanticAnalyze>
15/07/07 15:42:17 INFO HiveMetaStore: 0: get_table : db=default tbl=src
15/07/07 15:42:17 INFO audit: ugi=root  ip=unknown-ip-addr  cmd=get_table : db=default tbl=src  
15/07/07 15:42:17 INFO Driver: Semantic Analysis Completed
15/07/07 15:42:17 INFO Driver: </PERFLOG method=semanticAnalyze start=1436283737849 end=1436283737939 duration=90>
15/07/07 15:42:17 INFO Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null)
15/07/07 15:42:17 INFO Driver: </PERFLOG method=compile start=1436283737848 end=1436283737944 duration=96>
15/07/07 15:42:17 INFO Driver: <PERFLOG method=Driver.execute>
15/07/07 15:42:17 INFO Driver: Starting command: LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' INTO TABLE src
15/07/07 15:42:17 INFO Driver: </PERFLOG method=TimeToSubmit start=1436283737848 end=1436283737944 duration=96>
15/07/07 15:42:17 INFO Driver: <PERFLOG method=runTasks>
15/07/07 15:42:17 INFO Driver: <PERFLOG method=task.COPY.Stage-0>
Copying data from file:/usr/local/spark-1.1.0-bin-hadoop2.4/examples/src/main/resources/kv1.txt
15/07/07 15:42:17 INFO Task: Copying data from file:/usr/local/spark-1.1.0-bin-hadoop2.4/examples/src/main/resources/kv1.txt to file:/tmp/hive-root/hive_2015-07-07_15-42-17_848_6713777572196549102-1/-ext-10000
Copying file: file:/usr/local/spark-1.1.0-bin-hadoop2.4/examples/src/main/resources/kv1.txt
15/07/07 15:42:17 INFO Task: Copying file: file:/usr/local/spark-1.1.0-bin-hadoop2.4/examples/src/main/resources/kv1.txt
15/07/07 15:42:17 INFO Driver: </PERFLOG method=task.COPY.Stage-0 start=1436283737944 end=1436283737963 duration=19>
15/07/07 15:42:17 INFO Driver: <PERFLOG method=task.MOVE.Stage-1>
Loading data to table default.src
15/07/07 15:42:17 INFO Task: Loading data to table default.src from file:/tmp/hive-root/hive_2015-07-07_15-42-17_848_6713777572196549102-1/-ext-10000
15/07/07 15:42:17 INFO HiveMetaStore: 0: get_table : db=default tbl=src
15/07/07 15:42:17 INFO audit: ugi=root  ip=unknown-ip-addr  cmd=get_table : db=default tbl=src  
15/07/07 15:42:18 INFO HiveMetaStore: 0: get_table : db=default tbl=src
15/07/07 15:42:18 INFO audit: ugi=root  ip=unknown-ip-addr  cmd=get_table : db=default tbl=src  
15/07/07 15:42:18 INFO HiveMetaStore: 0: alter_table: db=default tbl=src newtbl=src
15/07/07 15:42:18 INFO audit: ugi=root  ip=unknown-ip-addr  cmd=alter_table: db=default tbl=src newtbl=src  
15/07/07 15:42:18 INFO HiveMetaStore: 0: get_table : db=default tbl=src
15/07/07 15:42:18 INFO audit: ugi=root  ip=unknown-ip-addr  cmd=get_table : db=default tbl=src  
15/07/07 15:42:18 INFO Driver: </PERFLOG method=task.MOVE.Stage-1 start=1436283737963 end=1436283738097 duration=134>
15/07/07 15:42:18 INFO Driver: <PERFLOG method=task.STATS.Stage-2>
15/07/07 15:42:18 INFO StatsTask: Executing stats task
15/07/07 15:42:18 INFO HiveMetaStore: 0: get_table : db=default tbl=src
15/07/07 15:42:18 INFO audit: ugi=root  ip=unknown-ip-addr  cmd=get_table : db=default tbl=src  
15/07/07 15:42:18 INFO HiveMetaStore: 0: get_table : db=default tbl=src
15/07/07 15:42:18 INFO audit: ugi=root  ip=unknown-ip-addr  cmd=get_table : db=default tbl=src  
15/07/07 15:42:18 INFO HiveMetaStore: 0: alter_table: db=default tbl=src newtbl=src
15/07/07 15:42:18 INFO audit: ugi=root  ip=unknown-ip-addr  cmd=alter_table: db=default tbl=src newtbl=src  
15/07/07 15:42:18 INFO HiveMetaStore: 0: get_table : db=default tbl=src
15/07/07 15:42:18 INFO audit: ugi=root  ip=unknown-ip-addr  cmd=get_table : db=default tbl=src  
Table default.src stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 5812, raw_data_size: 0]
15/07/07 15:42:18 INFO Task: Table default.src stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 5812, raw_data_size: 0]
15/07/07 15:42:18 INFO Driver: </PERFLOG method=task.STATS.Stage-2 start=1436283738097 end=1436283738204 duration=107>
15/07/07 15:42:18 INFO Driver: </PERFLOG method=runTasks start=1436283737944 end=1436283738204 duration=260>
15/07/07 15:42:18 INFO Driver: </PERFLOG method=Driver.execute start=1436283737944 end=1436283738204 duration=260>
OK
15/07/07 15:42:18 INFO Driver: OK
15/07/07 15:42:18 INFO Driver: <PERFLOG method=releaseLocks>
15/07/07 15:42:18 INFO Driver: </PERFLOG method=releaseLocks start=1436283738204 end=1436283738204 duration=0>
15/07/07 15:42:18 INFO Driver: </PERFLOG method=Driver.run start=1436283737848 end=1436283738205 duration=357>
15/07/07 15:42:18 INFO Driver: <PERFLOG method=releaseLocks>
15/07/07 15:42:18 INFO Driver: </PERFLOG method=releaseLocks start=1436283738205 end=1436283738205 duration=0>
Time taken: 0.394 seconds
15/07/07 15:42:18 INFO CliDriver: Time taken: 0.394 seconds
15/07/07 15:42:18 INFO Driver: <PERFLOG method=releaseLocks>
15/07/07 15:42:18 INFO Driver: </PERFLOG method=releaseLocks start=1436283738208 end=1436283738208 duration=0>

It prints OK.

Now, let's confirm that the data actually made it into the table.

spark-sql> select count(*) FROM src;
15/07/07 15:43:50 INFO ParseDriver: Parsing command: select count(*) FROM src
15/07/07 15:43:50 INFO ParseDriver: Parse Completed
15/07/07 15:43:50 INFO HiveMetaStore: 0: get_table : db=default tbl=src
15/07/07 15:43:50 INFO audit: ugi=root  ip=unknown-ip-addr  cmd=get_table : db=default tbl=src  
15/07/07 15:43:50 INFO deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
15/07/07 15:43:50 INFO MemoryStore: ensureFreeSpace(454358) called with curMem=0, maxMem=278019440
15/07/07 15:43:50 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 443.7 KB, free 264.7 MB)
15/07/07 15:43:50 INFO SparkContext: Starting job: collect at HiveContext.scala:415
15/07/07 15:43:50 INFO FileInputFormat: Total input paths to process : 1
15/07/07 15:43:50 INFO DAGScheduler: Registering RDD 18 (mapPartitions at Exchange.scala:86)
15/07/07 15:43:50 INFO DAGScheduler: Got job 0 (collect at HiveContext.scala:415) with 1 output partitions (allowLocal=false)
15/07/07 15:43:50 INFO DAGScheduler: Final stage: Stage 0(collect at HiveContext.scala:415)
15/07/07 15:43:50 INFO DAGScheduler: Parents of final stage: List(Stage 1)
15/07/07 15:43:50 INFO DAGScheduler: Missing parents: List(Stage 1)
15/07/07 15:43:50 INFO DAGScheduler: Submitting Stage 1 (MapPartitionsRDD[18] at mapPartitions at Exchange.scala:86), which has no missing parents
15/07/07 15:43:50 INFO MemoryStore: ensureFreeSpace(11024) called with curMem=454358, maxMem=278019440
15/07/07 15:43:50 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 10.8 KB, free 264.7 MB)
15/07/07 15:43:50 INFO DAGScheduler: Submitting 2 missing tasks from Stage 1 (MapPartitionsRDD[18] at mapPartitions at Exchange.scala:86)
15/07/07 15:43:50 INFO TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
15/07/07 15:43:51 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 0, localhost, PROCESS_LOCAL, 1182 bytes)
15/07/07 15:43:51 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 1, localhost, PROCESS_LOCAL, 1182 bytes)
15/07/07 15:43:51 INFO Executor: Running task 0.0 in stage 1.0 (TID 0)
15/07/07 15:43:51 INFO Executor: Running task 1.0 in stage 1.0 (TID 1)
15/07/07 15:43:51 INFO HadoopRDD: Input split: file:/user/hive/warehouse/src/kv1.txt:2906+2906
15/07/07 15:43:51 INFO HadoopRDD: Input split: file:/user/hive/warehouse/src/kv1.txt:0+2906
15/07/07 15:43:51 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
15/07/07 15:43:51 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
15/07/07 15:43:51 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
15/07/07 15:43:51 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
15/07/07 15:43:51 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
15/07/07 15:43:51 INFO Executor: Finished task 1.0 in stage 1.0 (TID 1). 1895 bytes result sent to driver
15/07/07 15:43:51 INFO Executor: Finished task 0.0 in stage 1.0 (TID 0). 1895 bytes result sent to driver
15/07/07 15:43:51 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 0) in 401 ms on localhost (1/2)
15/07/07 15:43:51 INFO DAGScheduler: Stage 1 (mapPartitions at Exchange.scala:86) finished in 0.430 s
15/07/07 15:43:51 INFO DAGScheduler: looking for newly runnable stages
15/07/07 15:43:51 INFO DAGScheduler: running: Set()
15/07/07 15:43:51 INFO TaskSetManager: Finished task 1.0 in stage 1.0 (TID 1) in 413 ms on localhost (2/2)
15/07/07 15:43:51 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 
15/07/07 15:43:51 INFO DAGScheduler: waiting: Set(Stage 0)
15/07/07 15:43:51 INFO DAGScheduler: failed: Set()
15/07/07 15:43:51 INFO DAGScheduler: Missing parents for Stage 0: List()
15/07/07 15:43:51 INFO DAGScheduler: Submitting Stage 0 (MappedRDD[22] at map at HiveContext.scala:360), which is now runnable
15/07/07 15:43:51 INFO StatsReportListener: Finished stage: org.apache.spark.scheduler.StageInfo@1deaf84d
15/07/07 15:43:51 INFO MemoryStore: ensureFreeSpace(9792) called with curMem=465382, maxMem=278019440
15/07/07 15:43:51 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 9.6 KB, free 264.7 MB)
15/07/07 15:43:51 INFO DAGScheduler: Submitting 1 missing tasks from Stage 0 (MappedRDD[22] at map at HiveContext.scala:360)
15/07/07 15:43:51 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
15/07/07 15:43:51 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 2, localhost, PROCESS_LOCAL, 948 bytes)
15/07/07 15:43:51 INFO Executor: Running task 0.0 in stage 0.0 (TID 2)
15/07/07 15:43:51 INFO BlockFetcherIterator$BasicBlockFetcherIterator: maxBytesInFlight: 50331648, targetRequestSize: 10066329
15/07/07 15:43:51 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
15/07/07 15:43:51 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Started 0 remote fetches in 20 ms
15/07/07 15:43:51 INFO StatsReportListener: task runtime:(count: 2, mean: 407.000000, stdev: 6.000000, max: 413.000000, min: 401.000000)
15/07/07 15:43:51 INFO StatsReportListener:     0%  5%  10% 25% 50% 75% 90% 95% 100%
15/07/07 15:43:51 INFO StatsReportListener:     401.0 ms    401.0 ms    401.0 ms    401.0 ms    413.0 ms    413.0 ms    413.0 ms    413.0 ms    413.0 ms
15/07/07 15:43:51 INFO StatsReportListener: shuffle bytes written:(count: 2, mean: 50.000000, stdev: 0.000000, max: 50.000000, min: 50.000000)
15/07/07 15:43:51 INFO StatsReportListener:     0%  5%  10% 25% 50% 75% 90% 95% 100%
15/07/07 15:43:51 INFO StatsReportListener:     50.0 B  50.0 B  50.0 B  50.0 B  50.0 B  50.0 B  50.0 B  50.0 B  50.0 B
15/07/07 15:43:51 INFO StatsReportListener: task result size:(count: 2, mean: 1895.000000, stdev: 0.000000, max: 1895.000000, min: 1895.000000)
15/07/07 15:43:51 INFO StatsReportListener:     0%  5%  10% 25% 50% 75% 90% 95% 100%
15/07/07 15:43:51 INFO StatsReportListener:     1895.0 B    1895.0 B    1895.0 B    1895.0 B    1895.0 B    1895.0 B    1895.0 B    1895.0 B    1895.0 B
15/07/07 15:43:51 INFO StatsReportListener: executor (non-fetch) time pct: (count: 2, mean: 76.642534, stdev: 1.081437, max: 77.723971, min: 75.561097)
15/07/07 15:43:51 INFO StatsReportListener:     0%  5%  10% 25% 50% 75% 90% 95% 100%
15/07/07 15:43:51 INFO StatsReportListener:     76 %    76 %    76 %    76 %    78 %    78 %    78 %    78 %    78 %
15/07/07 15:43:51 INFO StatsReportListener: other time pct: (count: 2, mean: 23.357466, stdev: 1.081437, max: 24.438903, min: 22.276029)
15/07/07 15:43:51 INFO StatsReportListener:     0%  5%  10% 25% 50% 75% 90% 95% 100%
15/07/07 15:43:51 INFO StatsReportListener:     22 %    22 %    22 %    22 %    24 %    24 %    24 %    24 %    24 %
15/07/07 15:43:51 INFO Executor: Finished task 0.0 in stage 0.0 (TID 2). 1076 bytes result sent to driver
15/07/07 15:43:51 INFO DAGScheduler: Stage 0 (collect at HiveContext.scala:415) finished in 0.114 s
15/07/07 15:43:51 INFO StatsReportListener: Finished stage: org.apache.spark.scheduler.StageInfo@5ad48e86
15/07/07 15:43:51 INFO SparkContext: Job finished: collect at HiveContext.scala:415, took 0.778304976 s
15/07/07 15:43:51 INFO StatsReportListener: task runtime:(count: 1, mean: 116.000000, stdev: 0.000000, max: 116.000000, min: 116.000000)
15/07/07 15:43:51 INFO StatsReportListener:     0%  5%  10% 25% 50% 75% 90% 95% 100%
15/07/07 15:43:51 INFO StatsReportListener:     116.0 ms    116.0 ms    116.0 ms    116.0 ms    116.0 ms    116.0 ms    116.0 ms    116.0 ms    116.0 ms
15/07/07 15:43:51 INFO StatsReportListener: fetch wait time:(count: 1, mean: 0.000000, stdev: 0.000000, max: 0.000000, min: 0.000000)
15/07/07 15:43:51 INFO StatsReportListener:     0%  5%  10% 25% 50% 75% 90% 95% 100%
15/07/07 15:43:51 INFO StatsReportListener:     0.0 ms  0.0 ms  0.0 ms  0.0 ms  0.0 ms  0.0 ms  0.0 ms  0.0 ms  0.0 ms
15/07/07 15:43:51 INFO StatsReportListener: remote bytes read:(count: 1, mean: 0.000000, stdev: 0.000000, max: 0.000000, min: 0.000000)
15/07/07 15:43:51 INFO StatsReportListener:     0%  5%  10% 25% 50% 75% 90% 95% 100%
15/07/07 15:43:51 INFO StatsReportListener:     0.0 B   0.0 B   0.0 B   0.0 B   0.0 B   0.0 B   0.0 B   0.0 B   0.0 B
500
Time taken: 1.14 seconds
15/07/07 15:43:51 INFO CliDriver: Time taken: 1.14 seconds
spark-sql> 15/07/07 15:43:51 INFO StatsReportListener: task result size:(count: 1, mean: 1076.000000, stdev: 0.000000, max: 1076.000000, min: 1076.000000)
15/07/07 15:43:51 INFO StatsReportListener:     0%  5%  10% 25% 50% 75% 90% 95% 100%
15/07/07 15:43:51 INFO StatsReportListener:     1076.0 B    1076.0 B    1076.0 B    1076.0 B    1076.0 B    1076.0 B    1076.0 B    1076.0 B    1076.0 B
15/07/07 15:43:51 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 2) in 116 ms on localhost (1/1)
15/07/07 15:43:51 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
15/07/07 15:43:51 INFO StatsReportListener: executor (non-fetch) time pct: (count: 1, mean: 94.827586, stdev: 0.000000, max: 94.827586, min: 94.827586)
15/07/07 15:43:51 INFO StatsReportListener:     0%  5%  10% 25% 50% 75% 90% 95% 100%
15/07/07 15:43:51 INFO StatsReportListener:     95 %    95 %    95 %    95 %    95 %    95 %    95 %    95 %    95 %
15/07/07 15:43:51 INFO StatsReportListener: fetch wait time pct: (count: 1, mean: 0.000000, stdev: 0.000000, max: 0.000000, min: 0.000000)
15/07/07 15:43:51 INFO StatsReportListener:     0%  5%  10% 25% 50% 75% 90% 95% 100%
15/07/07 15:43:51 INFO StatsReportListener:      0 %     0 %     0 %     0 %     0 %     0 %     0 %     0 %     0 %
15/07/07 15:43:51 INFO StatsReportListener: other time pct: (count: 1, mean: 5.172414, stdev: 0.000000, max: 5.172414, min: 5.172414)
15/07/07 15:43:51 INFO StatsReportListener:     0%  5%  10% 25% 50% 75% 90% 95% 100%
15/07/07 15:43:51 INFO StatsReportListener:      5 %     5 %     5 %     5 %     5 %     5 %     5 %     5 %     5 %

The output is buried in log noise, but

500
Time taken: 1.14 seconds

is in there: the query returned 500, so all 500 records were loaded.
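
If you want to see the rows themselves rather than just the count, an ordinary LIMIT query works; kv1.txt holds key/value pairs along the lines of 238 and val_238:

spark-sql> SELECT key, value FROM src LIMIT 5;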

It works just like Hive, and it's really fast...
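
One practical tweak: all of that INFO chatter comes from Spark's default log4j profile (note the "Using Spark's default log4j profile" line at startup). Creating conf/log4j.properties from the bundled template and raising the root threshold to WARN should quiet the shell down considerably; a sketch, assuming the stock template whose root category line reads log4j.rootCategory=INFO, console:

$ cd $SPARK_HOME/conf
$ cp log4j.properties.template log4j.properties
$ sed -i 's/log4j.rootCategory=INFO, console/log4j.rootCategory=WARN, console/' log4j.properties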
