Steps to Get an Apache Spark App Running in Eclipse

Posted at 2015-03-14


  1. Install the required software
  2. Create a project with giter8
  3. Build and test with sbt
  4. Generate the Eclipse configuration files with the sbt Eclipse plugin
  5. Import the project into Eclipse

Nearly the same steps should work for IntelliJ IDEA as well, but I don't have IntelliJ IDEA, so I haven't been able to try it.


Install Scala 2.10, the Scala version that Apache Spark 1.2.1 supports.

brew cask install java
brew install scala210
brew link --force scala210
brew install sbt
brew install giter8




Eclipse Scala IDE

Install the Eclipse Scala IDE from the Eclipse Marketplace.

ScalaTest for Scala IDE

Scala IDE plugins -> ScalaTest for Scala IDE



$ g8 nttdata-oss/basic-spark-project.g8
Picked up _JAVA_OPTIONS: -Dfile.encoding=UTF-8

A basic spark application project

name [Basic Spark]: SparkExample
package [com.example]: spark
version [0.0.1]:

Template applied in ./sparkexample



$ tree
├── README.rst
├── assembly.sbt
├── build.sbt
├── project
│   ├── assembly.sbt
│   └── plugins.sbt
└── src
    ├── main
    │   └── scala
    │       └── spark
    │           ├── GroupByTest.scala
    │           ├── RandomTextWriter.scala
    │           ├── SparkHdfsLR.scala
    │           ├── SparkLR.scala
    │           ├── SparkLRTestDataGenerator.scala
    │           ├── SparkPi.scala
    │           ├── WordCount.scala
    │           └── Words.scala
    └── test
        └── scala
            └── spark
                └── SparkPiSpec.scala



name := "SparkExample"

organization := ""

version := "0.0.1"

scalaVersion := "2.10.4"

resolvers ++= Seq("cloudera" at "https://repository.cloudera.com/artifactory/cloudera-repos/")

libraryDependencies ++= Seq(
  "org.scalatest" %% "scalatest" % "2.0.M5b" % "test" withSources() withJavadoc(),
  "org.scalacheck" %% "scalacheck" % "1.10.0" % "test" withSources() withJavadoc(),
  "org.apache.spark" %% "spark-core" % "1.2.1" % "provided" withSources() withJavadoc(),
//  "org.apache.spark" %% "spark-streaming" % "1.2.1" % "provided" withSources() withJavadoc(),
//  "org.apache.spark" %% "spark-sql" % "1.2.1" % "provided" withSources() withJavadoc(),
//  "org.apache.spark" %% "spark-hive" % "1.2.1" % "provided" withSources() withJavadoc(),
//  "org.apache.spark" %% "spark-mllib" % "1.2.1" % "provided" withSources() withJavadoc(),
  "org.apache.hadoop" % "hadoop-client" % "2.5.0-cdh5.3.1" % "provided" withJavadoc(),
  "com.github.scopt" %% "scopt" % "3.2.0"
)

initialCommands := "import .sparkexample._"



$ sbt test
Picked up _JAVA_OPTIONS: -Dfile.encoding=UTF-8
[info] Loading project definition from /Users/ishihamat/Documents/workspace/sparkexample/project
[info] Updating {file:/Users/ishihamat/Documents/workspace/sparkexample/project/}sparkexample-build...
[info] Resolving org.fusesource.jansi#jansi;1.4 ...
[info] downloading https://repo.scala-sbt.org/scalasbt/sbt-plugin-releases/com.eed3si9n/sbt-assembly/scala_2.10/sbt_0.13/0.11.1/jars/sbt-assembly.jar
[info] Compiling 1 Scala source to /Users/ishihamat/Documents/workspace/sparkexample/target/scala-2.10/test-classes...
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/03/14 22:47:23 INFO SecurityManager: Changing view acls to: ishihamat
15/03/14 22:47:23 INFO SecurityManager: Changing modify acls to: ishihamat
15/03/14 22:47:23 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ishihamat); users with modify permissions: Set(ishihamat)
15/03/14 22:47:23 INFO Slf4jLogger: Slf4jLogger started
15/03/14 22:47:23 INFO Remoting: Starting remoting
15/03/14 22:47:23 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@]
15/03/14 22:47:23 INFO Utils: Successfully started service 'sparkDriver' on port 53331.
15/03/14 22:47:23 INFO SparkEnv: Registering MapOutputTracker
15/03/14 22:47:23 INFO SparkEnv: Registering BlockManagerMaster
15/03/14 22:47:23 INFO DiskBlockManager: Created local directory at /var/folders/mh/yw9p58bj0q56r3n50qn07tgh0000gn/T/spark-c7a664c4-0fa4-4ca5-a1cf-04d
15/03/14 22:47:23 INFO MemoryStore: MemoryStore started with capacity 510.3 MB
15/03/14 22:47:24 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/03/14 22:47:24 INFO HttpFileServer: HTTP File server directory is /var/folders/mh/yw9p58bj0q56r3n50qn07tgh0000gn/T/spark-87116159-39a8-4dae-a875-c6
15/03/14 22:47:24 INFO HttpServer: Starting HTTP Server
15/03/14 22:47:24 INFO Utils: Successfully started service 'HTTP file server' on port 53332.
15/03/14 22:47:24 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/03/14 22:47:24 INFO SparkUI: Started SparkUI at
15/03/14 22:47:24 INFO Executor: Starting executor ID <driver> on host localhost
15/03/14 22:47:24 INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@
15/03/14 22:47:24 INFO NettyBlockTransferService: Server created on 53333
15/03/14 22:47:24 INFO BlockManagerMaster: Trying to register BlockManager
15/03/14 22:47:24 INFO BlockManagerMasterActor: Registering block manager localhost:53333 with 510.3 MB RAM, BlockManagerId(<driver>, localhost, 53333
15/03/14 22:47:24 INFO BlockManagerMaster: Registered BlockManager
15/03/14 22:47:24 INFO SparkContext: Starting job: reduce at SparkPi.scala:50
15/03/14 22:47:24 INFO DAGScheduler: Got job 0 (reduce at SparkPi.scala:50) with 1 output partitions (allowLocal=false)
15/03/14 22:47:24 INFO DAGScheduler: Final stage: Stage 0(reduce at SparkPi.scala:50)
15/03/14 22:47:24 INFO DAGScheduler: Parents of final stage: List()
15/03/14 22:47:24 INFO DAGScheduler: Missing parents: List()
15/03/14 22:47:24 INFO DAGScheduler: Submitting Stage 0 (MappedRDD[1] at map at SparkPi.scala:46), which has no missing parents
15/03/14 22:47:24 INFO MemoryStore: ensureFreeSpace(1688) called with curMem=0, maxMem=535088332
15/03/14 22:47:24 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1688.0 B, free 510.3 MB)
15/03/14 22:47:24 INFO MemoryStore: ensureFreeSpace(1228) called with curMem=1688, maxMem=535088332
15/03/14 22:47:25 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1228.0 B, free 510.3 MB)
15/03/14 22:47:25 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:53333 (size: 1228.0 B, free: 510.3 MB)
15/03/14 22:47:25 INFO BlockManagerMaster: Updated info of block broadcast_0_piece0
15/03/14 22:47:25 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:838
15/03/14 22:47:25 INFO DAGScheduler: Submitting 1 missing tasks from Stage 0 (MappedRDD[1] at map at SparkPi.scala:46)
15/03/14 22:47:25 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
15/03/14 22:47:25 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, PROCESS_LOCAL, 1317 bytes)
15/03/14 22:47:25 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
15/03/14 22:47:25 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 727 bytes result sent to driver
15/03/14 22:47:25 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 57 ms on localhost (1/1)
15/03/14 22:47:25 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
15/03/14 22:47:25 INFO DAGScheduler: Stage 0 (reduce at SparkPi.scala:50) finished in 0.069 s
15/03/14 22:47:25 INFO DAGScheduler: Job 0 finished: reduce at SparkPi.scala:50, took 0.303240 s
[info] SparkPiSpec:
[info] Pi
[info] - should be less than 4 and more than 3
[info] Passed: Total 1, Failed 0, Errors 0, Passed 1
[success] Total time: 638 s, completed 2015/03/14 22:47:25
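
The spec output above ("Pi should be less than 4 and more than 3") and the log lines "map at SparkPi.scala:46" and "reduce at SparkPi.scala:50" suggest that the template's SparkPi estimates π with a map step followed by a reduce step. The template's source is not reproduced in this post, so the following is only a minimal sketch of that idea, assuming a Monte Carlo estimate and using plain Scala collections instead of an RDD. The names `PiSketch` and `estimatePi` are hypothetical and do not come from the template.

```scala
import scala.util.Random

// Hypothetical sketch; this is NOT the template's SparkPi.scala.
object PiSketch {

  /** Estimates pi by sampling `samples` points uniformly in [-1, 1) x [-1, 1). */
  def estimatePi(samples: Int, rng: Random): Double = {
    require(samples > 0, "samples must be positive")
    val inside = (1 to samples).map { _ =>
      val x = rng.nextDouble() * 2 - 1
      val y = rng.nextDouble() * 2 - 1
      // 1 if the point falls inside the unit circle, 0 otherwise
      if (x * x + y * y < 1) 1 else 0
    }.reduce(_ + _)
    // (area of unit circle) / (area of enclosing square) = pi / 4
    4.0 * inside / samples
  }

  def main(args: Array[String]): Unit = {
    val pi = estimatePi(100000, new Random())
    // Same bounds as the ones reported in the spec output
    assert(pi > 3 && pi < 4)
    println(s"Pi is roughly $pi")
  }
}
```

In a Spark version, the same `map` and `reduce` calls would presumably be applied to an RDD rather than to a local range, which is why the log shows a single job triggered by `reduce`.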


$ sbt eclipse
Picked up _JAVA_OPTIONS: -Dfile.encoding=UTF-8
[info] Loading project definition from /Users/ishihamat/Documents/workspace/sparkexample/project
[info] Set current project to SparkExample (in build file:/Users/ishihamat/Documents/workspace/sparkexample/)
[info] About to create Eclipse project files for your project(s).
[info] Successfully created Eclipse project files for project(s):
[info] SparkExample

Incidentally, the sbt Eclipse plugin settings are already included in nttdata-oss/basic-spark-project.g8. The IntelliJ IDEA plugin is included as well.

addSbtPlugin("com.typesafe.sbteclipse" % "sbteclipse-plugin" % "2.3.0")

addSbtPlugin("com.github.mpeltonen" % "sbt-idea" % "1.6.0")



File -> Import -> General -> Existing Projects into Workspace





