Steps to Get an Apache Spark App Running in Eclipse

More than 5 years have passed since last update.

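Introduction

  1. Install the required software
  2. Create a project with giter8
  3. Build and test with sbt
  4. Generate the Eclipse configuration files with sbt's Eclipse plugin
  5. Import into Eclipse

We will work through the steps in that order.
The same steps should work almost unchanged in IntelliJ IDEA, but I don't have IntelliJ IDEA, so I haven't been able to try it.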

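Installing the required software

Install Scala 2.10, the Scala version that Apache Spark 1.2.1 is built for.
sbt is the build tool.
giter8 is a tool that generates a project from a template (the templates are hosted on GitHub).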

brew cask install java
brew install scala210
brew link --force scala210
brew install sbt
brew install giter8

Installing the Eclipse plugins

Before installing, it is a good idea to edit eclipse.ini and increase the heap size.

~/Applications/Eclipse.app/Contents/MacOS/eclipse.ini
-Xms256m
-Xmx1024m

Eclipse Scala IDE

Install Eclipse Scala IDE from the Eclipse Marketplace.

ScalaTest for Scala IDE

From Install New Software, install the following:
http://download.scala-ide.org/sdk/lithium/e44/scala211/stable/site
Scala IDE plugins -> ScalaTest for Scala IDE

Creating a new project with giter8

The project template used here is nttdata-oss/basic-spark-project.g8.
You will be prompted for name, package, and version; enter whatever suits you.

$ g8 nttdata-oss/basic-spark-project.g8
Picked up _JAVA_OPTIONS: -Dfile.encoding=UTF-8

A basic spark application project

name [Basic Spark]: SparkExample
package [com.example]: spark
version [0.0.1]:

Template applied in ./sparkexample

Checking the generated files

A few small examples are included.

$ tree
.
├── README.rst
├── assembly.sbt
├── build.sbt
├── project
│   ├── assembly.sbt
│   └── plugins.sbt
└── src
    ├── main
    │   └── scala
    │       └── spark
    │           ├── GroupByTest.scala
    │           ├── RandomTextWriter.scala
    │           ├── SparkHdfsLR.scala
    │           ├── SparkLR.scala
    │           ├── SparkLRTestDataGenerator.scala
    │           ├── SparkPi.scala
    │           ├── WordCount.scala
    │           └── Words.scala
    └── test
        └── scala
            └── spark
                └── SparkPiSpec.scala

Checking build.sbt

spark-streaming, spark-sql, spark-hive, and spark-mllib are commented out, so edit the file as needed.

build.sbt
name := "SparkExample"

organization := ""

version := "0.0.1"

scalaVersion := "2.10.4"

resolvers ++= Seq("cloudera" at "https://repository.cloudera.com/artifactory/cloudera-repos/")

libraryDependencies ++= Seq(
  "org.scalatest" %% "scalatest" % "2.0.M5b" % "test" withSources() withJavadoc(),
  "org.scalacheck" %% "scalacheck" % "1.10.0" % "test" withSources() withJavadoc(),
  "org.apache.spark" %% "spark-core" % "1.2.1" % "provided" withSources() withJavadoc(),
//  "org.apache.spark" %% "spark-streaming" % "1.2.1" % "provided" withSources() withJavadoc(),
//  "org.apache.spark" %% "spark-sql" % "1.2.1" % "provided" withSources() withJavadoc(),
//  "org.apache.spark" %% "spark-hive" % "1.2.1" % "provided" withSources() withJavadoc(),
//  "org.apache.spark" %% "spark-mllib" % "1.2.1" % "provided" withSources() withJavadoc(),
  "org.apache.hadoop" % "hadoop-client" % "2.5.0-cdh5.3.1" % "provided" withJavadoc(),
  "com.github.scopt" %% "scopt" % "3.2.0"
)


initialCommands := "import .sparkexample._"
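
As a sketch of the kind of edit mentioned above: enabling Spark SQL means uncommenting its line in the libraryDependencies list. The fragment below expresses the same thing as a separate setting appended to build.sbt. It assumes the same 1.2.1 version and "provided" scope as the spark-core entry, and it is a config fragment rather than a complete build file.

```scala
// build.sbt fragment (sketch): equivalent to uncommenting the spark-sql line.
// Assumption: version and scope match the existing spark-core dependency.
libraryDependencies +=
  ("org.apache.spark" %% "spark-sql" % "1.2.1" % "provided").withSources().withJavadoc()
```

Run sbt again after changing the file so the new dependency is resolved.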

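Building and testing with sbt

The first run takes a very long time because the dependency libraries have to be downloaded (638 s in the run below).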

$ sbt test
Picked up _JAVA_OPTIONS: -Dfile.encoding=UTF-8
[info] Loading project definition from /Users/ishihamat/Documents/workspace/sparkexample/project
[info] Updating {file:/Users/ishihamat/Documents/workspace/sparkexample/project/}sparkexample-build...
[info] Resolving org.fusesource.jansi#jansi;1.4 ...
[info] downloading https://repo.scala-sbt.org/scalasbt/sbt-plugin-releases/com.eed3si9n/sbt-assembly/scala_2.10/sbt_0.13/0.11.1/jars/sbt-assembly.jar
(omitted)
[info] Compiling 1 Scala source to /Users/ishihamat/Documents/workspace/sparkexample/target/scala-2.10/test-classes...
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/03/14 22:47:23 INFO SecurityManager: Changing view acls to: ishihamat
15/03/14 22:47:23 INFO SecurityManager: Changing modify acls to: ishihamat
15/03/14 22:47:23 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ishihamat); users with modify permissions: Set(ishihamat)
15/03/14 22:47:23 INFO Slf4jLogger: Slf4jLogger started
15/03/14 22:47:23 INFO Remoting: Starting remoting
15/03/14 22:47:23 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@192.168.11.2:53331]
15/03/14 22:47:23 INFO Utils: Successfully started service 'sparkDriver' on port 53331.
15/03/14 22:47:23 INFO SparkEnv: Registering MapOutputTracker
15/03/14 22:47:23 INFO SparkEnv: Registering BlockManagerMaster
15/03/14 22:47:23 INFO DiskBlockManager: Created local directory at /var/folders/mh/yw9p58bj0q56r3n50qn07tgh0000gn/T/spark-c7a664c4-0fa4-4ca5-a1cf-04dc5d8853dd/spark-254de019-ab13-43af-ab11-6900d0584549
15/03/14 22:47:23 INFO MemoryStore: MemoryStore started with capacity 510.3 MB
15/03/14 22:47:24 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/03/14 22:47:24 INFO HttpFileServer: HTTP File server directory is /var/folders/mh/yw9p58bj0q56r3n50qn07tgh0000gn/T/spark-87116159-39a8-4dae-a875-c6c9352dafdf/spark-4b21cfc1-d4fc-45ba-a2c9-2320197ff0ed
15/03/14 22:47:24 INFO HttpServer: Starting HTTP Server
15/03/14 22:47:24 INFO Utils: Successfully started service 'HTTP file server' on port 53332.
15/03/14 22:47:24 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/03/14 22:47:24 INFO SparkUI: Started SparkUI at http://192.168.11.2:4040
15/03/14 22:47:24 INFO Executor: Starting executor ID <driver> on host localhost
15/03/14 22:47:24 INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@192.168.11.2:53331/user/HeartbeatReceiver
15/03/14 22:47:24 INFO NettyBlockTransferService: Server created on 53333
15/03/14 22:47:24 INFO BlockManagerMaster: Trying to register BlockManager
15/03/14 22:47:24 INFO BlockManagerMasterActor: Registering block manager localhost:53333 with 510.3 MB RAM, BlockManagerId(<driver>, localhost, 53333)
15/03/14 22:47:24 INFO BlockManagerMaster: Registered BlockManager
15/03/14 22:47:24 INFO SparkContext: Starting job: reduce at SparkPi.scala:50
15/03/14 22:47:24 INFO DAGScheduler: Got job 0 (reduce at SparkPi.scala:50) with 1 output partitions (allowLocal=false)
15/03/14 22:47:24 INFO DAGScheduler: Final stage: Stage 0(reduce at SparkPi.scala:50)
15/03/14 22:47:24 INFO DAGScheduler: Parents of final stage: List()
15/03/14 22:47:24 INFO DAGScheduler: Missing parents: List()
15/03/14 22:47:24 INFO DAGScheduler: Submitting Stage 0 (MappedRDD[1] at map at SparkPi.scala:46), which has no missing parents
15/03/14 22:47:24 INFO MemoryStore: ensureFreeSpace(1688) called with curMem=0, maxMem=535088332
15/03/14 22:47:24 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1688.0 B, free 510.3 MB)
15/03/14 22:47:24 INFO MemoryStore: ensureFreeSpace(1228) called with curMem=1688, maxMem=535088332
15/03/14 22:47:25 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1228.0 B, free 510.3 MB)
15/03/14 22:47:25 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:53333 (size: 1228.0 B, free: 510.3 MB)
15/03/14 22:47:25 INFO BlockManagerMaster: Updated info of block broadcast_0_piece0
15/03/14 22:47:25 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:838
15/03/14 22:47:25 INFO DAGScheduler: Submitting 1 missing tasks from Stage 0 (MappedRDD[1] at map at SparkPi.scala:46)
15/03/14 22:47:25 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
15/03/14 22:47:25 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, PROCESS_LOCAL, 1317 bytes)
15/03/14 22:47:25 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
15/03/14 22:47:25 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 727 bytes result sent to driver
15/03/14 22:47:25 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 57 ms on localhost (1/1)
15/03/14 22:47:25 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
15/03/14 22:47:25 INFO DAGScheduler: Stage 0 (reduce at SparkPi.scala:50) finished in 0.069 s
15/03/14 22:47:25 INFO DAGScheduler: Job 0 finished: reduce at SparkPi.scala:50, took 0.303240 s
[info] SparkPiSpec:
[info] Pi
[info] - should be less than 4 and more than 3
[info] Passed: Total 1, Failed 0, Errors 0, Passed 1
[success] Total time: 638 s, completed 2015/03/14 22:47:25
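
The spec output above checks that the estimated value of Pi lies between 3 and 4. The template's own SparkPi and SparkPiSpec sources are not reproduced in this article, so the following is only a minimal sketch of the same idea in plain Scala, without Spark or ScalaTest. The names PiSketch and estimatePi, the sample count, and the seed are all made up for illustration.

```scala
import scala.util.Random

// Hypothetical stand-in for the template's SparkPi example: plain Scala, no Spark.
object PiSketch {
  // Monte Carlo estimate: the fraction of random points in the unit square
  // that fall inside the quarter circle, multiplied by 4.
  def estimatePi(samples: Int, seed: Long = 42L): Double = {
    val rng = new Random(seed)
    val inside = (1 to samples).count { _ =>
      val x = rng.nextDouble()
      val y = rng.nextDouble()
      x * x + y * y <= 1.0
    }
    4.0 * inside / samples
  }

  def main(args: Array[String]): Unit = {
    val pi = estimatePi(100000)
    // Same bound that the spec output reports: 3 < Pi < 4.
    assert(pi > 3.0 && pi < 4.0, s"unexpected estimate: $pi")
    println(s"Pi is roughly $pi")
  }
}
```

A Spark version would distribute the sampling across partitions and combine the counts with reduce, which is the "reduce at SparkPi.scala:50" job visible in the log.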

Generating the Eclipse configuration files

$ sbt eclipse
Picked up _JAVA_OPTIONS: -Dfile.encoding=UTF-8
[info] Loading project definition from /Users/ishihamat/Documents/workspace/sparkexample/project
[info] Set current project to SparkExample (in build file:/Users/ishihamat/Documents/workspace/sparkexample/)
[info] About to create Eclipse project files for your project(s).
[info] Successfully created Eclipse project files for project(s):
[info] SparkExample

Incidentally, the sbt Eclipse plugin is already configured in nttdata-oss/basic-spark-project.g8. The IntelliJ IDEA plugin is included as well.

project/plugins.sbt
addSbtPlugin("com.typesafe.sbteclipse" % "sbteclipse-plugin" % "2.3.0")

addSbtPlugin("com.github.mpeltonen" % "sbt-idea" % "1.6.0")
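
Because build.sbt requests source jars with withSources(), it may be convenient to have sbteclipse attach them to the generated Eclipse project as well. The fragment below is a minimal sketch, assuming that the EclipseKeys.withSource setting documented for sbteclipse 2.x is available in version 2.3.0. It is not part of the template.

```scala
// build.sbt fragment (sketch, assumption: sbteclipse 2.x provides EclipseKeys.withSource).
// Asks "sbt eclipse" to link the downloaded source jars in the generated .classpath.
EclipseKeys.withSource := true
```

Re-run sbt eclipse after adding the setting so the project files are regenerated.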

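Importing into Eclipse

From the Eclipse menu, select

File -> Import -> General -> Existing Projects into Workspace

and import the project.

In my environment 14 problems were reported, but setting the Scala compiler version in Eclipse to 2.10 resolved all of them.

After that, choose ScalaTest as the debug configuration and try out various tests.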

ishihamat
Software Developer
lightcafe_gr
An IT company with group companies across Japan
https://www.lightcafe.co.jp/