Installing Spark on GCE, in an Ubuntu environment of course.


Installing Spark on Google Compute Engine

Instead of using a quick-deploy on GCE, create a new instance and install Spark by hand.

Prerequisites

  • The local environment is Ubuntu
  • gcloud is already set up

Steps (creating the instance)

1. On the VM instances screen, click the [New instance] button
(screenshot)

2. Pick a machine type and the rest as you like; for the boot disk, choose Ubuntu, of course. Any version you prefer.
(screenshot)

3. On the instance you created, select the option to connect with gcloud
(screenshot)

4. Paste the command line it shows into a terminal on your local Ubuntu machine
(screenshot)

This connects your local machine to the newly created GCE Ubuntu instance.
All of the following steps are run on the VM you connected to via gcloud.
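As a hedged aside, the console clicks above can also be done from the local terminal with the gcloud CLI. The instance name, zone, machine type, and image family below are placeholder assumptions, not values from this article; adjust them to match what you would pick in the console.

```shell
# Build the gcloud command as a string so it can be reviewed before
# actually running it. All arguments are placeholder assumptions.
build_create_cmd() {
  name=$1; zone=$2; mtype=$3
  printf 'gcloud compute instances create %s --zone=%s --machine-type=%s --image-family=ubuntu-1404-lts --image-project=ubuntu-os-cloud\n' \
    "$name" "$zone" "$mtype"
}

# Print the command for review instead of executing it directly:
build_create_cmd spark-test us-central1-a n1-standard-2
```

Once the instance exists, `gcloud compute ssh spark-test --zone=us-central1-a` gives the same connection as step 4.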

Steps (installation)

1. Install Java 8
The stock Ubuntu VM comes without Java, so install it.

$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java8-installer

Answer [ENTER], [Y], [OK], [yes] and so on as prompted to complete the install.
Once it finishes, check that it worked:

junk@instance-2:~$ java -version
java version "1.8.0_45"
Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)
junk@instance-2:~$ 

Looks OK.
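As an aside (a common pattern with this installer, though untested here): the Oracle license prompt can be answered ahead of time so the install runs unattended. Feed the debconf selection below into `sudo debconf-set-selections` before the `apt-get install` step:

```
oracle-java8-installer shared/accepted-oracle-license-v1-1 select true
```

For example: `echo "oracle-java8-installer shared/accepted-oracle-license-v1-1 select true" | sudo debconf-set-selections`.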

2. Download Scala

$ cd ~
$ mkdir dl
$ cd dl
$ wget http://www.scala-lang.org/files/archive/scala-2.11.7.tgz
--2015-07-06 16:04:20--  http://www.scala-lang.org/files/archive/scala-2.11.7.tgz
Resolving www.scala-lang.org (www.scala-lang.org)... 128.178.154.159
Connecting to www.scala-lang.org (www.scala-lang.org)|128.178.154.159|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 28460530 (27M) [application/x-gzip]
Saving to: ‘scala-2.11.7.tgz’

scala-2.11.7.tgz                100%[======================================================>]  27.14M  5.57MB/s   in 8.3s   

2015-07-06 16:04:29 (3.27 MB/s) - ‘scala-2.11.7.tgz’ saved [28460530/28460530]

3. Extract it

$ tar -xzvf scala-2.11.7.tgz

4. Copy the extracted Scala into /usr/local and create a symlink for it.

$ cd /usr/local/
$ sudo cp -r ~/dl/scala-2.11.7 .
$ sudo ln -sv scala-2.11.7/ scala
‘scala’ -> ‘scala-2.11.7/’
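Why the symlink: `/usr/local/scala` stays a stable path while the versioned directory behind it can be swapped later. A small self-contained sketch of the same pattern, using throwaway directory names under a temp dir (nothing here touches the real install):

```shell
# Demonstrate the stable-symlink / versioned-directory pattern.
demo=$(mktemp -d)
mkdir "$demo/scala-2.11.7" "$demo/scala-2.11.8"

ln -s scala-2.11.7 "$demo/scala"    # first install points at 2.11.7
readlink "$demo/scala"              # -> scala-2.11.7

ln -sfn scala-2.11.8 "$demo/scala"  # upgrade: repoint the link, no copying
readlink "$demo/scala"              # -> scala-2.11.8

rm -rf "$demo"
```

Scripts and PATH entries that reference `/usr/local/scala` keep working across upgrades; only the link target changes.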

5. Download Spark

$ cd ~/dl
$ wget http://archive.apache.org/dist/spark/spark-1.4.0/spark-1.4.0-bin-hadoop2.6.tgz
--2015-07-06 16:11:16--  http://archive.apache.org/dist/spark/spark-1.4.0/spark-1.4.0-bin-hadoop2.6.tgz
Resolving archive.apache.org (archive.apache.org)... 192.87.106.229, 140.211.11.131, 2001:610:1:80bc:192:87:106:229
Connecting to archive.apache.org (archive.apache.org)|192.87.106.229|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 250194134 (239M) [application/x-tar]
Saving to: ‘spark-1.4.0-bin-hadoop2.6.tgz’

spark-1.4.0-bin-hadoop2.6.tgz   100%[======================================================>] 238.60M  6.62MB/s   in 45s    

2015-07-06 16:12:02 (5.32 MB/s) - ‘spark-1.4.0-bin-hadoop2.6.tgz’ saved [250194134/250194134]

6. Extract it

$ tar -xzvf spark-1.4.0-bin-hadoop2.6.tgz

7. Copy the extracted Spark and symlink it, same as before.

$ cd /usr/local/
$ sudo cp -r ~/dl/spark-1.4.0-bin-hadoop2.6 .
$ sudo ln -sv spark-1.4.0-bin-hadoop2.6/ spark
‘spark’ -> ‘spark-1.4.0-bin-hadoop2.6/’

8. Set up the PATH

$ vi ~/.bashrc

Add the following at the end of .bashrc:

export SCALA_HOME=/usr/local/scala
export SPARK_HOME=/usr/local/spark
export PATH=$SCALA_HOME/bin:$PATH

Reload it:

$ source ~/.bashrc
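A quick sanity check after reloading (a hedged helper, not part of the original steps; the paths assume the /usr/local layout from steps 4 and 7):

```shell
# Mirror the .bashrc additions, then verify they took effect.
export SCALA_HOME=/usr/local/scala
export SPARK_HOME=/usr/local/spark
export PATH="$SCALA_HOME/bin:$PATH"

check_spark_env() {
  # Warn (but keep going) if the install directories are missing.
  [ -d "$SCALA_HOME" ] || echo "warning: $SCALA_HOME missing"
  [ -d "$SPARK_HOME" ] || echo "warning: $SPARK_HOME missing"
  # Confirm the Scala bin directory made it onto PATH.
  case ":$PATH:" in
    *":$SCALA_HOME/bin:"*) echo "scala bin is on PATH" ;;
    *)                     echo "scala bin NOT on PATH" ;;
  esac
}
check_spark_env
```

If `scala bin NOT on PATH` shows up, the export lines were likely not added (or `source ~/.bashrc` was skipped).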

9. Start it up

$ cd $SPARK_HOME
$ ./bin/spark-shell
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/07/06 16:24:33 INFO SecurityManager: Changing view acls to: junk
15/07/06 16:24:33 INFO SecurityManager: Changing modify acls to: junk
15/07/06 16:24:33 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(junk); users with modify permissions: Set(junk)
15/07/06 16:24:33 INFO HttpServer: Starting HTTP Server
15/07/06 16:24:33 INFO Utils: Successfully started service 'HTTP class server' on port 45846.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.4.0
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_45)
Type in expressions to have them evaluated.
Type :help for more information.
15/07/06 16:24:38 INFO SparkContext: Running Spark version 1.4.0
15/07/06 16:24:38 INFO SecurityManager: Changing view acls to: junk
15/07/06 16:24:38 INFO SecurityManager: Changing modify acls to: junk
15/07/06 16:24:38 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(junk); users with modify permissions: Set(junk)
15/07/06 16:24:39 INFO Slf4jLogger: Slf4jLogger started
15/07/06 16:24:39 INFO Remoting: Starting remoting
Mon Jul 06 16:24:42 UTC 2015 Thread[main,5,main] java.io.FileNotFoundException: derby.log (Permission denied)
15/07/06 16:24:43 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
----------------------------------------------------------------
Loaded from file:/usr/local/spark-1.4.0-bin-hadoop2.6/lib/spark-assembly-1.4.0-hadoop2.6.0.jar
java.vendor=Oracle Corporation
java.runtime.version=1.8.0_45-b14
user.dir=/usr/local/spark-1.4.0-bin-hadoop2.6
os.name=Linux
os.arch=amd64
os.version=3.19.0-21-generic
derby.system.home=null
Database Class Loader started - derby.database.classpath=''
15/07/06 16:24:45 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
15/07/06 16:24:45 INFO MetaStoreDirectSql: MySQL check failed, assuming we are not on mysql: Lexical error at line 1, column 5.  Encountered: "@" (64), after : "".
15/07/06 16:24:46 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
15/07/06 16:24:46 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
15/07/06 16:24:47 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
15/07/06 16:24:47 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
15/07/06 16:24:47 INFO ObjectStore: Initialized ObjectStore
15/07/06 16:24:48 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 0.13.1aa
15/07/06 16:24:48 INFO HiveMetaStore: Added admin role in metastore
15/07/06 16:24:48 INFO HiveMetaStore: Added public role in metastore
15/07/06 16:24:48 INFO HiveMetaStore: No user is added in admin role, since config is empty
15/07/06 16:24:48 INFO SessionState: No Tez session required at this point. hive.execution.engine=mr.
15/07/06 16:24:48 INFO SparkILoop: Created sql context (with Hive support)..
SQL context available as sqlContext.

scala> 

The last log is long, so the middle portion was omitted.
(Note the banner says Scala 2.10.4: the prebuilt Spark 1.4.0 binaries bundle their own Scala, separate from the 2.11.7 installed above.)

The whole thing takes about 10 minutes.
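As a final hedged smoke test: spark-shell can also be driven non-interactively by piping a one-liner into it, which is handy for checking the install from a script. The snippet below falls back to just printing the one-liner when spark-shell is not present (e.g. when trying this on a machine without the install above).

```shell
# Pipe a tiny Scala job into spark-shell if it exists; otherwise show
# what would be run. SPARK_HOME defaults to the path used in step 7.
run_snippet() {
  sdir=${SPARK_HOME:-/usr/local/spark}
  snippet='println(sc.parallelize(1 to 100).sum())'
  if [ -x "$sdir/bin/spark-shell" ]; then
    printf '%s\n' "$snippet" | "$sdir/bin/spark-shell"
  else
    echo "spark-shell not found; would pipe: $snippet"
  fi
}
run_snippet
```

On a working install the job should print 5050.0 (the sum of 1..100) somewhere in the shell output.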
