Spark Setup (Scala Version)

Posted at 2015-07-06

Personal memo.

【OS】
This walkthrough uses CentOS 6.6 (x86_64). See the following for details:
http://centos.server-manual.com/

Preparation
The packages required for the setup must be put in place beforehand; configure everything below.
These steps modify the system, so administrator privileges are required. su to root first.

【YUM package management】
yum -y install yum-plugin-fastestmirror
yum -y update
yum -y groupinstall "Base" "Development tools" "Japanese Support"
[Add the RPMforge repository]
rpm --import http://apt.sw.be/RPM-GPG-KEY.dag.txt
rpm -ivh http://apt.sw.be/redhat/el6/en/x86_64/rpmforge/RPMS/rpmforge-release-0.5.3-1.el6.rf.x86_64.rpm
[Add the EPEL repository]
rpm --import http://dl.fedoraproject.org/pub/epel/RPM-GPG-KEY-EPEL-6
rpm -ivh http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
[Add the ELRepo repository]
rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
rpm -Uvh http://www.elrepo.org/elrepo-release-6-6.el6.elrepo.noarch.rpm
[Add the Remi repository]
rpm --import http://rpms.famillecollet.com/RPM-GPG-KEY-remi
rpm -ivh http://rpms.famillecollet.com/enterprise/remi-release-6.rpm

【Disable SELinux】
getenforce
Enforcing ← SELinux enabled
setenforce 0
getenforce
Permissive ← SELinux disabled (permissive mode)
vi /etc/sysconfig/selinux
SELINUX=enforcing
SELINUX=disabled ← change (disables SELinux at boot)

【Allow HTTP through iptables】
vi /etc/sysconfig/iptables
-A INPUT -m state --state NEW -m tcp -p tcp --dport 22 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 80 -j ACCEPT ← added
-A INPUT -j REJECT --reject-with icmp-host-prohibited
-A FORWARD -j REJECT --reject-with icmp-host-prohibited
COMMIT
Restart iptables:
service iptables restart

【JAVA】
Uninstall the Java version that was installed by default when CentOS was set up.
yum erase java*
Download the latest version (RPM) from the web and install it.
rpm -ivh jdk-8u45-linux-x64.rpm
Check the version:
java -version
java version "1.8.0_45"
Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)
■ Set JAVA_HOME
vi /etc/profile


export JAVA_HOME=/usr/java/default
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=.:$JAVA_HOME/jre/lib:$JAVA_HOME/lib:$JAVA_HOME/lib/tools.jar

【Prerequisites】
Spark will run in standalone mode from a prebuilt binary, so no build from source is performed this time.

【Scala】
cd /usr/local/src
wget http://www.scala-lang.org/files/archive/scala-2.11.7.tgz
tar -zxvf scala-2.11.7.tgz
chown -R root:root scala-2.11.7
mv scala-2.11.7 ../scala

【Spark】
wget http://ftp.riken.jp/net/apache/spark/spark-1.4.0/spark-1.4.0-bin-cdh4.tgz
tar -zxvf spark-1.4.0-bin-cdh4.tgz
chown -R root:root spark-1.4.0-bin-cdh4
mv spark-1.4.0-bin-cdh4 ../spark

Append the environment variables:
vi /etc/profile


export SCALA_HOME=/usr/local/scala
export SPARK_HOME=/usr/local/spark
export PATH=$SCALA_HOME/bin:$PATH

source /etc/profile
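
Before launching Spark, it is worth a quick sanity check that the Scala REPL itself starts and can evaluate an expression (a minimal sketch; any simple expression will do):

scala
scala> List(1, 2, 3).map(_ * 2).sum
res0: Int = 12
scala> :quit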

Verification


cd $SPARK_HOME
./bin/spark-shell
    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    14/10/01 05:53:08 INFO SecurityManager: Changing view acls to: hdspark,
    14/10/01 05:53:08 INFO SecurityManager: Changing modify acls to: hdspark,
    14/10/01 05:53:08 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hdspark, ); users with modify permissions: Set(hdspark, )
    14/10/01 05:53:08 INFO HttpServer: Starting HTTP Server
    14/10/01 05:53:09 INFO Utils: Successfully started service 'HTTP class server' on port 33066.
	Welcome to
	      ____              __
	     / __/__  ___ _____/ /__
	    _\ \/ _ \/ _ `/ __/  '_/
	   /___/ .__/\_,_/_/ /_/\_\   version 1.4.0
	      /_/

    Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_45)
    Type in expressions to have them evaluated.
    Type :help for more information.
    14/10/01 05:53:22 INFO SecurityManager: Changing view acls to: hdspark,
    14/10/01 05:53:22 INFO SecurityManager: Changing modify acls to: hdspark,
    14/10/01 05:53:22 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hdspark, ); users with modify permissions: Set(hdspark, )
    14/10/01 05:53:24 INFO Slf4jLogger: Slf4jLogger started
    14/10/01 05:53:24 INFO Remoting: Starting remoting
    14/10/01 05:53:25 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@localhost:36288]
    14/10/01 05:53:25 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver@localhost:36288]
    14/10/01 05:53:25 INFO Utils: Successfully started service 'sparkDriver' on port 36288.
    14/10/01 05:53:25 INFO SparkEnv: Registering MapOutputTracker
    14/10/01 05:53:25 INFO SparkEnv: Registering BlockManagerMaster
    14/10/01 05:53:25 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20141001055325-22ac
    14/10/01 05:53:26 INFO Utils: Successfully started service 'Connection manager for block manager' on port 56196.
    14/10/01 05:53:26 INFO ConnectionManager: Bound socket to port 56196 with id = ConnectionManagerId(localhost,56196)
    14/10/01 05:53:26 INFO MemoryStore: MemoryStore started with capacity 267.3 MB
    14/10/01 05:53:26 INFO BlockManagerMaster: Trying to register BlockManager
    14/10/01 05:53:26 INFO BlockManagerMasterActor: Registering block manager localhost:56196 with 267.3 MB RAM
    14/10/01 05:53:26 INFO BlockManagerMaster: Registered BlockManager
    14/10/01 05:53:26 INFO HttpFileServer: HTTP File server directory is /tmp/spark-a33f43d9-37da-4c9e-a0b8-71b117b37012
    14/10/01 05:53:26 INFO HttpServer: Starting HTTP Server
    14/10/01 05:53:26 INFO Utils: Successfully started service 'HTTP file server' on port 54714.
    14/10/01 05:53:27 INFO Utils: Successfully started service 'SparkUI' on port 4040.
    14/10/01 05:53:27 INFO SparkUI: Started SparkUI at http://localhost:4040
    14/10/01 05:53:29 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    14/10/01 05:53:29 INFO Executor: Using REPL class URI: http://localhost:33066
    14/10/01 05:53:29 INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@localhost:36288/user/HeartbeatReceiver
    14/10/01 05:53:30 INFO SparkILoop: Created spark context..
    Spark context available as sc.
scala>

// Let's try a simple line count

scala> val txtFile = sc.textFile("README.md")
	14/10/01 05:56:17 WARN SizeEstimator: Failed to check whether UseCompressedOops is set; assuming yes
	14/10/01 05:56:17 INFO MemoryStore: ensureFreeSpace(156973) called with curMem=0, maxMem=280248975
	14/10/01 05:56:17 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 153.3 KB, free 267.1 MB)
	txtFile: org.apache.spark.rdd.RDD[String] = ../README.md MappedRDD[1] at textFile at <console>:12


scala> txtFile.count()
	14/10/01 05:56:29 INFO FileInputFormat: Total input paths to process : 1
	14/10/01 05:56:29 INFO SparkContext: Starting job: count at <console>:15
	14/10/01 05:56:29 INFO DAGScheduler: Got job 0 (count at <console>:15) with 1 output partitions (allowLocal=false)
	14/10/01 05:56:29 INFO DAGScheduler: Final stage: Stage 0(count at <console>:15)
	14/10/01 05:56:29 INFO DAGScheduler: Parents of final stage: List()
	14/10/01 05:56:29 INFO DAGScheduler: Missing parents: List()
	14/10/01 05:56:29 INFO DAGScheduler: Submitting Stage 0 (../README.md MappedRDD[1] at textFile at <console>:12), which has no missing parents
	14/10/01 05:56:29 INFO MemoryStore: ensureFreeSpace(2384) called with curMem=156973, maxMem=280248975
	14/10/01 05:56:29 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 2.3 KB, free 267.1 MB)
	14/10/01 05:56:29 INFO DAGScheduler: Submitting 1 missing tasks from Stage 0 (../README.md MappedRDD[1] at textFile at <console>:12)
	14/10/01 05:56:29 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
	14/10/01 05:56:29 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, PROCESS_LOCAL, 1207 bytes)
	14/10/01 05:56:29 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
	14/10/01 05:56:29 INFO HadoopRDD: Input split: file:/usr/local/spark/README.md:0+4811
	14/10/01 05:56:29 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
	14/10/01 05:56:29 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
	14/10/01 05:56:29 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
	14/10/01 05:56:29 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
	14/10/01 05:56:29 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
	14/10/01 05:56:30 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1731 bytes result sent to driver
	14/10/01 05:56:30 INFO DAGScheduler: Stage 0 (count at <console>:15) finished in 0.462 s
	14/10/01 05:56:30 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 423 ms on localhost (1/1)
	14/10/01 05:56:30 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
	14/10/01 05:56:30 INFO SparkContext: Job finished: count at <console>:15, took 0.828128221 s
	res0: Long = 141
// Success!
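
As a slightly bigger smoke test, the same RDD can drive a classic word count with the standard RDD API (a sketch only; the INFO log output is omitted here, and the exact counts depend on the contents of README.md):

scala> val counts = txtFile.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
scala> counts.take(3).foreach(println)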