## Purpose

Install pyspark with pip on WSL (Ubuntu) on Windows 10.
## Environment

| Software | Version |
|---|---|
| OS | Windows 10 Pro |
| WSL | Ubuntu 18.04.3 LTS |
## 1. Install JDK 8

```shell
$ sudo apt install openjdk-8-jre-headless
```

Check that it was installed:

```shell
$ java -version
openjdk version "1.8.0_242"
OpenJDK Runtime Environment (build 1.8.0_242-8u242-b08-0ubuntu3~18.04-b08)
OpenJDK 64-Bit Server VM (build 25.242-b08, mixed mode)
```
## 2. Install pyspark with pip

```shell
$ python3 -m pip install pyspark
```
## 3. Launch pyspark

```shell
$ export PYSPARK_PYTHON=python3
$ export SPARK_HOME=~/.local/lib/python3.6/site-packages/pyspark
$ export PATH=$PATH:$SPARK_HOME/bin
$ pyspark
Python 3.6.9 (default, Nov  7 2019, 10:44:02)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
20/02/10 01:22:47 WARN Utils: Your hostname, DESKTOP-VOJ1TQH resolves to a loopback address: 127.0.1.1; using 192.168.181.17 instead (on interface eth1)
20/02/10 01:22:47 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
20/02/10 01:22:47 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.5
      /_/

Using Python version 3.6.9 (default, Nov  7 2019 10:44:02)
SparkSession available as 'spark'.
>>>
```
## 4. Add the exports to .bashrc

```shell
# Pyspark
export PYSPARK_PYTHON=python3
export SPARK_HOME=~/.local/lib/python3.6/site-packages/pyspark
export PATH=$PATH:$SPARK_HOME/bin
```

Reload the settings:

```shell
$ . ~/.bashrc
```