Run on IBM Data Science Experience (DSX), in a Python 2 with Spark 1.6 notebook.
(1) Define the credentials
I had already registered my dashDB connection in DSX, so I clicked "Insert to code" to generate this cell. (The username and password are masked.)
credentials_2 = {
    'port': '50000',
    'db': 'BLUDB',
    'username': 'dashXXXXX',
    'ssljdbcurl': 'jdbc:db2://dashdb-entry-yp-dal09-07.services.dal.bluemix.net:50001/BLUDB:sslConnection=true;',
    'host': 'dashdb-entry-yp-dal09-07.services.dal.bluemix.net',
    'https_url': 'https://dashdb-entry-yp-dal09-07.services.dal.bluemix.net:8443',
    'dsn': 'DATABASE=BLUDB;HOSTNAME=dashdb-entry-yp-dal09-07.services.dal.bluemix.net;PORT=50000;PROTOCOL=TCPIP;UID=dashXXXX;PWD=XXXXXXXXXXX;',
    'hostname': 'dashdb-entry-yp-dal09-07.services.dal.bluemix.net',
    'jdbcurl': 'jdbc:db2://dashdb-entry-yp-dal09-07.services.dal.bluemix.net:50000/BLUDB',
    'ssldsn': 'DATABASE=BLUDB;HOSTNAME=dashdb-entry-yp-dal09-07.services.dal.bluemix.net;PORT=50001;PROTOCOL=TCPIP;UID=dashXXXX;PWD=XXXXXXXXXXX;Security=SSL;',
    'uri': 'db2://dashXXXX:XXXXXXXXXXX@dashdb-entry-yp-dal09-07.services.dal.bluemix.net:50000/BLUDB',
    'password': """XXXXXXXXXXXXX"""
}
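Incidentally, the 'jdbcurl' entry that "Insert to code" generates is just the host, port, and database fields combined into the standard DB2 JDBC URL form. A minimal sketch (the `build_jdbc_url` helper and the example host are my own, for illustration only):

```python
def build_jdbc_url(credentials):
    # Assemble a DB2 JDBC URL from the hostname, port and db fields,
    # matching the form of the 'jdbcurl' entry above.
    return 'jdbc:db2://{host}:{port}/{db}'.format(
        host=credentials['hostname'],
        port=credentials['port'],
        db=credentials['db'])

creds = {'hostname': 'example.services.dal.bluemix.net',
         'port': '50000', 'db': 'BLUDB'}
print(build_jdbc_url(creds))
# jdbc:db2://example.services.dal.bluemix.net:50000/BLUDB
```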
(2) Define a function to fetch the data
Reference site ---> http://stackoverflow.com/questions/37688993/how-to-use-pandas-on-spark-notebook-data-on-dashdb-in-python
def getDashData(credentials, schemaName, tableName):
    from pyspark.sql import SQLContext
    # sc is the SparkContext the DSX notebook provides
    sqlContext = SQLContext(sc)
    # JDBC connection properties taken from the credentials dict
    props = {}
    props['user'] = credentials['username']
    props['password'] = credentials['password']
    # Schema-qualified table name, e.g. 'DASH7836.TEST1'
    table = schemaName + '.' + tableName
    # Read the dashDB table over JDBC into a Spark DataFrame
    return sqlContext.read.jdbc(credentials['jdbcurl'], table, properties=props)
(3) Fetch the data from dashDB and check the first 10 records
df_dash = getDashData(credentials_2, 'DASH7836', 'TEST1')
df_dash.toPandas().head(10)
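Note that toPandas() materializes the entire Spark DataFrame in the driver's memory before head(10) is applied, so for a large table something like df_dash.limit(10).toPandas() would transfer only 10 rows over JDBC. The pandas half of the pattern can be checked locally with a stand-in DataFrame (no Spark needed; the column names here are made up for illustration):

```python
import pandas as pd

# Stand-in for df_dash.toPandas(): a small local DataFrame
df = pd.DataFrame({'ID': range(25), 'VALUE': [i * 10 for i in range(25)]})

# .head(10) returns the first 10 records, as in the cell above
top10 = df.head(10)
print(len(top10))
# 10
```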
