Windows ではすんなりできたのに...
Mac で pyodbc つかって Amazon Redshift にアクセスしようとしたら色々はまったのでメモ.
前準備
- pyodbc のインストール
>> pip install pyodbc
- Redshift の odbcドライバ のインストール
import できない
>> import pyodbc
Traceback (most recent call last):
File "/Users/napinoco/.pyenv/versions/anaconda3-x.x.x/lib/python3.x/site-packages/IPython/core/interactiveshell.py", line 2881, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-2-5ecb2b5e392f>", line 1, in <module>
import pyodbc
File "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_bundle/pydev_import_hook.py", line 20, in do_import
module = self._system_import(name, *args, **kwargs)
ImportError: dlopen(/Users/napinoco/.pyenv/versions/anaconda3-x.x.x/lib/python3.x/site-packages/pyodbc.cpython-36m-darwin.so, 2): Library not loaded: /usr/local/opt/unixodbc/lib/libodbc.2.dylib
Referenced from: /Users/napinoco/.pyenv/versions/anaconda3-x.x.x/lib/python3.x/site-packages/pyodbc.cpython-36m-darwin.so
Reason: image not found
-> unixodbcをインストールする
>> brew install unixodbc
ドライバが見つからない
>> server = XXX
>> database = YYY
>> username = ZZZ
>> password = WWW
>> cnxn = pyodbc.connect('DRIVER={Amazon Redshift (x64)};SERVER=%s;DATABASE=%s;UID=%s;PWD=%s' % (server, database, username, password))
Traceback (most recent call last):
File "/Users/napinoco/.pyenv/versions/anaconda3-x.x.x/lib/python3.x/site-packages/IPython/core/interactiveshell.py", line 2881, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-56-da8a2331fbb5>", line 1, in <module>
cnxn = pyodbc.connect('DRIVER={Amazon Redshift (x64)};SERVER=%s;DATABASE=%s;UID=%s;PWD=%s' % (server, database, username, password))
pyodbc.Error: ('01000', "[01000] [unixODBC][Driver Manager]Can't open lib 'Amazon Redshift (x86)' : file not found (0) (SQLDriverConnect)")
-> Redshiftのページでは設定ファイルをいじるよう書いてあるが、プログラムで直接パスを指定したほうが楽
>> cnxn = pyodbc.connect('DRIVER={/opt/amazon/redshift/lib/libamazonredshiftodbc.dylib};SERVER=%s;DATABASE=%s;UID=%s;PWD=%s' % (server, database, username, password))
クエリが文字化けして投げられない
>> import pandas as pd
>> test_sql = 'select * from hogehoge limit 100;'
>> test_df = pd.read_sql(test_sql, cnxn)
Traceback (most recent call last):
File "/Users/napinoco/.pyenv/versions/anaconda3-x.x.x/lib/python3.x/site-packages/pandas/io/sql.py", line 1401, in execute
cur.execute(*args)
pyodbc.ProgrammingError: ('42601', '[42601] [Amazon][Amazon Redshift] (30) Error occurred while trying to execute a query: [SQLState 42601] ERROR: syntax error at or near "���������������������������������"\nLINE 1: ���������������������������������;\n ^\n (30) (SQLExecDirectW)')
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/napinoco/.pyenv/versions/anaconda3-x.x.x/lib/python3.x/site-packages/IPython/core/interactiveshell.py", line 2881, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-59-74a1b8cc7d1c>", line 1, in <module>
test_df = pd.read_sql(test_sql, cnxn)
File "/Users/napinoco/.pyenv/versions/anaconda3-x.x.x/lib/python3.x/site-packages/pandas/io/sql.py", line 399, in read_sql
chunksize=chunksize)
File "/Users/napinoco/.pyenv/versions/anaconda3-x.x.x/lib/python3.x/site-packages/pandas/io/sql.py", line 1436, in read_query
cursor = self.execute(*args)
File "/Users/napinoco/.pyenv/versions/anaconda3-x.x.x/lib/python3.x/site-packages/pandas/io/sql.py", line 1413, in execute
raise_with_traceback(ex)
File "/Users/napinoco/.pyenv/versions/anaconda3-x.x.x/lib/python3.x/site-packages/pandas/compat/__init__.py", line 340, in raise_with_traceback
raise exc.with_traceback(traceback)
File "/Users/napinoco/.pyenv/versions/anaconda3-x.x.x/lib/python3.x/site-packages/pandas/io/sql.py", line 1401, in execute
cur.execute(*args)
pandas.io.sql.DatabaseError: Execution failed on sql 'select * from hogehoge limit 100;': ('42601', '[42601] [Amazon][Amazon Redshift] (30) Error occurred while trying to execute a query: [SQLState 42601] ERROR: syntax error at or near "���������������������������������"\nLINE 1: ���������������������������������;\n ^\n (30) (SQLExecDirectW)')
-> encoding を直す
>> cnxn.setencoding(encoding='utf-8')
戻り値の文字列が変
>> test_df.iloc[0,0]
Out[64]:
'1\x00'
-> decoding も修正する
## Python 3.x
cnxn.setdecoding(pyodbc.SQL_CHAR, encoding='utf-8')
cnxn.setdecoding(pyodbc.SQL_WCHAR, encoding='utf-8')
cnxn.setdecoding(pyodbc.SQL_WMETADATA, encoding='utf-32le')