
Notes on handling large datasets with python + pandas

More than 5 years have passed since last update.

Fetching data from MySQL

"""Get data from MySQL with pandas library."""
import MySQLdb
import pandas as pd

con = MySQLdb.connect(db='work', user='root', passwd='')  # connect to the DB
try:
    sql = """SELECT product_id, product_nm, product_features FROM electronics"""
    df = pd.read_sql(sql, con)  # fetch the result as a pandas DataFrame
finally:
    con.close()  # close the connection even if the query fails
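When the result set itself is too large to load at once, `pandas.read_sql` also accepts a `chunksize` argument and then returns an iterator of DataFrames instead of a single frame. The sketch below is a minimal illustration under stated assumptions: an in-memory SQLite database stands in for the MySQL `work` database, and the sample rows are made up so the example runs on its own.

```python
import sqlite3

import pandas as pd

# Assumption: an in-memory SQLite database stands in for the MySQL "work" DB,
# with the article's electronics columns and made-up sample rows.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE electronics "
    "(product_id INTEGER, product_nm TEXT, product_features TEXT)"
)
con.executemany(
    "INSERT INTO electronics VALUES (?, ?, ?)",
    [(i, f"product_{i}", f"feature_{i % 3}") for i in range(10)],
)

sql = "SELECT product_id, product_nm, product_features FROM electronics"

# With chunksize set, read_sql returns an iterator of DataFrames rather than
# one DataFrame, so only one chunk of rows is materialized at a time.
chunk_sizes = []
for chunk in pd.read_sql(sql, con, chunksize=4):
    chunk_sizes.append(len(chunk))  # build vectors from each chunk here instead
con.close()

print(chunk_sizes)  # [4, 4, 2]
```

With MySQLdb, whether chunking also lowers the driver's own memory use depends on the cursor: the default cursor buffers the full result on the client, while a server-side cursor such as `MySQLdb.cursors.SSCursor` (passed as `cursorclass` to `connect`) streams rows.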

Building vectors from the data, part 1

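When building vectors from a large dataset for clustering and similar tasks, process the rows one at a time and delete data as you go to keep memory consumption down.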

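"""Delete rows while creating dataset."""
X = []
for index, row in df.iterrows():  # iterate row by row
    Xi = [row.col1, row.col2, row.col3]  # col1..col3: the feature columns
    X.append(Xi)
    # Drop the row just processed to limit memory use. (DataFrame.ix has been
    # removed from pandas; iloc[1:] drops the first remaining row.)
    # Caveat: iterrows() keeps a reference to the original DataFrame, so its
    # memory is not actually released until the loop finishes.
    df = df.iloc[1:]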
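Before relying on row deletion to save memory, it can help to measure how much the DataFrame actually occupies. The sketch below uses pandas' `DataFrame.memory_usage(deep=True)` on a made-up two-column frame standing in for the real data; the column names and values are assumptions for illustration only.

```python
import pandas as pd

# Made-up frame standing in for the electronics data.
df = pd.DataFrame({
    "product_id": range(1000),
    "product_nm": [f"product_{i}" for i in range(1000)],
})

# Bytes per column plus the index (labelled "Index"). deep=True also counts
# the Python string objects referenced by object-dtype columns.
per_column = df.memory_usage(deep=True)
total_bytes = int(per_column.sum())

print(per_column)
print(f"total: {total_bytes} bytes")
```

A row slice such as `df.iloc[500:]` will usually report fewer bytes even when it is a view that still shares the original buffers, so a smaller number here does not by itself prove that memory was released.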

Building vectors from the data, part 2 (faster)

The first approach keeps the code clean, but the iteration is slow.
Converting the columns to plain lists first is several times faster.

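"""High speed row iteration in pandas DataFrame"""
# Copy each column into a plain Python list
df_index, df_col1, df_col2, df_col3 = \
    list(df.index), list(df.col1), list(df.col2), list(df.col3)
del df  # drop the DataFrame (freed once no other reference to it remains)
X = []
for _ in df_index:
    # Pop values off the end of each list, shrinking the lists as we iterate
    col1, col2, col3 = df_col1.pop(), df_col2.pop(), df_col3.pop()
    Xi = [col1, col2, col3]
    X.append(Xi)
X.reverse()  # pop() takes items from the end, so restore the original row order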
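If the goal is only the list of row vectors, pandas can also produce it without an explicit per-row loop. This is a swapped-in alternative, not the author's method: the sketch below uses a made-up frame with the placeholder columns `col1` to `col3` and builds the same nested list two ways, by zipping the columns and with `DataFrame.to_numpy().tolist()`.

```python
import pandas as pd

# Made-up frame using the placeholder column names from the article.
df = pd.DataFrame({
    "col1": [1, 2, 3],
    "col2": [4.0, 5.0, 6.0],
    "col3": ["a", "b", "c"],
})

# Option 1: zip the columns; one list per row, in the original row order.
X_zip = [list(row) for row in zip(df["col1"], df["col2"], df["col3"])]

# Option 2: one pandas call. With mixed dtypes this builds an object-dtype
# array covering all selected rows, i.e. a temporary full copy of the data.
X_tolist = df[["col1", "col2", "col3"]].to_numpy().tolist()

print(X_zip == X_tolist)  # True
```

Because option 2 briefly holds a full object-dtype copy, the zip version (or chunked processing) may be the safer choice when memory is the main constraint.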