More than 1 year has passed since last update.

[Python] Notes on 'Polars', a Rust-Based Take on Pandas

Posted at 2023-04-28

What is Polars?

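Polars is a Python library built on a Rust core and designed with Pandas very much in mind.
Its main selling point is speed: it usually runs faster than Pandas, thanks to its Rust implementation, multi-threaded execution, and the Apache Arrow columnar memory format.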

API

Relevant sections of the official Polars Python API reference:

Input/output

Series methods

DataFrame methods

LazyFrame methods

Expressions

Functions

SQL

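The actual code

The script below takes the path of a CSV file as its first command-line argument and assumes the file has columns named ID, X, xcenter, and ycenter. It was written against the Polars API of April 2023; several of the methods it uses have since been renamed in newer Polars releases.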

import os
import sys
import datetime
import numpy as np
import polars as pl

# Get the current time as a "YYYY-MM-DD HH:MM" string
now = datetime.datetime.now()
today = now.strftime('%Y-%m-%d %H:%M')

# Get the input CSV path from the first command-line argument
args = sys.argv
input_path = os.path.abspath(args[1])
print(input_path)
basename = os.path.basename(input_path)
basename_wo_ext = os.path.splitext(basename)[0]
dirname = os.path.dirname(input_path)

# Create a DataFrame (1): from a 2x3 NumPy array of random numbers, one row per array row
rng = np.random.default_rng(0)
df = pl.DataFrame(rng.random((2, 3)), schema=["A", "B", "C"], orient="row")

# Create a DataFrame (2): from a dict mapping column names to values
df = pl.DataFrame(
    {
        "Integer": [1, 2, 3, 4],
        "Float": np.array([1, 2, 3, 4], dtype=float),
        "Datetime": [datetime.datetime(2022, 4, 1)] * 4,
        "String": ["test", "train", "test", "train"],
    }
)

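# Read the CSV file (only the first six columns; ID as string, X as float)
# Note: in Polars 1.0 and later, the `dtypes` parameter is called `schema_overrides`
df = pl.read_csv(
    input_path,
    has_header=True,
    skip_rows=0,
    columns=[0, 1, 2, 3, 4, 5],
    dtypes={"ID": pl.Utf8, "X": pl.Float64},
    separator=",",
    encoding="utf8",
)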
# Check the size: (rows, columns), number of rows, number of columns
print(df.shape)
print(df.height)
print(df.width)
# Replace nulls and NaNs with 0
df = df.fill_null(0)
df = df.fill_nan(0)
# Select rows (the first three, by slicing)
print(df[0:3])
# Get one column as a Series
print(df.get_column("ID"))
# Get all columns as a list of Series
print(df.get_columns())
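# Zero-based position of the "ID" column, counting from the left
# (renamed to get_column_index() in newer Polars versions)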
print(df.find_idx_by_name("ID"))
# Select a column as a one-column DataFrame
print(df.select("ID"))
# Add a new column: xcenter doubled, named xcenter2
new_series = (df.get_column("xcenter") * 2).alias("xcenter2")
df = df.with_columns(new_series)
# Sort by the "ID" column
df = df.sort("ID")
# Inspect the contents: compact preview, summary statistics, and column names
print(df.glimpse())
print(df.describe())
print(df.columns)
# Check whether a specific string appears anywhere in the "ID" column
print(df.get_column("ID").is_in(['ea02e0be-b90b-48a6-a90']).any())
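# Polars has no index; add a row-number column ("row_nr") instead
# (renamed to with_row_index() in newer Polars versions)
df = df.with_row_count()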
# Filter: keep rows where 1160 < xcenter <= 2000 or 750 < ycenter <= 1000
df = df.filter(
    ((pl.col("xcenter") > 1160) & (pl.col("xcenter") <= 2000))
    | ((pl.col("ycenter") > 750) & (pl.col("ycenter") <= 1000))
)

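# Filter by count: keep only rows whose ID occurs more than 35 and fewer than 250 times
# (the value_counts() column "counts" is named "count" in newer Polars versions)
df_counts = df["ID"].value_counts()
df_counts = df_counts.filter((pl.col("counts") > 35) & (pl.col("counts") < 250))
keep_ids = df_counts.get_column("ID")
df = df.filter(pl.col("ID").is_in(keep_ids))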

# Save the distribution of counts per ID, sorted by count
df_counts = df["ID"].value_counts()
df_counts = df_counts.sort("counts")
df_counts.write_csv("./counts_by_ID.csv")

# Drop duplicates: keep only the first row for each ID
df = df.unique(subset=["ID"], keep="first")

print(df)
# Number of distinct IDs (after unique(), this equals df.height)
number_of_employees = df["ID"].n_unique()
print(number_of_employees)
df.write_csv("./filtered.csv")

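Summary

This article introduced Polars, along with a working script that reads a CSV file, inspects it, filters rows by value and by per-ID count, removes duplicates, and writes the results back to CSV.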
