(Memo) Code snippets I found useful for machine learning in Python

Last updated at 2022-12-03, posted at 2022-11-12

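Creating a new list with a list's elements sorted

1. sort()
Sorts the elements of the list in place; the original list itself is modified.

2. sorted()
Returns a new list with the elements sorted; the original list is left unchanged.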

# Create a new list with a list's elements sorted
mylist = [18, 0, 16, 6, 15, 7, 9, 1, 2, 5]
newlist = sorted(mylist)
newlist2 = sorted(mylist, reverse=True)
print('mylist:', mylist)  # mylist: [18, 0, 16, 6, 15, 7, 9, 1, 2, 5]
print('newlist:', newlist)  # newlist: [0, 1, 2, 5, 6, 7, 9, 15, 16, 18]
print('newlist2:', newlist2) # newlist2: [18, 16, 15, 9, 7, 6, 5, 2, 1, 0]
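
The snippet above only uses sorted(). As a minimal sketch of the difference from sort(), the following shows that list.sort() mutates the list and returns None, while sorted() accepts any iterable and returns a new list. The variable names (scores, letters, words) are made up for illustration.

```python
# list.sort() sorts in place and returns None.
scores = [18, 0, 16, 6, 15]
result = scores.sort()
print(result)  # None
print(scores)  # [0, 6, 15, 16, 18]

# sorted() accepts any iterable (here a string) and returns a new list.
letters = sorted("python")
print(letters)  # ['h', 'n', 'o', 'p', 't', 'y']

# Both accept key= and reverse=; the sort is stable, so ties keep their order.
words = ["banana", "fig", "cherry"]
by_length = sorted(words, key=len)
print(by_length)  # ['fig', 'banana', 'cherry']
```

A common mistake is writing `mylist = mylist.sort()`, which replaces the list with None.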

Behavior of DataFrame.add

import numpy as np
import pandas as pd

A = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, np.nan]]))
A.add(1)
👇
	0	1	2
0	2.0	3.0	4.0
1	5.0	6.0	7.0
2	8.0	9.0	NaN

# With fill_value=1, the NaN is replaced by 1 before the addition, so it becomes 2.0
A.add(1, fill_value=1)
👇
	0	1	2
0	2.0	3.0	4.0
1	5.0	6.0	7.0
2	8.0	9.0	2.0
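
To make the role of fill_value more concrete, here is a small sketch assuming a recent pandas version. It checks that, for a scalar operand, the result matches filling the NaN first and then adding. It also shows the two-DataFrame case, where a cell that is missing on both sides stays NaN. The second frame B and the names with_fill, manual, and both are invented for this example.

```python
import numpy as np
import pandas as pd

A = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, np.nan]]))

# Scalar operand: NaN in A is filled with fill_value, then 1 is added.
with_fill = A.add(1, fill_value=1)
manual = A.fillna(1) + 1
print(with_fill.equals(manual))  # True

# Two DataFrames (B is hypothetical): fill_value is used only where
# exactly one of the two cells is missing.
B = pd.DataFrame(np.array([[10, np.nan, 10], [10, 10, 10], [10, 10, np.nan]]))
both = A.add(B, fill_value=0)
print(both.iloc[0, 1])  # 2.0  (2 + filled 0)
print(both.iloc[2, 2])  # nan  (missing in both A and B)
```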

Creating an array of powers

import numpy as np

# Create an array of 2^1 through 2^10
x = np.logspace(1, 10, 10, base=2)

print(x) #[ 2.  4.  8.  16.  32.  64.  128.  256.  512. 1024.]

Reference: https://algorithm.joho.info/programming/python/numpy-logspace/
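
As a quick sanity check of what np.logspace does here, the sketch below compares it with raising the base to evenly spaced exponents directly. The first two arguments are exponents, not values, and the endpoint is included by default. The name powers is introduced only for this comparison.

```python
import numpy as np

# logspace(start, stop, num, base=b) returns b**start ... b**stop (num points).
x = np.logspace(1, 10, 10, base=2)

# The same values, built explicitly from the exponents 1..10.
powers = 2.0 ** np.arange(1, 11)

print(np.allclose(x, powers))  # True
print(x.dtype)                 # float64
```

If integers are needed, the result can be converted afterwards, for example with `x.astype(int)`.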

What is cuDF?

◯ Features

A pandas-like library developed by NVIDIA that runs on the GPU for fast data processing

Reading a list in reverse

Example) Retrieving the elements of a list in reverse order

s = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

a[::-1] #[10, 9, 8, 7, 6, 5, 4, 3, 2, 1]

Reference: https://www.delftstack.com/ja/howto/python/negative-index-python/
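
The same slice works on the string s defined above, and negative indices and steps can be combined in other ways. A short sketch follows; the result variable names are made up for illustration.

```python
s = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# A step of -1 walks the sequence backwards; this works for strings too.
reversed_s = s[::-1]
print(reversed_s)  # ZYXWVUTSRQPONMLKJIHGFEDCBA

# Negative indices count from the end.
last_three = a[-3:]
print(last_three)  # [8, 9, 10]

# A step of -2 takes every other element, starting from the end.
every_other = a[::-2]
print(every_other)  # [10, 8, 6, 4, 2]

# reversed() returns an iterator; wrap it in list() to get a list.
via_reversed = list(reversed(a))
print(via_reversed == a[::-1])  # True
```

Slicing and reversed() both leave the original sequence unchanged; `a.reverse()` is the in-place alternative.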

Expressing co-occurrence

Example) Storing, as a dictionary, the number of times A and B occur together

from collections import defaultdict, Counter
next_AIDs = defaultdict(Counter)
chunk_size = 30_000
sessions = subset_of_train.session.unique()
# (omitted)
sessions = subset_of_test.session.unique()
for i in range(0, sessions.shape[0], chunk_size):
    consecutive_AIDs = subset_of_test.loc[sessions[i]:sessions[min(sessions.shape[0]-1, i+chunk_size-1)]].reset_index(drop=True)
    consecutive_AIDs = consecutive_AIDs.groupby('session').apply(lambda g: g.tail(30)).reset_index(drop=True)
    consecutive_AIDs = consecutive_AIDs.merge(consecutive_AIDs, on='session')
    consecutive_AIDs = consecutive_AIDs[consecutive_AIDs.aid_x != consecutive_AIDs.aid_y]
    consecutive_AIDs['days_elapsed'] = (consecutive_AIDs.ts_y - consecutive_AIDs.ts_x) / (24 * 60 * 60)
    consecutive_AIDs = consecutive_AIDs[(consecutive_AIDs.days_elapsed > 0) & (consecutive_AIDs.days_elapsed <= 1)]
    for row in consecutive_AIDs.drop_duplicates(['session', 'aid_x', 'aid_y']).itertuples():
        next_AIDs[row.aid_x][row.aid_y] += 1

Reference: https://www.kaggle.com/code/deepkun1995/co-visitation-matrix-simplified-imprvd-logic/edit
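
The core data structure in the code above, defaultdict(Counter), can be tried without the Kaggle data. The following is a simplified sketch on made-up toy sessions. It counts each ordered pair at most once per session, as the drop_duplicates step does, but it ignores the timestamp conditions (the one-day window and ordering) used in the original. The toy_sessions data and the name next_aids are assumptions for illustration.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical toy data: session id -> item ids (aids) seen in that session.
toy_sessions = {
    1: [10, 20, 30],
    2: [10, 20],
    3: [20, 30, 20],
}

# next_aids[x][y] = number of sessions in which x and y both appear.
next_aids = defaultdict(Counter)
for aids in toy_sessions.values():
    unique_aids = set(aids)  # count each pair at most once per session
    for aid_x, aid_y in permutations(unique_aids, 2):
        next_aids[aid_x][aid_y] += 1

print(next_aids[10][20])            # 2 (sessions 1 and 2)
print(next_aids[20][30])            # 2 (sessions 1 and 3)
print(next_aids[10].most_common(1)) # [(20, 2)]
```

Because the inner values are Counter objects, `most_common(n)` directly gives the n items that co-occur most often with a given item.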

Building a counter

Example) Counting how many times each item appears

words = ["dog", "cat", "dog", "dog", "mouse", "cat"]

# Using a plain dict
counter = {}
for w in words:
    if w in counter:
        counter[w] += 1
    else:
        counter[w] = 1

# Using defaultdict
from collections import defaultdict

counter = defaultdict(int)  # missing keys start at int() == 0
for w in words:
    counter[w] += 1

Reference: https://ohke.hateblo.jp/entry/2020/04/04/230000
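
For plain counting, the standard library's collections.Counter is a third option alongside the two above. A minimal sketch using the same words list:

```python
from collections import Counter

words = ["dog", "cat", "dog", "dog", "mouse", "cat"]

# Counter counts the elements of an iterable in one call.
counter = Counter(words)
print(counter["dog"])          # 3
print(counter.most_common(2))  # [('dog', 3), ('cat', 2)]

# Looking up a missing key returns 0 and does not add the key.
print(counter["bird"])         # 0
print("bird" in counter)       # False
```

Unlike defaultdict, reading a missing key from a Counter does not insert it, so lookups do not grow the dictionary.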

Removing duplicates from a list

Example) Removing duplicated elements from a list

list_test1 = [5, 10, 5, 10, 2, 1, 3, 4, 6]
list_test2 = ['田中', '大杉', '田中', '本宮', '岡田', '岡本', '小林', '村尾']
 
list_test1 = list(dict.fromkeys(list_test1))
print(list_test1)  # [5, 10, 2, 1, 3, 4, 6]


list_test2 = list(dict.fromkeys(list_test2))
print(list_test2)  # ['田中', '大杉', '本宮', '岡田', '岡本', '小林', '村尾']

Reference: https://laboratory.kazuuu.net/using-the-python-dictionary-fromkeys-function-to-remove-duplicates/
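
As a short sketch of why dict.fromkeys is used here rather than set(): dict keys keep insertion order (guaranteed since Python 3.7), while a set gives no ordering guarantee. Both require hashable elements, so for something like a list of lists a simple loop is one fallback. The names items, rows, and unique_rows are made up for this example.

```python
items = [5, 10, 5, 10, 2, 1, 3, 4, 6]

# dict.fromkeys keeps the first occurrence of each element, in order.
ordered = list(dict.fromkeys(items))
print(ordered)  # [5, 10, 2, 1, 3, 4, 6]

# set() also removes duplicates, but the resulting order is not guaranteed.
as_set = set(items)
print(sorted(as_set) == sorted(ordered))  # True

# Unhashable elements (e.g. lists) cannot be dict keys; fall back to a loop.
rows = [[1, 2], [3, 4], [1, 2]]
unique_rows = []
for row in rows:
    if row not in unique_rows:
        unique_rows.append(row)
print(unique_rows)  # [[1, 2], [3, 4]]
```

The loop is O(n^2), so it is only suitable for small lists.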
