
pandas / numpy / python

pandas

Posted at 2016-02-13, last updated at 2016-05-27

Counting True values (bools)

import numpy as np
np.count_nonzero(obj)  # obj: a list, DataFrame or Series of bools; returns one count
np.sum(df)  # ✗ on a DataFrame this gives per-column sums, not one overall count
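A small sketch of the difference, using made-up sample data: `np.count_nonzero` returns a single number for the whole object, while summing a boolean DataFrame gives one count per column, so `.sum().sum()` is one pandas-side way to get the grand total.

```python
import numpy as np
import pandas as pd

# Made-up sample data
flags = pd.Series([True, False, True, True])
df = pd.DataFrame({"a": [True, False], "b": [True, True]})

n_series = np.count_nonzero(flags)  # 3 True values
n_df = np.count_nonzero(df)         # 3, counted over every cell
per_column = df.sum()               # True count per column: a -> 1, b -> 2
total = int(df.sum().sum())         # 3, the same grand total via pandas
print(n_series, n_df, per_column.to_dict(), total)
```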

Substring match on string elements

df["col"].str.contains("substring")  # "col": the column to search
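A minimal sketch with hypothetical data. Two details worth knowing: the pattern is interpreted as a regular expression by default (pass `regex=False` for a literal search), and missing values produce NaN unless `na=False` is given, which matters when the result is used as a row mask.

```python
import pandas as pd

# Hypothetical sample column, including a missing value
df = pd.DataFrame({"name": ["apple pie", "banana", "pineapple", None]})

# na=False turns the missing value into False so the mask can index df
mask = df["name"].str.contains("apple", na=False)
matched = df[mask]  # rows 0 and 2

# By default "." is a regex wildcard, so "a.p" matches the "app" in "apple"
regex_hits = df["name"].str.contains("a.p", na=False)
# regex=False searches for the literal text "a.p" instead
literal_hits = df["name"].str.contains("a.p", regex=False, na=False)
print(matched, regex_hits.tolist(), literal_hits.tolist())
```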

Detecting duplicates

df["col"].duplicated()  # "col": the column to check; True marks a repeated value
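The `keep` parameter decides which occurrence is *not* flagged. A sketch on made-up data showing the three settings, plus `subset=` for checking only some columns of a DataFrame:

```python
import pandas as pd

s = pd.Series(["a", "b", "a", "c", "b"])

first = s.duplicated()            # default keep="first": later repeats are True
last = s.duplicated(keep="last")  # earlier repeats are True
every = s.duplicated(keep=False)  # every member of a duplicate group is True

# On a DataFrame, subset= restricts the check to the listed columns
df = pd.DataFrame({"k": [1, 1, 2], "v": ["x", "y", "z"]})
dup_k = df.duplicated(subset=["k"])
print(first.tolist(), last.tolist(), every.tolist(), dup_k.tolist())
```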

Applying a function to data

sr.apply(func)  # sr: a Series; func is called once per element
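A short sketch with sample data: on a Series, `apply` calls the function per element; on a DataFrame it calls it per column by default, or per row with `axis=1`.

```python
import pandas as pd

sr = pd.Series([1, 2, 3])
squared = sr.apply(lambda x: x ** 2)  # called once per element

df = pd.DataFrame({"a": [1, 2], "b": [10, 20]})
col_max = df.apply(lambda col: col.max())                    # once per column (axis=0)
row_sum = df.apply(lambda row: row["a"] + row["b"], axis=1)  # once per row
print(squared.tolist(), col_max.to_dict(), row_sum.tolist())
```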

Removing duplicated index entries

To keep the last row of each group of duplicates:

dup_index = df.index.duplicated(keep="last")
df = df[~dup_index]  # ~ inverts the mask so the duplicates are dropped, not selected

Note: take_last=True triggers a FutureWarning (it is deprecated); use keep='last' instead.
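A worked example on a made-up frame with a repeated index label `"x"`. `Index.duplicated(keep="last")` marks every occurrence except the last one, so inverting the mask keeps one row per label, and the surviving rows stay in their original order.

```python
import pandas as pd

df = pd.DataFrame({"v": [1, 2, 3, 4]}, index=["x", "y", "x", "z"])

# True marks rows to drop; with keep="last" the last row of each label stays unmarked
dup_index = df.index.duplicated(keep="last")
deduped = df[~dup_index]  # ~ inverts the mask, keeping only the unmarked rows
print(deduped)
```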

python

Sorting

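lst.sort(key=func, reverse=False)  # reverse=True sorts in descending order

func is a function that returns the value to sort by, for example:
lst.sort(key=lambda a: a[1])  # sort by each item's second element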
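A sketch with sample pairs. `list.sort()` sorts in place and returns `None`, while the built-in `sorted()` takes the same `key`/`reverse` arguments and returns a new list; `operator.itemgetter` is a standard-library alternative to the lambda.

```python
from operator import itemgetter

pairs = [("b", 2), ("a", 3), ("c", 1)]

# list.sort() sorts in place and returns None
pairs.sort(key=lambda p: p[1])  # by the second element

# sorted() returns a new list and accepts the same key/reverse arguments
by_name = sorted(pairs, key=itemgetter(0))
by_value_desc = sorted(pairs, key=lambda p: p[1], reverse=True)
print(pairs, by_name, by_value_desc)
```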

Variable-length arguments

def func(*args, **kwargs): ...
def func(a=1, b=2, *args, **kwargs):
    child_func(*args, **kwargs)  # pass the extra arguments through unchanged

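Note: the order matters. Regular and default parameters come first, then *args, and **kwargs must be last.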
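A runnable sketch of the forwarding pattern; `child_func` here is a hypothetical callee that just reports what it received. It also shows a common gotcha with defaults before `*args`: the first positional arguments fill `a` and `b`, and only the remaining ones reach `args`.

```python
def child_func(*args, **kwargs):
    # hypothetical callee: returns exactly what it was given
    return args, kwargs

def func(a=1, b=2, *args, **kwargs):
    # the first two positional arguments fill a and b; the rest go into args
    return a, b, child_func(*args, **kwargs)

full = func(10, 20, 30, 40, key="v")
defaults_only = func(key="v")
print(full, defaults_only)
```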

BeautifulSoup and lxml

BeautifulSoup

from bs4 import BeautifulSoup as bs
soup = bs(src, "html.parser")  # src: the HTML source; naming a parser avoids bs4's warning

# soup.find("tag", {"attribute": "value"})
# example
soup.find("div", {"class": "value"})
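A self-contained sketch on a small made-up HTML snippet, using Python's built-in `"html.parser"` (pass `"lxml"` instead if that package is installed). `find` returns the first match or `None`; `find_all` returns every match. Because `class` is multi-valued, `class="price"` also matches `class="price sale"`.

```python
from bs4 import BeautifulSoup

# Made-up sample HTML
src = """
<html><body>
  <div class="price">100</div>
  <div class="price sale">80</div>
  <div id="note">hello</div>
</body></html>
"""
soup = BeautifulSoup(src, "html.parser")

first = soup.find("div", {"class": "price"}).get_text()  # first match only
all_prices = [d.get_text() for d in soup.find_all("div", class_="price")]
note = soup.find("div", {"id": "note"}).get_text()
missing = soup.find("span")  # None when nothing matches
print(first, all_prices, note, missing)
```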

lxml

from lxml import html
dom = html.fromstring(src)

# e.g. an XPath copied from the browser's developer tools (Inspect)
# xpath() returns a list of matches, so take the first element
XPATH = '//*[@id="rfindex"]/div[2]/div[1]/dl/dd/strong'
dom.xpath(XPATH)[0]
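A sketch that runs the same XPath against a hypothetical HTML snippet built only to match it (the real page is not reproduced here). Since `xpath()` returns a list, checking for an empty result before indexing avoids an `IndexError` when the page layout changes.

```python
from lxml import html

# Hypothetical HTML shaped to fit the XPath below
src = """
<html><body>
  <div id="rfindex">
    <div>header</div>
    <div><div><dl><dd><strong>42</strong></dd></dl></div></div>
  </div>
</body></html>
"""
dom = html.fromstring(src)

XPATH = '//*[@id="rfindex"]/div[2]/div[1]/dl/dd/strong'
matches = dom.xpath(XPATH)                     # always a list of elements
value = matches[0].text if matches else None  # guard against no match
print(value)
```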

lxml more often fails to parse, because of problems in the source HTML itself.
When scraping while moving from page to page, bs is more stable than lxml.
