I wanted to handle data flexibly in a Python pandas DataFrame, so I looked into how to access individual values.
CSV data
Source code
animedata.py
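import sys

import pandas as pd

# The path to the CSV file is passed as the first command-line argument.
args = sys.argv
file_name = args[1]
print(file_name)
print(type(file_name))

# Load the CSV into a DataFrame and inspect it.
anime_datalist = pd.read_csv(file_name)
print(anime_datalist.head())     # first five rows
print(anime_datalist.columns)    # column names

row_count = 0
all_row_num = len(anime_datalist)
print(all_row_num)

# Access a single value: column 'Title', row label 1.
print(anime_datalist['Title'][1])

# Print the first 50 titles (assumes the file has at least 50 rows).
while row_count < 50:
    print(anime_datalist['Title'][row_count])
    row_count = row_count + 1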
First, call
anime_datalist.head()
to display the first five rows and check the row labels and column names.
For this data, the rows turn out to be labeled with integers.
                              Title   Type  Episodes            Status  ...  Scored by  Members  Favorites                                        Description
0  Fullmetal Alchemist: Brotherhood     TV        64   Finished Airing  ...     719706  1176368     105387  "In order for something to be obtained, someth...
1                    Kimi no Na wa.  Movie         1   Finished Airing  ...     454969   705186      33936  Mitsuha Miyamizu, a high school girl, yearns t...
2                          Gintama°     TV        51   Finished Airing  ...      70279   194359       5597  Gintoki, Shinpachi, and Kagura return as the f...
3                     Steins;Gate 0     TV        23  Currently Airing  ...      12609   186331       1117  The dark untold story of Steins;Gate that lead...
4                       Steins;Gate     TV        24   Finished Airing  ...     552791   990419      90365  The self-proclaimed mad scientist Rintarou Oka...
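As a minimal sketch of the same check, the snippet below builds a small stand-in DataFrame instead of reading the CSV (the name sample is hypothetical, and only a few values copied from the output above are used). It shows that head() accepts a row count and that the default row labels are consecutive integers.

```python
import pandas as pd

# Hypothetical stand-in for the CSV: a few values copied from the output above.
sample = pd.DataFrame({
    "Title": ["Fullmetal Alchemist: Brotherhood", "Kimi no Na wa.", "Gintama°"],
    "Type": ["TV", "Movie", "TV"],
    "Episodes": [64, 1, 51],
})

# head(n) returns the first n rows; without an argument it returns five.
first_two = sample.head(2)

# With no explicit index, rows are labeled 0, 1, 2, ...
row_labels = list(sample.index)

print(first_two)
print(row_labels)
```

Because the labels are plain integers here, a row number can be used directly as the row key.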
The column names are abbreviated in that output, so to see all of them, check
anime_datalist.columns
Index(['Title', 'Type', 'Episodes', 'Status', 'Start airing', 'End airing',
       'Starting season', 'Broadcast time', 'Producers', 'Licensors',
       'Studios', 'Sources', 'Genres', 'Duration', 'Rating', 'Score',
       'Scored by', 'Members', 'Favorites', 'Description'],
      dtype='object')
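The column index can also be turned into an ordinary list or tested for membership. The following is a rough sketch on a hypothetical stand-in DataFrame (sample, with three of the columns listed above), not the original script.

```python
import pandas as pd

# Hypothetical stand-in with three of the columns listed above.
sample = pd.DataFrame({
    "Title": ["Fullmetal Alchemist: Brotherhood", "Kimi no Na wa."],
    "Scored by": [719706, 454969],
    "Members": [1176368, 705186],
})

# Convert the column Index to a plain Python list.
column_names = sample.columns.tolist()

# Check whether a column exists before using it.
has_score = "Score" in sample.columns

print(column_names)
print(has_score)

# Column names containing spaces are accessed with brackets.
print(sample["Scored by"])
```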
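With this information, the DataFrame can be treated much like a two-dimensional array, so a value can be specified directly, as in
print(anime_datalist['Title'][0])
or the values can be handled one at a time in a loop such as
while row_count < 50:
    print(anime_datalist['Title'][row_count])
    row_count = row_count + 1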
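Note that anime_datalist['Title'][0] looks the row up by its label, which works here because the labels are the default integers. As an alternative sketch (again on a hypothetical stand-in DataFrame named sample), pandas also provides explicit accessors for labels and positions, and head() can bound a loop without a manual counter.

```python
import pandas as pd

# Hypothetical stand-in for the CSV: a few values copied from the output above.
sample = pd.DataFrame({
    "Title": ["Fullmetal Alchemist: Brotherhood", "Kimi no Na wa.", "Gintama°"],
    "Episodes": [64, 1, 51],
})

# Single cell by row label and column name.
first_title = sample.at[0, "Title"]

# Single value by position within a column.
second_title = sample["Title"].iloc[1]

# Label-based lookup with loc (row label 2, column 'Episodes').
episodes = sample.loc[2, "Episodes"]

# Iterate over at most 50 titles; head(50) simply returns every row
# when the data has fewer than 50.
titles = [title for title in sample["Title"].head(50)]
for title in titles:
    print(title)
```

Using head(50) avoids a KeyError when the file has fewer than 50 rows, which the counter-based loop would raise.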