This cheat sheet imitates, in Ruby, the Python For Data Science Cheat Sheet created by datacamp.com.
It is not an exact copy of the original, and some parts have been changed where appropriate.
Load daru as follows:
[1] pry(main)> require "daru"
Install the reportbuilder gem version ~>1.4 for using reportbuilder functions.
Install the spreadsheet gem version ~>1.1.1 for using spreadsheet functions.
=> true
daru data structures
Vector
A Vector corresponds to a pandas Series.
It is a one-dimensional labeled array.
[2] pry(main)> s = Daru::Vector.new([3, -5, 7, 4], index: [:a, :b, :c, :d])
=> #<Daru::Vector(4)>
a 3
b -5
c 7
d 4
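To illustrate what "one-dimensional labeled array" means, here is a minimal plain-Ruby sketch that does not use daru. The class `LabeledVector` and its `at` method are hypothetical names invented for this example, not part of daru's API. The class keeps a data array and an index array side by side and looks values up either by label or by position.

```ruby
# Hypothetical, minimal labeled 1-D array in plain Ruby (no daru),
# only meant to illustrate the idea behind Daru::Vector.
class LabeledVector
  attr_reader :index, :data

  def initialize(data, index:)
    raise ArgumentError, "data and index must have the same length" unless data.size == index.size

    @data = data.dup
    @index = index.dup
  end

  # Look up a value by its label, e.g. v[:b]
  def [](label)
    @data[position_of(label)]
  end

  # Look up a value by its integer position, e.g. v.at(1)
  def at(position)
    @data.fetch(position)
  end

  # Overwrite the value stored under a label, e.g. v[:a] = 6
  def []=(label, value)
    @data[position_of(label)] = value
  end

  # One "label value" pair per line, similar to the pry output above
  def to_s
    @index.zip(@data).map { |label, value| "#{label} #{value}" }.join("\n")
  end

  private

  def position_of(label)
    pos = @index.index(label)
    raise KeyError, "no such label: #{label.inspect}" if pos.nil?

    pos
  end
end

v = LabeledVector.new([3, -5, 7, 4], index: [:a, :b, :c, :d])
puts v[:b]   # => -5
puts v.at(2) # => 7
```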
DataFrame
A two-dimensional labeled data structure.
[3] pry(main)> data = {'Country': ['Belgium', 'India', 'Brazil'], 'Capital': ['Brussels', 'New Delhi', 'Brasília'], 'Population': [11190846, 1303171035, 207847528]}
=> {:Country=>["Belgium", "India", "Brazil"], :Capital=>["Brussels", "New Delhi", "Brasília"], :Population=>[11190846, 1303171035, 207847528]}
[4] pry(main)> df = Daru::DataFrame.new(data)
=> #<Daru::DataFrame(3x3)>
Capital Country Population
0 Brussels Belgium 11190846
1 New Delhi India 1303171035
2 Brasília Brazil 207847528
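A DataFrame can be thought of as a set of equal-length columns that share one row index. The plain-Ruby sketch below (no daru) reuses the same `data` hash as the example above to compute the shape and to rebuild row records by transposing the columns. It only illustrates the column-oriented layout and makes no claim about how daru stores data internally.

```ruby
# Plain-Ruby sketch (no daru): a table stored column by column,
# using the same hash as the DataFrame example.
data = {
  Country: ["Belgium", "India", "Brazil"],
  Capital: ["Brussels", "New Delhi", "Brasília"],
  Population: [11190846, 1303171035, 207847528]
}

# Every column must have the same length for the table to be rectangular
lengths = data.values.map(&:size).uniq
raise ArgumentError, "columns have different lengths" unless lengths.size == 1

shape = [lengths.first, data.size] # [number of rows, number of columns]

# Rebuild one hash per row by transposing the column arrays
rows = data.values.transpose.map { |values| data.keys.zip(values).to_h }

p shape # => [3, 3]
p rows[1]
```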
Selection
Getting
Get one element
[5] pry(main)> s[:b]
=> -5
Get a subset of a DataFrame
[6] pry(main)> df.row[1..2]
=> #<Daru::DataFrame(2x3)>
Capital Country Population
1 New Delhi India 1303171035
2 Brasília Brazil 207847528
Selecting, Boolean Indexing and Setting
By position (note that the column comes first and the row second, the reverse of the usual order)
[7] pry(main)> df[1][0]
=> "Belgium"
By label
[8] pry(main)> df[:Capital][0]
=> "Brussels"
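The column-first order of `df[1][0]` can be mimicked with a plain hash of columns. In this sketch (no daru), `at_position` and `at_label` are hypothetical helper names, and the column order Capital, Country, Population is an assumption taken from the DataFrame output shown earlier; with that order, position 1 refers to the Country column.

```ruby
# Plain-Ruby sketch (no daru) of column-first access, mirroring
# df[1][0] and df[:Capital][0]. The column order is assumed to be
# the one displayed in the pry output (Capital, Country, Population).
columns = {
  Capital: ["Brussels", "New Delhi", "Brasília"],
  Country: ["Belgium", "India", "Brazil"],
  Population: [11190846, 1303171035, 207847528]
}

# Positional: choose the column by position first, then the row by position
def at_position(table, col_pos, row_pos)
  table.values.fetch(col_pos).fetch(row_pos)
end

# By label: choose the column by name, then the row by position
def at_label(table, col_label, row_pos)
  table.fetch(col_label).fetch(row_pos)
end

puts at_position(columns, 1, 0)     # => Belgium
puts at_label(columns, :Capital, 0) # => Brussels
```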
Boolean indexing
[9] pry(main)> s.where(s.gt(1))
=> #<Daru::Vector(3)>
a 3
c 7
d 4
[10] pry(main)> s.where(s.lt(-1) | s.gt(2))
=> #<Daru::Vector(4)>
a 3
b -5
c 7
d 4
[11] pry(main)> df.where(df[:Population].gt(1200000000))
=> #<Daru::DataFrame(1x3)>
Capital Country Population
1 New Delhi India 1303171035
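Boolean indexing works in two steps: a comparison such as `gt` produces one true/false value per element, and `where` keeps only the elements whose value is true; `|` combines two masks element-wise with a logical OR. The plain-Ruby sketch below (no daru) reproduces the two Vector examples with arrays. The lambdas `gt`, `lt`, `either` and the method `where` are hypothetical stand-ins named after the daru calls, not the daru methods themselves.

```ruby
# Plain-Ruby sketch (no daru) of boolean masks on a labeled vector.
labels = [:a, :b, :c, :d]
values = [3, -5, 7, 4]

# Each comparison returns one true/false per element
gt = ->(limit) { values.map { |v| v > limit } }
lt = ->(limit) { values.map { |v| v < limit } }

# Element-wise logical OR of two masks
either = ->(m1, m2) { m1.zip(m2).map { |x, y| x || y } }

# Keep only the label/value pairs whose mask entry is true
def where(labels, values, mask)
  labels.zip(values, mask)
        .select { |_, _, keep| keep }
        .map { |label, value, _| [label, value] }
        .to_h
end

p where(labels, values, gt.(1))                     # keeps a, c and d
p where(labels, values, either.(lt.(-1), gt.(2)))   # keeps all four
```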
Setting
[12] pry(main)> s[:a]=6
=> 6
[13] pry(main)> s
=> #<Daru::Vector(4)>
a 6
b -5
c 7
d 4
Sort and rank
[14] pry(main)> df.sort([:Country])
=> #<Daru::DataFrame(3x3)>
Capital Country Population
0 Brussels Belgium 11190846
2 Brasília Brazil 207847528
1 New Delhi India 1303171035
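Sorting keeps each row's original index label (0, 2, 1 in the output above). A plain-Ruby sketch of the same idea, without daru, stores the index label inside each row record (an assumption of this example) and sorts the records by the Country field.

```ruby
# Plain-Ruby sketch (no daru): sort rows by Country while keeping
# each row's original index label, like df.sort([:Country]).
rows = [
  { index: 0, Capital: "Brussels",  Country: "Belgium", Population: 11190846 },
  { index: 1, Capital: "New Delhi", Country: "India",   Population: 1303171035 },
  { index: 2, Capital: "Brasília",  Country: "Brazil",  Population: 207847528 }
]

sorted = rows.sort_by { |row| row[:Country] }

# Prints the rows in the order Belgium, Brazil, India
sorted.each do |row|
  puts [row[:index], row[:Capital], row[:Country], row[:Population]].join(" ")
end
```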
Retrieving Vector and DataFrame information
Basic information
[15] pry(main)> df.shape
=> [3, 3]
[16] pry(main)> df.index
=> #<Daru::Index(3): {0, 1, 2}>
[17] pry(main)> df.vectors
=> #<Daru::Index(3): {Capital, Country, Population}>
[18] pry(main)> df.count
=> #<Daru::Vector(1)>
count
Population 3
Summary
[19] pry(main)> df.sum
=> #<Daru::Vector(1)>
sum
Population 1522209409
[20] pry(main)> df.cumsum
=> #<Daru::DataFrame(3x1)>
Population
0 11190846
1 1314361881
2 1522209409
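The values printed by `sum` and `cumsum` can be checked by hand. This plain-Ruby sketch (no daru) computes the total and the running total of the Population column.

```ruby
# Plain-Ruby sketch (no daru) of the arithmetic behind df.sum and df.cumsum.
population = [11190846, 1303171035, 207847528]

total = population.sum

# Running total: element i is the sum of elements 0..i
running = 0
cumsum = population.map { |v| running += v }

p total  # => 1522209409
p cumsum # => [11190846, 1314361881, 1522209409]
```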
[21] pry(main)> df.min
=> #<Daru::Vector(1)>
min
Population 11190846
[22] pry(main)> df.max
=> #<Daru::Vector(1)>
max
Population 1303171035
[23] pry(main)> df.describe
=> #<Daru::DataFrame(5x1)>
Population
count 3
mean 507403136.
std 696134594.
min 11190846
max 1303171035
[24] pry(main)> df.mean
=> #<Daru::Vector(1)>
mean
Population 507403136.3333333
[25] pry(main)> df.median
=> #<Daru::Vector(1)>
median
Population 207847528
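The summary statistics above can be reproduced with plain Ruby (no daru). This sketch computes count, mean, standard deviation, min, max and median for the Population column. It uses the sample standard deviation (dividing by n - 1), which appears to be the formula behind the `std` row printed by `describe`; the population formula (dividing by n) would give a noticeably smaller value.

```ruby
# Plain-Ruby sketch (no daru) of describe / mean / median for one column.
population = [11190846, 1303171035, 207847528]

n = population.size
mean = population.sum.to_f / n

# Sample variance: squared deviations divided by n - 1
variance = population.sum { |v| (v - mean)**2 } / (n - 1)
std = Math.sqrt(variance)

# Median: middle element for odd n, average of the two middle ones for even n
sorted = population.sort
median =
  if n.odd?
    sorted[n / 2]
  else
    (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0
  end

stats = { count: n, mean: mean, std: std, min: sorted.first, max: sorted.last, median: median }
stats.each { |name, value| puts "#{name} #{value}" }
```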
I/O
Reading and writing CSV
First, download a CSV file:
wget https://raw.githubusercontent.com/fivethirtyeight/data/master/airline-safety/airline-safety.csv
[26] pry(main)> df = Daru::DataFrame.from_csv("airline-safety.csv")
=> #<Daru::DataFrame(56x8)>
airline avail_seat incidents_ fatal_acci fatalities incidents_ fatal_acci fatalities
0 Aer Lingus 320906734 2 0 0 0 0 0
1 Aeroflot* 1197672318 76 14 128 6 1 88
2 Aerolineas 385803648 6 0 0 1 0 0
3 Aeromexico 596871813 3 1 64 5 0 0
4 Air Canada 1865253802 2 0 0 2 0 0
5 Air France 3004002661 14 4 79 6 2 337
6 Air India* 869253552 2 1 329 4 1 158
7 Air New Ze 710174817 3 0 0 5 1 7
8 Alaska Air 965346773 5 0 0 5 1 88
9 Alitalia 698012498 7 2 50 4 0 0
10 All Nippon 1841234177 3 1 1 7 0 0
11 American* 5228357340 21 5 101 17 3 416
12 Austrian A 358239823 1 0 0 1 0 0
13 Avianca 396922563 5 3 323 0 0 0
14 British Ai 3179760952 4 0 0 6 0 0
... ... ... ... ... ... ... ... ...
[27] pry(main)> df.write_csv("aaa.csv")
=> nil
[28] pry(main)> .head aaa.csv
airline,avail_seat_km_per_week,incidents_85_99,fatal_accidents_85_99,fatalities_85_99,incidents_00_14,fatal_accidents_00_14,fatalities_00_14
Aer Lingus,320906734,2,0,0,0,0,0
Aeroflot*,1197672318,76,14,128,6,1,88
Aerolineas Argentinas,385803648,6,0,0,1,0,0
Aeromexico*,596871813,3,1,64,5,0,0
Air Canada,1865253802,2,0,0,2,0,0
Air France,3004002661,14,4,79,6,2,337
Air India*,869253552,2,1,329,4,1,158
Air New Zealand*,710174817,3,0,0,5,1,7
Alaska Airlines*,965346773,5,0,0,5,1,88
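The file written by `write_csv` is an ordinary CSV file with a header row. For comparison, this sketch uses Ruby's standard `csv` library (not daru) to produce the same layout for the first two data rows and first three columns of `aaa.csv` shown above, then parses the text back with `headers: true` so that each column can be read by name.

```ruby
require "csv"

# Plain-Ruby sketch using the standard csv library (not daru).
# Header names and values are copied from the first rows of aaa.csv.
header = ["airline", "avail_seat_km_per_week", "incidents_85_99"]
rows = [
  ["Aer Lingus", 320906734, 2],
  ["Aeroflot*", 1197672318, 76]
]

# Write: header line first, then one line per row
text = CSV.generate do |csv|
  csv << header
  rows.each { |row| csv << row }
end

puts text

# Read: use the first line as column names and convert numeric fields
table = CSV.parse(text, headers: true, converters: :numeric)
p table["avail_seat_km_per_week"] # => [320906734, 1197672318]
```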