daru Cheat Sheet

This cheat sheet is a Ruby imitation of the Python For Data Science Cheat Sheet created by datacamp.com.
It is not a complete reproduction of the original, and changes have been made where appropriate.

Load daru as follows.

[1] pry(main)> require "daru"

Install the reportbuilder gem version ~>1.4 for using reportbuilder functions.

Install the spreadsheet gem version ~>1.1.1 for using spreadsheet functions.
=> true

daru data structures

Vector

Vector corresponds to the Series of pandas.
It is a one-dimensional labeled array.

[2] pry(main)> s = Daru::Vector.new([3, -5, 7, 4], index: [:a, :b, :c, :d])
=> #<Daru::Vector(4)>
   a   3
   b  -5
   c   7
   d   4
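
As a rough mental model, a labeled one-dimensional array can be pictured as values paired with an index. The sketch below is plain Ruby and uses an ordinary Hash; it is an illustration only, not how Daru::Vector is implemented internally, and `labeled_array` is a hypothetical helper name.

```ruby
# Plain-Ruby sketch of a labeled 1-D array (illustration only;
# this is not how Daru::Vector is implemented internally).
def labeled_array(values, index)
  # Pair each label with its value; Ruby hashes keep insertion order.
  index.zip(values).to_h
end

s = labeled_array([3, -5, 7, 4], [:a, :b, :c, :d])
puts s[:b]            # look up one element by its label
puts s.keys.inspect   # the labels act as the index
```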

DataFrame

A two-dimensional labeled data structure.

[3] pry(main)> data = {'Country': ['Belgium', 'India', 'Brazil'], 'Capital': ['Brussels', 'New Delhi', 'Brasília'], 'Population': [11190846, 1303171035, 207847528]}
=> {:Country=>["Belgium", "India", "Brazil"], :Capital=>["Brussels", "New Delhi", "Brasília"], :Population=>[11190846, 1303171035, 207847528]}
[4] pry(main)> df = Daru::DataFrame.new(data)
=> #<Daru::DataFrame(3x3)>
               Capital    Country Population
          0   Brussels    Belgium   11190846
          1  New Delhi      India 1303171035
          2   Brasília     Brazil  207847528

Selection

Getting

Getting a single element

[5] pry(main)> s[:b]
=> -5

Getting a subset of a DataFrame

[6] pry(main)> df.row[1..2]
=> #<Daru::DataFrame(2x3)>
               Capital    Country Population
          1  New Delhi      India 1303171035
          2   Brasília     Brazil  207847528

Selecting, boolean indexing, and setting

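By position (note that the index order is the reverse of the usual row-then-column convention: the first index selects the column, the second selects the row)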

[7] pry(main)> df[1][0]
=> "Belgium"

By label

[8] pry(main)> df[:Capital][0]
=> "Brussels"
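
The column-first order is easier to remember if the DataFrame is pictured as a collection of columns. The following plain-Ruby sketch is an analogy under that assumption, not daru's internals; `sample_columns` and `cell` are hypothetical helpers introduced only for this illustration.

```ruby
# Column-oriented analogy: each key is a column label, each value is that column.
def sample_columns
  {
    Capital:    ["Brussels", "New Delhi", "Brasília"],
    Country:    ["Belgium", "India", "Brazil"],
    Population: [11_190_846, 1_303_171_035, 207_847_528]
  }
end

# Hypothetical helper: pick the column first, then the row inside it.
def cell(columns, column_label, row_position)
  columns.fetch(column_label)[row_position]
end

puts cell(sample_columns, :Capital, 0)
puts cell(sample_columns, :Country, 0)
```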

Boolean indexing

[9] pry(main)> s.where(s.gt(1))
=> #<Daru::Vector(3)>
   a   3
   c   7
   d   4
[10] pry(main)> s.where(s.lt(-1) | s.gt(2))
=> #<Daru::Vector(4)>
   a   3
   b  -5
   c   7
   d   4
[11] pry(main)> df.where(df[:Population].gt(1200000000))
=> #<Daru::DataFrame(1x3)>
               Capital    Country Population
          1  New Delhi      India 1303171035
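
Conceptually, a comparison such as `s.gt(1)` yields one true/false flag per element, and `where` keeps the elements whose flag is true. A minimal plain-Ruby sketch of that idea follows, using a Hash in place of a Vector. The helper name `where_mask` is hypothetical and is not part of daru.

```ruby
# Keep the entries of a labeled collection whose mask entry is true.
def where_mask(labeled, mask)
  kept = labeled.each_with_index.select { |_pair, i| mask[i] }
  kept.map { |pair, _i| pair }.to_h
end

s = { a: 3, b: -5, c: 7, d: 4 }

# One boolean per element, in the same order as the values.
greater_than_one = s.values.map { |v| v > 1 }
puts where_mask(s, greater_than_one).inspect

# Two conditions combined element-wise with OR.
either = s.values.map { |v| v < -1 || v > 2 }
puts where_mask(s, either).inspect
```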

Setting

[12] pry(main)> s[:a]=6
=> 6
[13] pry(main)> s
=> #<Daru::Vector(4)>
   a   6
   b  -5
   c   7
   d   4

Sort and rank

[14] pry(main)> df.sort([:Country])
=> #<Daru::DataFrame(3x3)>
               Capital    Country Population
          0   Brussels    Belgium   11190846
          2   Brasília     Brazil  207847528
          1  New Delhi      India 1303171035
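
Note in the output above that each row keeps its original index label after sorting (0, 2, 1). A plain-Ruby sketch of the same behaviour is shown here, with `sort_rows_by` as a hypothetical helper that is not part of daru.

```ruby
# Sort rows by one key while carrying the original row index along.
def sort_rows_by(rows, key)
  rows.each_with_index.sort_by { |row, _i| row[key] }.map { |row, i| [i, row] }
end

rows = [
  { Country: "Belgium", Capital: "Brussels" },
  { Country: "India",   Capital: "New Delhi" },
  { Country: "Brazil",  Capital: "Brasília" }
]

sort_rows_by(rows, :Country).each do |i, row|
  puts "#{i} #{row[:Capital]} #{row[:Country]}"
end
```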

Retrieving Vector and DataFrame information

Basic information

[15] pry(main)> df.shape
=> [3, 3]
[16] pry(main)> df.index
=> #<Daru::Index(3): {0, 1, 2}>
[17] pry(main)> df.vectors
=> #<Daru::Index(3): {Capital, Country, Population}>
[18] pry(main)> df.count
=> #<Daru::Vector(1)>
                 count
 Population          3

Summary

[19] pry(main)> df.sum
=> #<Daru::Vector(1)>
                   sum
 Population 1522209409
[20] pry(main)> df.cumsum
=> #<Daru::DataFrame(3x1)>
            Population
          0   11190846
          1 1314361881
          2 1522209409
[21] pry(main)> df.min
=> #<Daru::Vector(1)>
                   min
 Population   11190846
[22] pry(main)> df.max
=> #<Daru::Vector(1)>
                   max
 Population 1303171035
[23] pry(main)> df.describe
=> #<Daru::DataFrame(5x1)>
            Population
      count          3
       mean 507403136.
        std 696134594.
        min   11190846
        max 1303171035
[24] pry(main)> df.mean
=> #<Daru::Vector(1)>
                                mean
        Population 507403136.3333333
[25] pry(main)> df.median
=> #<Daru::Vector(1)>
                median
 Population  207847528
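
To make the summary figures concrete, here is a plain-Ruby sketch that recomputes them for the Population column. It assumes the `std` reported by `describe` is the sample standard deviation (divisor n - 1), which is what the figures above are consistent with. The helper names are illustrative and are not daru methods.

```ruby
def mean(xs)
  xs.sum.to_f / xs.size
end

# Sample standard deviation (divides by n - 1); an assumption of this sketch.
def sample_std(xs)
  m = mean(xs)
  Math.sqrt(xs.sum { |x| (x - m)**2 } / (xs.size - 1))
end

def median(xs)
  sorted = xs.sort
  mid = sorted.size / 2
  sorted.size.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
end

# Running total: each element is the sum of all values up to that position.
def cumsum(xs)
  total = 0
  xs.map { |x| total += x }
end

population = [11_190_846, 1_303_171_035, 207_847_528]
puts population.sum
puts cumsum(population).inspect
puts population.min
puts population.max
puts mean(population)
puts sample_std(population)
puts median(population)
```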

I/O

Reading and writing CSV

Download the CSV file beforehand.

wget https://raw.githubusercontent.com/fivethirtyeight/data/master/airline-safety/airline-safety.csv
[26] pry(main)> df = Daru::DataFrame.from_csv("airline-safety.csv")
=> #<Daru::DataFrame(56x8)>
               airline avail_seat incidents_ fatal_acci fatalities incidents_ fatal_acci fatalities
          0 Aer Lingus  320906734          2          0          0          0          0          0
          1  Aeroflot* 1197672318         76         14        128          6          1         88
          2 Aerolineas  385803648          6          0          0          1          0          0
          3 Aeromexico  596871813          3          1         64          5          0          0
          4 Air Canada 1865253802          2          0          0          2          0          0
          5 Air France 3004002661         14          4         79          6          2        337
          6 Air India*  869253552          2          1        329          4          1        158
          7 Air New Ze  710174817          3          0          0          5          1          7
          8 Alaska Air  965346773          5          0          0          5          1         88
          9   Alitalia  698012498          7          2         50          4          0          0
         10 All Nippon 1841234177          3          1          1          7          0          0
         11  American* 5228357340         21          5        101         17          3        416
         12 Austrian A  358239823          1          0          0          1          0          0
         13    Avianca  396922563          5          3        323          0          0          0
         14 British Ai 3179760952          4          0          0          6          0          0
        ...        ...        ...        ...        ...        ...        ...        ...        ...
[27] pry(main)> df.write_csv("aaa.csv")
=> nil
[28] pry(main)> .head aaa.csv
airline,avail_seat_km_per_week,incidents_85_99,fatal_accidents_85_99,fatalities_85_99,incidents_00_14,fatal_accidents_00_14,fatalities_00_14
Aer Lingus,320906734,2,0,0,0,0,0
Aeroflot*,1197672318,76,14,128,6,1,88
Aerolineas Argentinas,385803648,6,0,0,1,0,0
Aeromexico*,596871813,3,1,64,5,0,0
Air Canada,1865253802,2,0,0,2,0,0
Air France,3004002661,14,4,79,6,2,337
Air India*,869253552,2,1,329,4,1,158
Air New Zealand*,710174817,3,0,0,5,1,7
Alaska Airlines*,965346773,5,0,0,5,1,88
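
For comparison, the same kind of round trip can be sketched with Ruby's standard `csv` library, without daru. The two sample rows reuse values from the output above; the helper `csv_round_trip` and the temporary directory are assumptions of this sketch.

```ruby
require "csv"
require "tmpdir"

# Write rows (an array of hashes) to a CSV file, then read them back.
def csv_round_trip(rows, path)
  CSV.open(path, "w") do |csv|
    csv << rows.first.keys               # header line
    rows.each { |row| csv << row.values }
  end
  # converters: :numeric turns numeric-looking fields back into numbers.
  CSV.read(path, headers: true, converters: :numeric).map(&:to_h)
end

rows = [
  { "airline" => "Aer Lingus", "avail_seat_km_per_week" => 320_906_734 },
  { "airline" => "Aeroflot*",  "avail_seat_km_per_week" => 1_197_672_318 }
]

Dir.mktmpdir do |dir|
  restored = csv_round_trip(rows, File.join(dir, "aaa.csv"))
  puts restored.inspect
end
```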