Using R Datasets from Julia

Posted at 2016-07-18

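One of R's strengths is its wealth of built-in datasets for trying out statistical processing and various kinds of analysis. Knowing how to call them from other languages is handy for R users who want to try another language, so in this post I try using R's datasets from Julia.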

Install it and try head()

https://github.com/johnmyleswhite/RDatasets.jl
This uses the RDatasets package.

$ julia
julia> Pkg.add("RDatasets") # clones a number of git repositories
julia> using RDatasets # precompiles a number of modules
julia> head(RDatasets.datasets("datasets"))
6×5 DataFrames.DataFrame
│ Row │ Package    │ Dataset            │ Title                                       │ Rows │ Columns │
├─────┼────────────┼────────────────────┼─────────────────────────────────────────────┼──────┼─────────┤
│ 1   │ "datasets" │ "BOD"              │ "Biochemical Oxygen Demand"                 │ 6    │ 2       │
│ 2   │ "datasets" │ "CO2"              │ "Carbon Dioxide Uptake in Grass Plants"     │ 84   │ 5       │
│ 3   │ "datasets" │ "Formaldehyde"     │ "Determination of Formaldehyde"             │ 6    │ 2       │
│ 4   │ "datasets" │ "HairEyeColor"     │ "Hair and Eye Color of Statistics Students" │ 32   │ 4       │
│ 5   │ "datasets" │ "InsectSprays"     │ "Effectiveness of Insect Sprays"            │ 72   │ 2       │
│ 6   │ "datasets" │ "LifeCycleSavings" │ "Intercountry Life-Cycle Savings Data"      │ 50   │ 6       │
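
The session above was recorded on Julia 0.4. As a rough equivalent for Julia 1.x, here is a minimal sketch. It assumes a recent DataFrames.jl, where first(df, n) takes the place of head(df), and network access for the one-time install. Pkg also has to be loaded explicitly on 1.x. The variable name catalog is only illustrative.

```julia
# Sketch for Julia 1.x (an assumption; the article itself uses Julia 0.4).
using Pkg
Pkg.add("RDatasets")   # one-time install; DataFrames.jl comes in as a dependency

using RDatasets        # also makes the DataFrames API available

# Catalog of the datasets that come from R's {datasets} package
catalog = RDatasets.datasets("datasets")

# first(df, n) replaces the head(df) call used in the 0.4 session
println(first(catalog, 6))
```

The result is a regular DataFrame, so the usual DataFrames.jl functions apply to it.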

Loading the anscombe data

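It is slightly confusing, but R's standard built-in datasets currently live in a package named {datasets}. So, to load the anscombe data from the {datasets} package, for example, pass the package name as the first argument and the dataset name as the second, as follows.

For details on what the anscombe dataset contains, see "Anscombe's quartet" on Wikipedia.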

julia> anscombe = RDatasets.dataset("datasets","anscombe")
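11×8 DataFrames.DataFrame
│ Row │ X1 │ X2 │ X3 │ X4 │ Y1    │ Y2   │ Y3    │ Y4   │
├─────┼────┼────┼────┼────┼───────┼──────┼───────┼──────┤
│ 1   │ 10 │ 10 │ 10 │ 8  │ 8.04  │ 9.14 │ 7.46  │ 6.58 │
│ 2   │ 8  │ 8  │ 8  │ 8  │ 6.95  │ 8.14 │ 6.77  │ 5.76 │
│ 3   │ 13 │ 13 │ 13 │ 8  │ 7.58  │ 8.74 │ 12.74 │ 7.71 │
│ 4   │ 9  │ 9  │ 9  │ 8  │ 8.81  │ 8.77 │ 7.11  │ 8.84 │
│ 5   │ 11 │ 11 │ 11 │ 8  │ 8.33  │ 9.26 │ 7.81  │ 8.47 │
│ 6   │ 14 │ 14 │ 14 │ 8  │ 9.96  │ 8.1  │ 8.84  │ 7.04 │
│ 7   │ 6  │ 6  │ 6  │ 8  │ 7.24  │ 6.13 │ 6.08  │ 5.25 │
│ 8   │ 4  │ 4  │ 4  │ 19 │ 4.26  │ 3.1  │ 5.39  │ 12.5 │
│ 9   │ 12 │ 12 │ 12 │ 8  │ 10.84 │ 9.13 │ 8.15  │ 5.56 │
│ 10  │ 7  │ 7  │ 7  │ 8  │ 4.82  │ 7.26 │ 6.42  │ 7.91 │
│ 11  │ 5  │ 5  │ 5  │ 8  │ 5.68  │ 4.74 │ 5.73  │ 6.89 │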

Once the RDatasets package has been loaded with using RDatasets, RDatasets.dataset() can also be written simply as dataset().
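
As a quick check that the loaded table is usable, the following sketch computes the summary statistics that Anscombe's quartet is known for. It assumes Julia 1.x with RDatasets already installed, and the column names X1 to X4 and Y1 to Y4 shown in the output above. The df[!, col] indexing comes from current DataFrames.jl rather than the 0.4-era session, and the name summaries is only illustrative.

```julia
using RDatasets    # provides dataset() and the DataFrames API
using Statistics   # mean and cor from the standard library

anscombe = dataset("datasets", "anscombe")

# Each (Xi, Yi) column pair is one of the four sets in the quartet.
summaries = map(1:4) do i
    x = anscombe[!, Symbol("X", i)]
    y = anscombe[!, Symbol("Y", i)]
    (set = i, mean_x = mean(x), mean_y = mean(y), cor_xy = cor(x, y))
end

for s in summaries
    println("set ", s.set,
            ": mean(x) = ", s.mean_x,
            ", mean(y) = ", round(s.mean_y, digits = 2),
            ", cor(x, y) = ", round(s.cor_xy, digits = 3))
end
```

The four sets should report nearly identical means and correlations even though they look very different when plotted, which is the point of the dataset.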

Data from packages other than {datasets} is ready to use as well

julia> RDatasets.datasets()
733×5 DataFrames.DataFrame
│ Row │ Package │ Dataset           │ Title                                            │ Rows  │ Columns │
├─────┼─────────┼───────────────────┼──────────────────────────────────────────────────┼───────┼─────────┤
│ 1   │ "COUNT" │ "affairs"         │ "affairs"                                        │ 601   │ 18      │
│ 2   │ "COUNT" │ "azdrg112"        │ "azdrg112"                                       │ 1798  │ 4       │
│ 3   │ "COUNT" │ "azpro"           │ "azpro"                                          │ 3589  │ 6       │
│ 4   │ "COUNT" │ "badhealth"       │ "badhealth"                                      │ 1127  │ 3       │
│ 5   │ "COUNT" │ "fasttrakg"       │ "fasttrakg"                                      │ 15    │ 9       │
│ 6   │ "COUNT" │ "lbw"             │ "lbw"                                            │ 189   │ 10      │
│ 7   │ "COUNT" │ "lbwgrp"          │ "lbwgrp"                                         │ 6     │ 7       │
│ 8   │ "COUNT" │ "loomis"          │ "loomis"                                         │ 410   │ 11      │
│ 9   │ "COUNT" │ "mdvis"           │ "mdvis"                                          │ 2227  │ 13      │
│ 10  │ "COUNT" │ "medpar"          │ "medpar"                                         │ 1495  │ 10      │
│ 11  │ "COUNT" │ "rwm"             │ "rwm"                                            │ 27326 │ 4       │
│ 12  │ "COUNT" │ "rwm5yr"          │ "rwm5yr"                                         │ 19609 │ 17      │
│ 13  │ "COUNT" │ "ships"           │ "ships"                                          │ 40    │ 7       │
│ 14  │ "COUNT" │ "titanic"         │ "titanic"                                        │ 1316  │ 4       │
│ 15  │ "COUNT" │ "titanicgrp"      │ "titanicgrp"                                     │ 12    │ 5       │
│ 16  │ "Ecdat" │ "Accident"        │ "Ship Accidents"                                 │ 40    │ 5       │
│ 17  │ "Ecdat" │ "Airline"         │ "Cost for U.S. Airlines"                         │ 90    │ 6       │
│ 18  │ "Ecdat" │ "Airq"            │ "Air Quality for Californian Metropolitan Areas" │ 30    │ 6       │
⋮
│ 715 │ "vcd"   │ "Hospital"        │ "Hospital data"                                  │ 3     │ 4       │
│ 716 │ "vcd"   │ "JobSatisfaction" │ "Job Satisfaction Data"                          │ 8     │ 4       │
│ 717 │ "vcd"   │ "JointSports"     │ "Opinions About Joint Sports"                    │ 40    │ 5       │
│ 718 │ "vcd"   │ "Lifeboats"       │ "Lifeboats on the Titanic"                       │ 18    │ 8       │
│ 719 │ "vcd"   │ "NonResponse"     │ "Non-Response Survey Data"                       │ 12    │ 4       │
│ 720 │ "vcd"   │ "OvaryCancer"     │ "Ovary Cancer Data"                              │ 16    │ 5       │
│ 721 │ "vcd"   │ "PreSex"          │ "Pre-marital Sex and Divorce"                    │ 16    │ 5       │
│ 722 │ "vcd"   │ "Punishment"      │ "Corporal Punishment Data"                       │ 36    │ 5       │
│ 723 │ "vcd"   │ "RepVict"         │ "Repeat Victimization Data"                      │ 8     │ 9       │
│ 724 │ "vcd"   │ "Saxony"          │ "Families in Saxony"                             │ 13    │ 2       │
│ 725 │ "vcd"   │ "SexualFun"       │ "Sex is Fun"                                     │ 4     │ 5       │
│ 726 │ "vcd"   │ "SpaceShuttle"    │ "Space Shuttle O-ring Failures"                  │ 24    │ 6       │
│ 727 │ "vcd"   │ "Suicide"         │ "Suicide Rates in Germany"                       │ 306   │ 6       │
│ 728 │ "vcd"   │ "Trucks"          │ "Truck Accidents Data"                           │ 24    │ 5       │
│ 729 │ "vcd"   │ "UKSoccer"        │ "UK Soccer Scores"                               │ 5     │ 6       │
│ 730 │ "vcd"   │ "VisualAcuity"    │ "Visual Acuity in Left and Right Eyes"           │ 32    │ 4       │
│ 731 │ "vcd"   │ "VonBort"         │ "Von Bortkiewicz Horse Kicks Data"               │ 280   │ 4       │
│ 732 │ "vcd"   │ "WeldonDice"      │ "Weldon's Dice Data"                             │ 11    │ 2       │
│ 733 │ "vcd"   │ "WomenQueue"      │ "Women in Queues"                                │ 11    │ 2       │
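
Because the catalog returned by RDatasets.datasets() is an ordinary DataFrame, it can be searched like any other table. The following is a minimal sketch, assuming Julia 1.x with current DataFrames.jl and the column names shown in the output above (Package, Dataset, Rows). The 10,000-row threshold and the variable names are arbitrary choices for illustration.

```julia
using RDatasets

catalog = RDatasets.datasets()

# Which R packages contribute datasets to the catalog?
pkgs = unique(catalog.Package)
println(length(pkgs), " packages: ", join(pkgs, ", "))

# Restrict the catalog to one package, same call form as datasets("datasets")
vcd = RDatasets.datasets("vcd")

# Pick out the larger datasets by filtering on the Rows column.
# coalesce(..., false) guards against missing counts, should any exist.
big = filter(row -> coalesce(row.Rows >= 10_000, false), catalog)
println(big[!, [:Package, :Dataset, :Rows]])

# Load one of them by (package, dataset) name
rwm = dataset("COUNT", "rwm")
println(size(rwm))
```

Filtering the catalog first is a cheap way to find a dataset of a suitable size before loading it.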

Installation and compilation log, unabridged version

$ julia
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.4.5 (2016-03-18 00:58 UTC)
 _/ |\__'_|_|_|\__'_|  |  
|__/                   |  x86_64-apple-darwin14.5.0
julia> Pkg.add("RDatasets")
INFO: Initializing package repository /Users/takao/.julia/v0.4
INFO: Cloning METADATA from git://github.com/JuliaLang/METADATA.jl
INFO: Cloning cache of BinDeps from git://github.com/JuliaLang/BinDeps.jl.git
INFO: Cloning cache of Compat from git://github.com/JuliaLang/Compat.jl.git
INFO: Cloning cache of DataArrays from git://github.com/JuliaStats/DataArrays.jl.git
INFO: Cloning cache of DataFrames from git://github.com/JuliaStats/DataFrames.jl.git
INFO: Cloning cache of Docile from git://github.com/MichaelHatherly/Docile.jl.git
INFO: Cloning cache of GZip from git://github.com/JuliaIO/GZip.jl.git
INFO: Cloning cache of RDatasets from git://github.com/johnmyleswhite/RDatasets.jl.git
INFO: Cloning cache of Reexport from git://github.com/simonster/Reexport.jl.git
INFO: Cloning cache of Rmath from git://github.com/JuliaStats/Rmath.jl.git
INFO: Cloning cache of SHA from git://github.com/staticfloat/SHA.jl.git
INFO: Cloning cache of SortingAlgorithms from git://github.com/JuliaLang/SortingAlgorithms.jl.git
INFO: Cloning cache of StatsBase from git://github.com/JuliaStats/StatsBase.jl.git
INFO: Cloning cache of StatsFuns from git://github.com/JuliaStats/StatsFuns.jl.git
INFO: Cloning cache of URIParser from git://github.com/JuliaWeb/URIParser.jl.git
INFO: Installing BinDeps v0.3.21
INFO: Installing Compat v0.8.5
INFO: Installing DataArrays v0.3.6
INFO: Installing DataFrames v0.7.5
INFO: Installing Docile v0.5.23
INFO: Installing GZip v0.2.19
INFO: Installing RDatasets v0.1.3
INFO: Installing Reexport v0.0.3
INFO: Installing Rmath v0.1.1
INFO: Installing SHA v0.1.2
INFO: Installing SortingAlgorithms v0.0.6
INFO: Installing StatsBase v0.9.0
INFO: Installing StatsFuns v0.3.0
INFO: Installing URIParser v0.1.5
INFO: Building Rmath
INFO: Package database updated

julia> 
julia> using RDatasets
INFO: Precompiling module Reexport...
INFO: Precompiling module DataFrames...

Execution environment

julia> versioninfo()
Julia Version 0.4.5
Commit 2ac304d* (2016-03-18 00:58 UTC)
Platform Info:
  System: Darwin (x86_64-apple-darwin14.5.0)
  CPU: Intel(R) Core(TM) i5-3210M CPU @ 2.50GHz
  WORD_SIZE: 64
  BLAS: libopenblas (DYNAMIC_ARCH NO_AFFINITY Sandybridge)
  LAPACK: libopenblas
  LIBM: libopenlibm
  LLVM: libLLVM-3.3
julia> now()
2016-07-18T21:17:52

Enjoy!
