
Using R Datasets from Julia

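One of R's strengths is its rich collection of built-in datasets for trying out statistical procedures and all kinds of analyses. Knowing how to load them from another language is handy when an R user wants to try that language out. So this time I tried using R's datasets from Julia.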


Install it and try head()

https://github.com/johnmyleswhite/RDatasets.jl

We use the RDatasets package.

$ julia

julia> Pkg.add("RDatasets") # clones a number of git repositories
julia> using RDatasets # precompiles several modules
julia> head(RDatasets.datasets("datasets"))
6×5 DataFrames.DataFrame
│ Row │ Package    │ Dataset            │ Title                                       │ Rows │ Columns │
├─────┼────────────┼────────────────────┼─────────────────────────────────────────────┼──────┼─────────┤
│ 1   │ "datasets" │ "BOD"              │ "Biochemical Oxygen Demand"                 │ 6    │ 2       │
│ 2   │ "datasets" │ "CO2"              │ "Carbon Dioxide Uptake in Grass Plants"     │ 84   │ 5       │
│ 3   │ "datasets" │ "Formaldehyde"     │ "Determination of Formaldehyde"             │ 6    │ 2       │
│ 4   │ "datasets" │ "HairEyeColor"     │ "Hair and Eye Color of Statistics Students" │ 32   │ 4       │
│ 5   │ "datasets" │ "InsectSprays"     │ "Effectiveness of Insect Sprays"            │ 72   │ 2       │
│ 6   │ "datasets" │ "LifeCycleSavings" │ "Intercountry Life-Cycle Savings Data"      │ 50   │ 6       │
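
The session above was recorded on Julia 0.4.5. On Julia 1.x the same steps look slightly different: `Pkg` has to be loaded explicitly, and recent DataFrames releases no longer provide `head()`, so `first(df, n)` is used instead. A minimal sketch, assuming a current RDatasets release and network access for the install (the variable names `listing` and `top` are only illustrative):

```julia
using Pkg
Pkg.add("RDatasets")      # downloads RDatasets and its dependencies

using RDatasets

# List the datasets that come from R's {datasets} package.
listing = RDatasets.datasets("datasets")

# Show the first six rows, the Julia 1.x counterpart of head().
top = first(listing, 6)
println(top)
```

The printed layout will differ from the 0.4-era output shown above, since newer DataFrames versions format tables differently.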


Loading the anscombe data

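It is a little confusing, but R's standard built-in datasets currently live in a package that is itself named {datasets}. So if you want to load, say, the anscombe data from the {datasets} package, write the following.

For what the anscombe dataset contains, see Anscombe's quartet - Wikipedia.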

julia> anscombe = RDatasets.dataset("datasets","anscombe")

11×8 DataFrames.DataFrame
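│ Row │ X1 │ X2 │ X3 │ X4 │ Y1    │ Y2   │ Y3    │ Y4   │
├─────┼────┼────┼────┼────┼───────┼──────┼───────┼──────┤
│ 1   │ 10 │ 10 │ 10 │ 8  │ 8.04  │ 9.14 │ 7.46  │ 6.58 │
│ 2   │ 8  │ 8  │ 8  │ 8  │ 6.95  │ 8.14 │ 6.77  │ 5.76 │
│ 3   │ 13 │ 13 │ 13 │ 8  │ 7.58  │ 8.74 │ 12.74 │ 7.71 │
│ 4   │ 9  │ 9  │ 9  │ 8  │ 8.81  │ 8.77 │ 7.11  │ 8.84 │
│ 5   │ 11 │ 11 │ 11 │ 8  │ 8.33  │ 9.26 │ 7.81  │ 8.47 │
│ 6   │ 14 │ 14 │ 14 │ 8  │ 9.96  │ 8.1  │ 8.84  │ 7.04 │
│ 7   │ 6  │ 6  │ 6  │ 8  │ 7.24  │ 6.13 │ 6.08  │ 5.25 │
│ 8   │ 4  │ 4  │ 4  │ 19 │ 4.26  │ 3.1  │ 5.39  │ 12.5 │
│ 9   │ 12 │ 12 │ 12 │ 8  │ 10.84 │ 9.13 │ 8.15  │ 5.56 │
│ 10  │ 7  │ 7  │ 7  │ 8  │ 4.82  │ 7.26 │ 6.42  │ 7.91 │
│ 11  │ 5  │ 5  │ 5  │ 8  │ 5.68  │ 4.74 │ 5.73  │ 6.89 │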

Once the RDatasets package has been loaded with using, RDatasets.dataset() can also be written simply as dataset().
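
What makes anscombe a popular test dataset is that its four x/y pairs share nearly identical summary statistics. The sketch below checks this with the values copied from the table above, so it runs without RDatasets. The names `xs`, `ys`, and `stats` are illustrative, and the syntax assumes Julia 1.x with the `Statistics` standard library.

```julia
using Statistics

# Values copied from the anscombe table above (columns X1..X4 and Y1..Y4).
xs = [
    [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5],   # X1
    [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5],   # X2
    [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5],   # X3
    [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],       # X4
]
ys = [
    [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68],  # Y1
    [9.14, 8.14, 8.74, 8.77, 9.26, 8.1, 6.13, 3.1, 9.13, 7.26, 4.74],     # Y2
    [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73],  # Y3
    [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.5, 5.56, 7.91, 6.89],   # Y4
]

# Mean of x, mean of y, and Pearson correlation for each of the four pairs.
stats = [(mean(x), mean(y), cor(x, y)) for (x, y) in zip(xs, ys)]

for (i, (mx, my, r)) in enumerate(stats)
    println("pair ", i, ": mean(x) = ", mx,
            ", mean(y) = ", round(my, digits = 2),
            ", cor = ", round(r, digits = 3))
end
```

All four pairs should report essentially the same means and correlation, even though their scatter plots look very different.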


Data from packages other than {datasets} is ready to use as well

julia> RDatasets.datasets()

733×5 DataFrames.DataFrame
│ Row │ Package │ Dataset           │ Title                                            │ Rows  │ Columns │
├─────┼─────────┼───────────────────┼──────────────────────────────────────────────────┼───────┼─────────┤
│ 1   │ "COUNT" │ "affairs"         │ "affairs"                                        │ 601   │ 18      │
│ 2   │ "COUNT" │ "azdrg112"        │ "azdrg112"                                       │ 1798  │ 4       │
│ 3   │ "COUNT" │ "azpro"           │ "azpro"                                          │ 3589  │ 6       │
│ 4   │ "COUNT" │ "badhealth"       │ "badhealth"                                      │ 1127  │ 3       │
│ 5   │ "COUNT" │ "fasttrakg"       │ "fasttrakg"                                      │ 15    │ 9       │
│ 6   │ "COUNT" │ "lbw"             │ "lbw"                                            │ 189   │ 10      │
│ 7   │ "COUNT" │ "lbwgrp"          │ "lbwgrp"                                         │ 6     │ 7       │
│ 8   │ "COUNT" │ "loomis"          │ "loomis"                                         │ 410   │ 11      │
│ 9   │ "COUNT" │ "mdvis"           │ "mdvis"                                          │ 2227  │ 13      │
│ 10  │ "COUNT" │ "medpar"          │ "medpar"                                         │ 1495  │ 10      │
│ 11  │ "COUNT" │ "rwm"             │ "rwm"                                            │ 27326 │ 4       │
│ 12  │ "COUNT" │ "rwm5yr"          │ "rwm5yr"                                         │ 19609 │ 17      │
│ 13  │ "COUNT" │ "ships"           │ "ships"                                          │ 40    │ 7       │
│ 14  │ "COUNT" │ "titanic"         │ "titanic"                                        │ 1316  │ 4       │
│ 15  │ "COUNT" │ "titanicgrp"      │ "titanicgrp"                                     │ 12    │ 5       │
│ 16  │ "Ecdat" │ "Accident"        │ "Ship Accidents"                                 │ 40    │ 5       │
│ 17  │ "Ecdat" │ "Airline"         │ "Cost for U.S. Airlines"                         │ 90    │ 6       │
│ 18  │ "Ecdat" │ "Airq"            │ "Air Quality for Californian Metropolitan Areas" │ 30    │ 6       │
⋮
│ 715 │ "vcd"   │ "Hospital"        │ "Hospital data"                                  │ 3     │ 4       │
│ 716 │ "vcd"   │ "JobSatisfaction" │ "Job Satisfaction Data"                          │ 8     │ 4       │
│ 717 │ "vcd"   │ "JointSports"     │ "Opinions About Joint Sports"                    │ 40    │ 5       │
│ 718 │ "vcd"   │ "Lifeboats"       │ "Lifeboats on the Titanic"                       │ 18    │ 8       │
│ 719 │ "vcd"   │ "NonResponse"     │ "Non-Response Survey Data"                       │ 12    │ 4       │
│ 720 │ "vcd"   │ "OvaryCancer"     │ "Ovary Cancer Data"                              │ 16    │ 5       │
│ 721 │ "vcd"   │ "PreSex"          │ "Pre-marital Sex and Divorce"                    │ 16    │ 5       │
│ 722 │ "vcd"   │ "Punishment"      │ "Corporal Punishment Data"                       │ 36    │ 5       │
│ 723 │ "vcd"   │ "RepVict"         │ "Repeat Victimization Data"                      │ 8     │ 9       │
│ 724 │ "vcd"   │ "Saxony"          │ "Families in Saxony"                             │ 13    │ 2       │
│ 725 │ "vcd"   │ "SexualFun"       │ "Sex is Fun"                                     │ 4     │ 5       │
│ 726 │ "vcd"   │ "SpaceShuttle"    │ "Space Shuttle O-ring Failures"                  │ 24    │ 6       │
│ 727 │ "vcd"   │ "Suicide"         │ "Suicide Rates in Germany"                       │ 306   │ 6       │
│ 728 │ "vcd"   │ "Trucks"          │ "Truck Accidents Data"                           │ 24    │ 5       │
│ 729 │ "vcd"   │ "UKSoccer"        │ "UK Soccer Scores"                               │ 5     │ 6       │
│ 730 │ "vcd"   │ "VisualAcuity"    │ "Visual Acuity in Left and Right Eyes"           │ 32    │ 4       │
│ 731 │ "vcd"   │ "VonBort"         │ "Von Bortkiewicz Horse Kicks Data"               │ 280   │ 4       │
│ 732 │ "vcd"   │ "WeldonDice"      │ "Weldon's Dice Data"                             │ 11    │ 2       │
│ 733 │ "vcd"   │ "WomenQueue"      │ "Women in Queues"                                │ 11    │ 2       │
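
Any Package/Dataset pair from this listing can be passed to dataset() in the same way as the anscombe call. A minimal sketch, assuming RDatasets is already installed and that the vcd entries are still shipped as listed above (the variable names are only illustrative):

```julia
using RDatasets

# Restrict the listing to a single package, just as "datasets" was used earlier.
vcd_listing = RDatasets.datasets("vcd")

# Load one entry by its package name and dataset name.
saxony = RDatasets.dataset("vcd", "Saxony")
println(size(saxony))
```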


Install and compile log, unabridged version

$ julia

               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.4.5 (2016-03-18 00:58 UTC)
 _/ |\__'_|_|_|\__'_|  |
|__/                   |  x86_64-apple-darwin14.5.0

julia> Pkg.add("RDatasets")
INFO: Initializing package repository /Users/takao/.julia/v0.4
INFO: Cloning METADATA from git://github.com/JuliaLang/METADATA.jl
INFO: Cloning cache of BinDeps from git://github.com/JuliaLang/BinDeps.jl.git
INFO: Cloning cache of Compat from git://github.com/JuliaLang/Compat.jl.git
INFO: Cloning cache of DataArrays from git://github.com/JuliaStats/DataArrays.jl.git
INFO: Cloning cache of DataFrames from git://github.com/JuliaStats/DataFrames.jl.git
INFO: Cloning cache of Docile from git://github.com/MichaelHatherly/Docile.jl.git
INFO: Cloning cache of GZip from git://github.com/JuliaIO/GZip.jl.git
INFO: Cloning cache of RDatasets from git://github.com/johnmyleswhite/RDatasets.jl.git
INFO: Cloning cache of Reexport from git://github.com/simonster/Reexport.jl.git
INFO: Cloning cache of Rmath from git://github.com/JuliaStats/Rmath.jl.git
INFO: Cloning cache of SHA from git://github.com/staticfloat/SHA.jl.git
INFO: Cloning cache of SortingAlgorithms from git://github.com/JuliaLang/SortingAlgorithms.jl.git
INFO: Cloning cache of StatsBase from git://github.com/JuliaStats/StatsBase.jl.git
INFO: Cloning cache of StatsFuns from git://github.com/JuliaStats/StatsFuns.jl.git
INFO: Cloning cache of URIParser from git://github.com/JuliaWeb/URIParser.jl.git
INFO: Installing BinDeps v0.3.21
INFO: Installing Compat v0.8.5
INFO: Installing DataArrays v0.3.6
INFO: Installing DataFrames v0.7.5
INFO: Installing Docile v0.5.23
INFO: Installing GZip v0.2.19
INFO: Installing RDatasets v0.1.3
INFO: Installing Reexport v0.0.3
INFO: Installing Rmath v0.1.1
INFO: Installing SHA v0.1.2
INFO: Installing SortingAlgorithms v0.0.6
INFO: Installing StatsBase v0.9.0
INFO: Installing StatsFuns v0.3.0
INFO: Installing URIParser v0.1.5
INFO: Building Rmath
INFO: Package database updated

julia>
julia> using RDatasets
INFO: Precompiling module Reexport...
INFO: Precompiling module DataFrames...


Execution environment

julia> versioninfo()

Julia Version 0.4.5
Commit 2ac304d* (2016-03-18 00:58 UTC)
Platform Info:
  System: Darwin (x86_64-apple-darwin14.5.0)
  CPU: Intel(R) Core(TM) i5-3210M CPU @ 2.50GHz
  WORD_SIZE: 64
  BLAS: libopenblas (DYNAMIC_ARCH NO_AFFINITY Sandybridge)
  LAPACK: libopenblas
  LIBM: libopenlibm
  LLVM: libLLVM-3.3

julia> now()
2016-07-18T21:17:52


Enjoy!