Overview
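Setting up R/MeCab from scratch means preparing R, RStudio, MeCab and RMeCab. Differences between machines (Windows vs. Mac on Intel or M1, 32-bit vs. 64-bit) can make this painful.
On your own machine you only have to do this once. When running a hands-on session on morphological analysis, however, participants tend to get stuck at the setup stage.
So in this article I use a Colaboratory notebook to set up an R/MeCab environment with minimal effort.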
Enabling the R kernel in a notebook
Colaboratory is a notebook environment built for running Python, but it can also use R as its runtime.
Creating a new notebook from the following URL gives you a notebook whose runtime is R.
https://colab.research.google.com/notebook#create=true&language=r
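Before running anything else, it can help to confirm that the new notebook really is running an R kernel. The cell below is a minimal sketch that uses only base R. The variable name `runtime_info` is just illustrative.

```r
# Print the R version string; this only works in an R kernel
cat(R.version.string, "\n")

# Collect a few facts about the runtime (runtime_info is an illustrative name)
runtime_info <- list(
  r_version = paste(R.version$major, R.version$minor, sep = "."),
  os        = Sys.info()[["sysname"]],
  lib_paths = .libPaths()
)
str(runtime_info)
```

On Colaboratory, `os` should be `"Linux"`, and `lib_paths` should list the library directories that `library()` reports further down.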
In a notebook created from this URL, let's try loading tidyverse.
library(tidyverse)
# Add a constant column `hoge` to the built-in iris data set
iris %>%
  mutate(hoge = 1)
R runs without problems, and tidyverse loads without any call to install.packages().
This works because Colaboratory ships with a number of libraries preinstalled, and tidyverse is one of them.
Listing the installed packages with library() showed the following libraries as of July 2021.
library()
R packages available
Packages in library ‘/usr/local/lib/R/site-library’:
IRdisplay 'Jupyter' Display Machinery
IRkernel Native R Kernel for the 'Jupyter Notebook'
pbdZMQ Programming with Big Data -- Interface to 'ZeroMQ'
repr Serializable Representations
Packages in library ‘/usr/lib/R/site-library’:
askpass Safe Password Entry for R, Git, and SSH
assertthat Easy Pre and Post Assertions
backports Reimplementations of Functions Introduced Since R-3.0.0
base64enc Tools for base64 encoding
BH Boost C++ Header Files
blob A Simple S3 Class for Representing Vectors of Binary Data ('BLOBS')
brew Templating Framework for Report Generation
brio Basic R Input Output
broom Convert Statistical Objects into Tidy Tibbles
cachem Cache R Objects with Automatic Pruning
callr Call R from R
cellranger Translate Spreadsheet Cell Ranges to Rows and Columns
cli Helpers for Developing Command Line Interfaces
clipr Read and Write from the System Clipboard
colorspace A Toolbox for Manipulating and Assessing Colors and Palettes
commonmark High Performance CommonMark and Github Markdown Rendering in R
cpp11 A C++11 Interface for R's C Interface
crayon Colored Terminal Output
credentials Tools for Managing SSH and Git Credentials
curl A Modern and Flexible Web Client for R
data.table Extension of `data.frame`
DBI R Database Interface
dbplyr A 'dplyr' Back End for Databases
desc Manipulate DESCRIPTION Files
devtools Tools to Make Developing R Packages Easier
diffobj Diffs for R Objects
digest Create Compact Hash Digests of R Objects
dplyr A Grammar of Data Manipulation
dtplyr Data Table Back-End for 'dplyr'
ellipsis Tools for Working with ...
evaluate Parsing and Evaluation Tools that Provide More Details than the Default
fansi ANSI Control Sequence Aware String Functions
farver High Performance Colour Space Manipulation
fastmap Fast Data Structures
forcats Tools for Working with Categorical Variables (Factors)
fs Cross-Platform File System Operations Based on 'libuv'
gargle Utilities for Working with Google APIs
generics Common S3 Generics not Provided by Base R Methods Related to Model Fitting
gert Simple Git Client for R
ggplot2 Create Elegant Data Visualisations Using the Grammar of Graphics
gh 'GitHub' 'API'
gitcreds Query 'git' Credentials from 'R'
glue Interpreted String Literals
googledrive An Interface to Google Drive
googlesheets4 Access Google Sheets using the Sheets API V4
gtable Arrange 'Grobs' in Tables
haven Import and Export 'SPSS', 'Stata' and 'SAS' Files
highr Syntax Highlighting for R Source Code
hms Pretty Time of Day
htmltools Tools for HTML
httr Tools for Working with URLs and HTTP
ids Generate Random Identifiers
ini Read and Write '.ini' Files
isoband Generate Isolines and Isobands from Regularly Spaced Elevation Grids
jsonlite A Simple and Robust JSON Parser and Generator for R
knitr A General-Purpose Package for Dynamic Report Generation in R
labeling Axis Labeling
lifecycle Manage the Life Cycle of your Package Functions
lubridate Make Dealing with Dates a Little Easier
magrittr A Forward-Pipe Operator for R
markdown Render Markdown with the C Library 'Sundown'
memoise Memoisation of Functions
mime Map Filenames to MIME Types
modelr Modelling Functions that Work with the Pipe
munsell Utilities for Using Munsell Colours
openssl Toolkit for Encryption, Signatures and Certificates Based on OpenSSL
pillar Coloured Formatting for Columns
pkgbuild Find Tools Needed to Build R Packages
pkgconfig Private Configuration for 'R' Packages
pkgload Simulate Package Installation and Attach
praise Praise Users
prettyunits Pretty, Human Readable Formatting of Quantities
processx Execute and Control System Processes
progress Terminal Progress Bars
ps List, Query, Manipulate System Processes
purrr Functional Programming Tools
R6 Encapsulated Classes with Reference Semantics
rappdirs Application Directories: Determine Where to Save Data, Caches, and Logs
rcmdcheck Run 'R CMD check' from 'R' and Capture Results
RColorBrewer ColorBrewer Palettes
Rcpp Seamless R and C++ Integration
readr Read Rectangular Text Data
readxl Read Excel Files
rematch Match Regular Expressions with a Nicer 'API'
rematch2 Tidy Output from Regular Expression Matching
remotes R Package Installation from Remote Repositories, Including 'GitHub'
reprex Prepare Reproducible Example Code via the Clipboard
rlang Functions for Base Types and Core R and 'Tidyverse' Features
rmarkdown Dynamic Documents for R
roxygen2 In-Line Documentation for R
rprojroot Finding Files in Project Subdirectories
rstudioapi Safely Access the RStudio API
rversions Query 'R' Versions, Including 'r-release' and 'r-oldrel'
rvest Easily Harvest (Scrape) Web Pages
scales Scale Functions for Visualization
selectr Translate CSS Selectors to XPath Expressions
sessioninfo R Session Information
stringi Character String Processing Facilities
stringr Simple, Consistent Wrappers for Common String Operations
svglite An 'SVG' Graphics Device
sys Powerful and Reliable Tools for Running System Commands in R
systemfonts System Native Font Finding
testthat Unit Testing for R
tibble Simple Data Frames
tidyr Tidy Messy Data
tidyselect Select from a Set of Strings
tidyverse Easily Install and Load the 'Tidyverse'
tinytex Helper Functions to Install and Maintain TeX Live, and Compile LaTeX Documents
usethis Automate Package and Project Setup
utf8 Unicode Text Processing
uuid Tools for Generating and Handling of UUIDs
vctrs Vector Helpers
viridisLite Colorblind-Friendly Color Maps (Lite Version)
waldo Find Differences Between R Objects
whisker {{mustache}} for R, Logicless Templating
withr Run Code 'With' Temporarily Modified Global State
xfun Supporting Functions for Packages Maintained by 'Yihui Xie'
xml2 Parse XML
xopen Open System Files, 'URLs', Anything
yaml Methods to Convert R Data to YAML and Back
zip Cross-Platform 'zip' Compression
Packages in library ‘/usr/lib/R/library’:
base The R Base Package
boot Bootstrap Functions (Originally by Angelo Canty for S)
class Functions for Classification
cluster "Finding Groups in Data": Cluster Analysis Extended Rousseeuw et al.
codetools Code Analysis Tools for R
compiler The R Compiler Package
datasets The R Datasets Package
foreign Read Data Stored by 'Minitab', 'S', 'SAS', 'SPSS', 'Stata', 'Systat', 'Weka', 'dBase', ...
graphics The R Graphics Package
grDevices The R Graphics Devices and Support for Colours and Fonts
grid The Grid Graphics Package
KernSmooth Functions for Kernel Smoothing Supporting Wand & Jones (1995)
lattice Trellis Graphics for R
MASS Support Functions and Datasets for Venables and Ripley's MASS
Matrix Sparse and Dense Matrix Classes and Methods
methods Formal Methods and Classes
mgcv Mixed GAM Computation Vehicle with Automatic Smoothness Estimation
nlme Linear and Nonlinear Mixed Effects Models
nnet Feed-Forward Neural Networks and Multinomial Log-Linear Models
parallel Support for Parallel computation in R
rpart Recursive Partitioning and Regression Trees
spatial Functions for Kriging and Point Pattern Analysis
splines Regression Spline Functions and Classes
stats The R Stats Package
stats4 Statistical Functions using S4 Classes
survival Survival Analysis
tcltk Tcl/Tk Interface
tools Tools for Package Development
utils The R Utils Package
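`library()` is handy for reading the list by eye. `installed.packages()` returns the same information as a matrix, which makes it easy to check for a package from code. Below is a small sketch in base R. The names `pkgs` and `wanted` are illustrative.

```r
# One row per installed package, with the library directory it lives in
m <- installed.packages()[, c("Package", "LibPath", "Version"), drop = FALSE]
rownames(m) <- NULL  # row names can repeat if a package is in two libraries
pkgs <- as.data.frame(m, stringsAsFactors = FALSE)

# Number of packages per library directory
print(table(pkgs$LibPath))

# Which of these packages are already available?
wanted <- c("tidyverse", "RMeCab")
print(setNames(wanted %in% pkgs$Package, wanted))
```

On a fresh Colaboratory runtime, one would expect `RMeCab` to show `FALSE` until it is installed later in this article.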
If a library you need is not on this list, you can still install any package with install.packages().
install.packages("nycflights13")
library(nycflights13)
airports
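When a notebook is shared with others, re-running install.packages() on every execution wastes time. A common pattern is to install only what cannot be loaded yet. The sketch below uses base R's `requireNamespace()`. `install_if_missing()` is a hypothetical helper, not part of any package.

```r
# Hypothetical helper: install only the packages that cannot be loaded yet.
# Extra arguments (e.g. repos) are passed through to install.packages().
install_if_missing <- function(pkgs, ...) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  to_install <- pkgs[!available]
  if (length(to_install) > 0) {
    install.packages(to_install, ...)
  }
  invisible(to_install)
}

# Base packages are always present, so nothing is installed here
installed_now <- install_if_missing(c("stats", "utils"))
print(installed_now)
```

In the notebook you could call `install_if_missing("nycflights13")`. For the RMeCab step, the call would be `install_if_missing("RMeCab", repos = "https://rmecab.jp/R")`.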
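Installing system libraries
This time we use RMeCab, which requires MeCab itself to be installed on the machine. We therefore use system() to install MeCab and the other system libraries it needs, such as the dictionary.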
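# MeCab itself
system('sudo apt-get install -y mecab', intern=TRUE)
# Development headers and libraries, needed to build RMeCab
system('sudo apt-get install -y libmecab-dev', intern=TRUE)
# IPA dictionary (UTF-8)
system('sudo apt-get install -y mecab-ipadic-utf8', intern=TRUE)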
Colaboratory runs on Ubuntu, so these packages can be installed with apt-get.
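The three apt-get calls can also be folded into one command, and you can check afterwards that MeCab is on the PATH before building RMeCab. This is a minimal sketch. `apt_install_cmd()` and `mecab_available()` are hypothetical helpers, and so is the assumption that `mecab-config` comes from libmecab-dev on Ubuntu. The block only builds the command and inspects the PATH; it does not run apt-get.

```r
# The three apt packages installed above
mecab_apt_pkgs <- c("mecab", "libmecab-dev", "mecab-ipadic-utf8")

# Hypothetical helper: build one apt-get command for all of them
apt_install_cmd <- function(pkgs) {
  paste("sudo apt-get install -y", paste(pkgs, collapse = " "))
}

# Hypothetical helper: report whether the MeCab executables are on the PATH.
# Assumption: on Ubuntu, mecab-config is provided by libmecab-dev.
mecab_available <- function() {
  vapply(Sys.which(c("mecab", "mecab-config")), nzchar, logical(1))
}

cmd <- apt_install_cmd(mecab_apt_pkgs)
print(cmd)
print(mecab_available())
```

On Colaboratory you would run `system(cmd)`, which returns the exit status (0 on success) when `intern = FALSE`. Then confirm that `mecab_available()` is `TRUE` for both entries before installing RMeCab.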
With the system libraries in place, install RMeCab and load it.
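# RMeCab is installed from its own repository (repos) rather than CRAN
install.packages("RMeCab", repos = "https://rmecab.jp/R")
library(RMeCab)
# Tokenize a sample sentence; the result is a list of words named by part of speech
res <- RMeCabC("すもももももももものうち")
res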
名詞: 'すもも'
助詞: 'も'
名詞: 'もも'
助詞: 'も'
名詞: 'もも'
助詞: 'の'
名詞: 'うち'
RMeCabC() returns this result, so word segmentation (wakachi-gaki) works. Each word is tagged as a noun (名詞) or a particle (助詞).
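RMeCabC() returns a list. The output above suggests that each element is a one-element character vector named by its part of speech. Assuming that structure, the sketch below flattens the result into a data frame and counts the nouns. So that it runs without MeCab, it rebuilds the result by hand instead of calling RMeCabC(). `tok()` and `tokens_to_df()` are hypothetical helpers.

```r
# Stand-in for res <- RMeCabC("すもももももももものうち"), built by hand
tok <- function(pos, word) setNames(word, pos)
res <- list(
  tok("名詞", "すもも"), tok("助詞", "も"), tok("名詞", "もも"),
  tok("助詞", "も"), tok("名詞", "もも"), tok("助詞", "の"),
  tok("名詞", "うち")
)

# Flatten the list: names become the pos column, values the word column
tokens_to_df <- function(res) {
  u <- unlist(res)
  data.frame(pos = names(u), word = unname(u), stringsAsFactors = FALSE)
}

df <- tokens_to_df(res)
print(df)

# Frequency of each noun
noun_freq <- table(df$word[df$pos == "名詞"])
print(noun_freq)
```

In the notebook, drop the hand-built `res` and pass the real `RMeCabC()` result to `tokens_to_df()`.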
Closing remarks
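With Colaboratory as the runtime, you can start morphological analysis with R/MeCab in about a minute, which is very convenient.
You could also prepare scripts in advance for loading data from a spreadsheet and for preprocessing and aggregating it. That might let you offer a lightweight text-mining environment even to people who are not analysts.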
References