Using RMeCab on Colaboratory

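Overview

Building an R/MeCab environment from scratch means preparing R, RStudio, MeCab, and RMeCab, and differences between machines (Windows or Mac, Intel or M1, 32-bit or 64-bit) can cause trouble.

On your own machine this is a one-time setup, but in a hands-on session on morphological analysis, participants tend to get stuck at the environment setup stage.

So here I will use a Colaboratory notebook to prepare an R/MeCab execution environment with minimal effort.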

Enabling the R kernel in a notebook

Colaboratory is a notebook (execution environment) for running Python, but it can also use R as its runtime instead of Python.

Creating a new notebook from the following URL gives you a notebook with R as its runtime.

https://colab.research.google.com/notebook#create=true&language=r

As a test, let's load tidyverse in a notebook created from this URL.

library(tidyverse)
iris %>%
 mutate(hoge = 1)

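R runs without problems, and tidyverse loads without calling install.packages().

Colaboratory comes with a number of libraries preinstalled, which is why tidyverse can be used without installation.

Checking the installed packages with library() showed that, as of July 2021, the following libraries were installed.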

library()
R packages available

Packages in library ‘/usr/local/lib/R/site-library’:

IRdisplay               'Jupyter' Display Machinery
IRkernel                Native R Kernel for the 'Jupyter Notebook'
pbdZMQ                  Programming with Big Data -- Interface to
                        'ZeroMQ'
repr                    Serializable Representations

Packages in library ‘/usr/lib/R/site-library’:

askpass                 Safe Password Entry for R, Git, and SSH
assertthat              Easy Pre and Post Assertions
backports               Reimplementations of Functions Introduced Since
                        R-3.0.0
base64enc               Tools for base64 encoding
BH                      Boost C++ Header Files
blob                    A Simple S3 Class for Representing Vectors of
                        Binary Data ('BLOBS')
brew                    Templating Framework for Report Generation
brio                    Basic R Input Output
broom                   Convert Statistical Objects into Tidy Tibbles
cachem                  Cache R Objects with Automatic Pruning
callr                   Call R from R
cellranger              Translate Spreadsheet Cell Ranges to Rows and
                        Columns
cli                     Helpers for Developing Command Line Interfaces
clipr                   Read and Write from the System Clipboard
colorspace              A Toolbox for Manipulating and Assessing Colors
                        and Palettes
commonmark              High Performance CommonMark and Github Markdown
                        Rendering in R
cpp11                   A C++11 Interface for R's C Interface
crayon                  Colored Terminal Output
credentials             Tools for Managing SSH and Git Credentials
curl                    A Modern and Flexible Web Client for R
data.table              Extension of `data.frame`
DBI                     R Database Interface
dbplyr                  A 'dplyr' Back End for Databases
desc                    Manipulate DESCRIPTION Files
devtools                Tools to Make Developing R Packages Easier
diffobj                 Diffs for R Objects
digest                  Create Compact Hash Digests of R Objects
dplyr                   A Grammar of Data Manipulation
dtplyr                  Data Table Back-End for 'dplyr'
ellipsis                Tools for Working with ...
evaluate                Parsing and Evaluation Tools that Provide More
                        Details than the Default
fansi                   ANSI Control Sequence Aware String Functions
farver                  High Performance Colour Space Manipulation
fastmap                 Fast Data Structures
forcats                 Tools for Working with Categorical Variables
                        (Factors)
fs                      Cross-Platform File System Operations Based on
                        'libuv'
gargle                  Utilities for Working with Google APIs
generics                Common S3 Generics not Provided by Base R
                        Methods Related to Model Fitting
gert                    Simple Git Client for R
ggplot2                 Create Elegant Data Visualisations Using the
                        Grammar of Graphics
gh                      'GitHub' 'API'
gitcreds                Query 'git' Credentials from 'R'
glue                    Interpreted String Literals
googledrive             An Interface to Google Drive
googlesheets4           Access Google Sheets using the Sheets API V4
gtable                  Arrange 'Grobs' in Tables
haven                   Import and Export 'SPSS', 'Stata' and 'SAS'
                        Files
highr                   Syntax Highlighting for R Source Code
hms                     Pretty Time of Day
htmltools               Tools for HTML
httr                    Tools for Working with URLs and HTTP
ids                     Generate Random Identifiers
ini                     Read and Write '.ini' Files
isoband                 Generate Isolines and Isobands from Regularly
                        Spaced Elevation Grids
jsonlite                A Simple and Robust JSON Parser and Generator
                        for R
knitr                   A General-Purpose Package for Dynamic Report
                        Generation in R
labeling                Axis Labeling
lifecycle               Manage the Life Cycle of your Package Functions
lubridate               Make Dealing with Dates a Little Easier
magrittr                A Forward-Pipe Operator for R
markdown                Render Markdown with the C Library 'Sundown'
memoise                 Memoisation of Functions
mime                    Map Filenames to MIME Types
modelr                  Modelling Functions that Work with the Pipe
munsell                 Utilities for Using Munsell Colours
openssl                 Toolkit for Encryption, Signatures and
                        Certificates Based on OpenSSL
pillar                  Coloured Formatting for Columns
pkgbuild                Find Tools Needed to Build R Packages
pkgconfig               Private Configuration for 'R' Packages
pkgload                 Simulate Package Installation and Attach
praise                  Praise Users
prettyunits             Pretty, Human Readable Formatting of Quantities
processx                Execute and Control System Processes
progress                Terminal Progress Bars
ps                      List, Query, Manipulate System Processes
purrr                   Functional Programming Tools
R6                      Encapsulated Classes with Reference Semantics
rappdirs                Application Directories: Determine Where to
                        Save Data, Caches, and Logs
rcmdcheck               Run 'R CMD check' from 'R' and Capture Results
RColorBrewer            ColorBrewer Palettes
Rcpp                    Seamless R and C++ Integration
readr                   Read Rectangular Text Data
readxl                  Read Excel Files
rematch                 Match Regular Expressions with a Nicer 'API'
rematch2                Tidy Output from Regular Expression Matching
remotes                 R Package Installation from Remote
                        Repositories, Including 'GitHub'
reprex                  Prepare Reproducible Example Code via the
                        Clipboard
rlang                   Functions for Base Types and Core R and
                        'Tidyverse' Features
rmarkdown               Dynamic Documents for R
roxygen2                In-Line Documentation for R
rprojroot               Finding Files in Project Subdirectories
rstudioapi              Safely Access the RStudio API
rversions               Query 'R' Versions, Including 'r-release' and
                        'r-oldrel'
rvest                   Easily Harvest (Scrape) Web Pages
scales                  Scale Functions for Visualization
selectr                 Translate CSS Selectors to XPath Expressions
sessioninfo             R Session Information
stringi                 Character String Processing Facilities
stringr                 Simple, Consistent Wrappers for Common String
                        Operations
svglite                 An 'SVG' Graphics Device
sys                     Powerful and Reliable Tools for Running System
                        Commands in R
systemfonts             System Native Font Finding
testthat                Unit Testing for R
tibble                  Simple Data Frames
tidyr                   Tidy Messy Data
tidyselect              Select from a Set of Strings
tidyverse               Easily Install and Load the 'Tidyverse'
tinytex                 Helper Functions to Install and Maintain TeX
                        Live, and Compile LaTeX Documents
usethis                 Automate Package and Project Setup
utf8                    Unicode Text Processing
uuid                    Tools for Generating and Handling of UUIDs
vctrs                   Vector Helpers
viridisLite             Colorblind-Friendly Color Maps (Lite Version)
waldo                   Find Differences Between R Objects
whisker                 {{mustache}} for R, Logicless Templating
withr                   Run Code 'With' Temporarily Modified Global
                        State
xfun                    Supporting Functions for Packages Maintained by
                        'Yihui Xie'
xml2                    Parse XML
xopen                   Open System Files, 'URLs', Anything
yaml                    Methods to Convert R Data to YAML and Back
zip                     Cross-Platform 'zip' Compression

Packages in library ‘/usr/lib/R/library’:

base                    The R Base Package
boot                    Bootstrap Functions (Originally by Angelo Canty
                        for S)
class                   Functions for Classification
cluster                 "Finding Groups in Data": Cluster Analysis
                        Extended Rousseeuw et al.
codetools               Code Analysis Tools for R
compiler                The R Compiler Package
datasets                The R Datasets Package
foreign                 Read Data Stored by 'Minitab', 'S', 'SAS',
                        'SPSS', 'Stata', 'Systat', 'Weka', 'dBase', ...
graphics                The R Graphics Package
grDevices               The R Graphics Devices and Support for Colours
                        and Fonts
grid                    The Grid Graphics Package
KernSmooth              Functions for Kernel Smoothing Supporting Wand
                        & Jones (1995)
lattice                 Trellis Graphics for R
MASS                    Support Functions and Datasets for Venables and
                        Ripley's MASS
Matrix                  Sparse and Dense Matrix Classes and Methods
methods                 Formal Methods and Classes
mgcv                    Mixed GAM Computation Vehicle with Automatic
                        Smoothness Estimation
nlme                    Linear and Nonlinear Mixed Effects Models
nnet                    Feed-Forward Neural Networks and Multinomial
                        Log-Linear Models
parallel                Support for Parallel computation in R
rpart                   Recursive Partitioning and Regression Trees
spatial                 Functions for Kriging and Point Pattern
                        Analysis
splines                 Regression Spline Functions and Classes
stats                   The R Stats Package
stats4                  Statistical Functions using S4 Classes
survival                Survival Analysis
tcltk                   Tcl/Tk Interface
tools                   Tools for Package Development
utils                   The R Utils Package

Even if a library you need is not in this list, you can install any library with install.packages().

install.packages("nycflights13")
library(nycflights13)
airports
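
Because some packages are preinstalled and others are not, a notebook for a hands-on session could install only what is missing. The following is a minimal sketch of that idea, not part of the original notebook: `missing_packages()` and `ensure_packages()` are hypothetical helper names, and the sketch relies only on base R's `installed.packages()` and `install.packages()`.

```r
# Return the entries of `pkgs` that are not installed in the current runtime.
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

# Install only the packages that are missing (hypothetical helper).
# Returns, invisibly, the names of the packages it tried to install.
ensure_packages <- function(pkgs) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install)
  }
  invisible(to_install)
}

# "stats" and "utils" ship with R, so nothing is installed here.
ensure_packages(c("stats", "utils"))
```

In the notebook, a call such as `ensure_packages(c("tidyverse", "nycflights13"))` would skip tidyverse if it is already present and install only nycflights13.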

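Installing system libraries

Since we are using RMeCab here, MeCab itself must be installed on the machine. We therefore use system() to install the required system libraries, such as MeCab itself and its dictionary.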

system('sudo apt-get install -y mecab', intern=TRUE)
system('sudo apt-get install -y libmecab-dev', intern=TRUE)
system('sudo apt-get install -y mecab-ipadic-utf8', intern=TRUE)

Colaboratory runs on Ubuntu, so these can be installed with apt-get.
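
With `intern=TRUE`, `system()` returns the command's output, so a failed install is easy to miss. As a rough sketch, assuming you would rather stop on failure, the apt-get calls could be combined into one command and its exit status checked. `run_cmd()` and `apt_install_command()` are hypothetical helpers introduced only for this illustration.

```r
# Run a shell command and raise an R error if it exits with a non-zero status.
# With intern = FALSE (the default), system() returns the exit status.
run_cmd <- function(cmd) {
  status <- system(cmd)
  if (status != 0) {
    stop("command failed (exit status ", status, "): ", cmd)
  }
  invisible(status)
}

# Build a single apt-get command for several packages (hypothetical helper).
apt_install_command <- function(pkgs) {
  paste("sudo apt-get install -y", paste(pkgs, collapse = " "))
}

cmd <- apt_install_command(c("mecab", "libmecab-dev", "mecab-ipadic-utf8"))
print(cmd)

# In the Colaboratory notebook, uncomment the next line to run the install:
# run_cmd(cmd)
```

After the install, `Sys.which("mecab")` returns the path to the mecab executable, or an empty string if it was not found.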

After installing the system libraries, install and load RMeCab.

install.packages("RMeCab", repos = "https://rmecab.jp/R")
library(RMeCab)

res <- RMeCabC("すもももももももものうち")
res

名詞: 'すもも'
助詞: 'も'
名詞: 'もも'
助詞: 'も'
名詞: 'もも'
助詞: 'の'
名詞: 'うち'

This is the result returned, showing that RMeCabC() split the sentence into words successfully.
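
For later aggregation it is often handier to have the tokens in a data frame. The sketch below assumes that the result is a list of one-element character vectors whose names are the parts of speech, as the output above suggests. So that it runs without MeCab installed, `res` here is a hand-built stand-in mirroring that output, and `tokens_to_df()` is a hypothetical helper.

```r
# Hand-built stand-in for the result shown above (not a real RMeCabC() call):
# each element is a one-element character vector named by its part of speech.
res <- list(
  c("名詞" = "すもも"),
  c("助詞" = "も"),
  c("名詞" = "もも"),
  c("助詞" = "も"),
  c("名詞" = "もも"),
  c("助詞" = "の"),
  c("名詞" = "うち")
)

# Flatten the list into a data frame with one row per token (hypothetical helper).
tokens_to_df <- function(res) {
  tokens <- unlist(res)
  data.frame(
    pos = names(tokens),       # part of speech
    surface = unname(tokens),  # surface form of the token
    stringsAsFactors = FALSE
  )
}

tokens <- tokens_to_df(res)

# Keep only the nouns.
nouns <- tokens$surface[tokens$pos == "名詞"]
print(nouns)
```

In the notebook, the same helper could be applied directly to the `res` returned by `RMeCabC()`.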

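Conclusion

With Colaboratory as the execution environment, you can start morphological analysis with R/MeCab in about a minute. Very convenient.

If you prepare scripts in advance for loading data from a spreadsheet and for preprocessing and aggregation, you may be able to offer an easy text mining environment even to people in non-analyst roles.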
