In my previous post, I wrote about this problem.

Yutani-san (@yutannihilation) has shown a different way to solve it.

RPubs - reverse depends of ggplot2 https://t.co/NSlunFHhRf

— Hiroaki Yutani (@yutannihilation) April 16, 2015

The great thing about Yutani-san's version is that it ranks the packages by download count, so you can simply work through the list from the top.

Mine, on the other hand, includes each package's title and description.

So combining the two datasets should give an even better result.
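
Before running it on the real data, here is a minimal base-R sketch of what `merge(..., all = TRUE)` does. The two tables are tiny hypothetical stand-ins: `mine` mimics `ggplot2_extend.csv`, `dl` mimics the scraped ranking, and the package `examplePkg` is made up. The counts are copied from the results in this post, and I assume the scraped `count` column arrives as text.

```r
# Two tiny hypothetical tables shaped like the real inputs:
# `mine` mimics ggplot2_extend.csv (name, title),
# `dl` mimics the scraped ranking (package, count stored as text).
mine <- data.frame(
  name  = c("ggmap", "GGally", "examplePkg"),  # examplePkg is made up
  title = c("ggmap: Spatial Visualization with Google Maps and OpenStreetMap",
            "GGally: Extension to ggplot2",
            "examplePkg: a hypothetical package"),
  stringsAsFactors = FALSE
)
dl <- data.frame(
  package = c("Hmisc", "ggmap", "GGally"),
  count   = c("21222", "6329", "3298"),
  stringsAsFactors = FALSE
)

# Full outer join: keep rows that appear in either table
merged <- merge(mine, dl, by.x = "name", by.y = "package", all = TRUE)
# Convert the text counts to numbers so they sort numerically, not alphabetically
merged$count <- as.numeric(merged$count)
# Most downloaded first; order() puts NA counts last by default
merged <- merged[order(merged$count, decreasing = TRUE), ]
print(merged)
```

Hmisc comes back with an `NA` title and `examplePkg` with an `NA` count: with `all = TRUE`, unmatched rows from either side are kept and padded with `NA`. In the dplyr pipeline, `arrange(desc(count))` likewise sorts missing counts to the end.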

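```
library(XML)
library(dplyr)
# Yutani-san's RPubs page: reverse dependencies of ggplot2 ranked by downloads
url <- "http://rstudio-pubs-static.s3.amazonaws.com/73196_20878064ec7b4e12aca3416ae18acdf0.html"
# Read the first table on the page (joined below on its package and count columns)
dl_data <- readHTMLTable(url, which = 1, stringsAsFactors = FALSE)
# Join my list (name, title, desc) with the ranking, keeping rows from both sides
data <- read.csv("ggplot2_extend.csv", stringsAsFactors = FALSE) %>%
  merge(dl_data, by.x = "name", by.y = "package", all = TRUE) %>%
  mutate(count = as.numeric(count)) %>%  # scraped counts are text; make them numeric
  arrange(desc(count))                   # most downloaded first
knitr::kable(head(data, 3), format = "html")
```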

name | title | desc | count |
---|---|---|---|
Hmisc | Hmisc: Harrell Miscellaneous | Contains many functions useful for data analysis, high-level graphics, utility operations, functions for computing sample size and power, importing and annotating datasets, imputing missing values, advanced table making, variable clustering, character string manipulation, conversion of R objects to LaTeX code, and recoding variables. | 21222 |
caret | caret: Classification and Regression Training | Misc functions for training and plotting classification and regression models | 10356 |
ggmap | ggmap: Spatial Visualization with Google Maps and OpenStreetMap | Easily visualize of spatial data and models on top of Google Maps, OpenStreetMaps, Stamen Maps, or CloudMade Maps with ggplot2. | 6329 |

However, this also picks up packages that merely use ggplot2 to display their results.

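The original goal was to collect ggplot2 extension packages, so let's filter this result down to packages that meet at least one of these conditions:

- the package name starts with gg or GG
- the title or description contains the word ggplot2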
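
To illustrate the two rules, here is a small base-R sketch that applies them to five rows copied from the tables in this post. `grepl()` stands in for `str_detect()` so the snippet runs without extra packages, and the description check is left out to keep it short.

```r
# Five rows copied from the tables in this post (descriptions omitted)
pkgs <- data.frame(
  name  = c("Hmisc", "caret", "ggmap", "GGally", "xkcd"),
  title = c("Hmisc: Harrell Miscellaneous",
            "caret: Classification and Regression Training",
            "ggmap: Spatial Visualization with Google Maps and OpenStreetMap",
            "GGally: Extension to ggplot2",
            "xkcd: Plotting ggplot2 graphics in an XKCD style"),
  stringsAsFactors = FALSE
)

# Rule 1: the package name starts with "gg" or "GG"
by_name <- grepl("^(gg|GG)", pkgs$name)
# Rule 2: the title mentions "ggplot2" (the real filter also checks desc)
by_text <- grepl("ggplot2", pkgs$title)

# A package is kept if either rule matches
kept <- pkgs$name[by_name | by_text]
print(kept)
```

Hmisc and caret drop out, ggmap and GGally pass on their names, and xkcd survives only because its title mentions ggplot2.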

```
library(stringr)
# Keep a package if its name starts with gg/GG,
# or if its title or description mentions ggplot2
result <- data %>%
  filter(str_detect(name, "^(gg|GG)") |
           str_detect(title, "ggplot2") |
           str_detect(desc, "ggplot2"))
knitr::kable(result, format = "html")
```

name | title | desc | count |
---|---|---|---|
ggmap | ggmap: Spatial Visualization with Google Maps and OpenStreetMap | Easily visualize of spatial data and models on top of Google Maps, OpenStreetMaps, Stamen Maps, or CloudMade Maps with ggplot2. | 6329 |
GGally | GGally: Extension to ggplot2 | GGally is designed to be a helper to ggplot2. It contains templates for different plots to be combined into a plot matrix, a parallel coordinate plot function, as well as a function for making a network plot. | 3298 |
ggdendro | ggdendro: Tools for extracting dendrogram and tree diagram plot data for use with ggplot2 | This is a set of tools for dendrograms and tree plots using ggplot2. The ggplot2 philosophy is to clearly separate data from the presentation. Unfortunately the plot method for dendrograms plots directly to a plot device without exposing the data. The ggdendro package resolves this by making available functions that extract the dendrogram plot data. | 2127 |
ggthemes | ggthemes: Extra Themes, Scales and Geoms for 'ggplot2' | Some extra themes, geoms, and scales for 'ggplot2'. | 1553 |
pitchRx | pitchRx: Tools for Harnessing MLBAM Gameday Data and Visualizing PITCHf/x | With pitchRx, one can easily obtain Major League Baseball Advanced Media's Gameday data (as well as store it in a remote database). The Gameday website hosts a wealth of data in XML format, but perhaps most interesting is PITCHf/x. Among other things, PITCHf/x data can be used to recreate a baseball's flight path from a pitcher's hand to home plate. With pitchRx, one can easily create animations and interactive 3D scatterplots of the baseball's flight path. PITCHf/x data is also commonly used to generate a static plot of baseball locations at the moment they cross home plate. These plots, sometimes called strike-zone plots, can also refer to a plot of event probabilities over the same region. pitchRx provides an easy and robust way to generate strike-zone plots using the ggplot2 package. | 461 |
RcmdrPlugin.KMggplot2 | RcmdrPlugin.KMggplot2: An Rcmdr Plug-In for Kaplan-Meier Plots and Other Plots by Using the ggplot2 Package | This package is an R Commander plug-in for Kaplan-Meier plots and other plots by using the ggplot2 package. | 358 |
eeptools | eeptools: Convenience functions for education data | Collection of convenience functions to make working with administrative records easier and more consistent. Includes functions to clean strings, identify cutpoints, and quickly combine shapefiles and dataframes for plotting. Includes four alternative themes for ggplot2 as well as a wrapper for exporting graphics for inclusion in MS Office products. Also includes three example datasets of administrative education records for learning how to process records with errors. | 347 |
ggparallel | ggparallel: Variations of Parallel Coordinate Plots for Categorical Data | R package for creating hammock plots, parallel sets, and common angle plots using the ggplot2 framework. | 308 |
ggsubplot | ggsubplot: Explore complex data by embedding subplots within plots | ggsubplot makes it easy to embed customized subplots within larger graphics. Subplots may be used as a geom to explore interaction effects, spatial data, and hierarchical data. Subplots can also be used to explore big data without overplotting. | 308 |
ggmcmc | ggmcmc: Graphical tools for analyzing Markov Chain Monte Carlo simulations from Bayesian inference | ggmcmc is a tool for assessing and diagnosing convergence of Markov Chain Monte Carlo simulations, as well as for graphically display results from full MCMC analysis. The package also facilitates the graphical interpretation of models by providing flexible functions to plot the results against observed variables. | 300 |
popgraph | popgraph: This is an R package that constructs and manipulates population graphs | This is a generic package that produces "Population Graphs" a graph-theoretic topology based upon conditional genetic covariance. This functionality used to be within the gstudio package, but has been taken out to focus on spatial integration of graph topologies with existing packages like sp, raster, and ggplot2. | 293 |
PairedData | PairedData: Paired Data Analysis | This package provides many datasets and a set of graphics (based on ggplot2), statistics, effect sizes and hypothesis tests for analysing paired data with S4 class. | 288 |
ggRandomForests | ggRandomForests: Visually Exploring Random Forests | Graphic elements for exploring Random Forests using the randomForestSRC package for survival, regression and classification forests and ggplot2 package plotting. | 287 |
xkcd | xkcd: Plotting ggplot2 graphics in an XKCD style | This package allows the representation of ggplot2 graphs using the XKCD style. | 262 |
COPASutils | COPASutils: Tools for processing COPAS large-particle flow cytometer data | A logical workflow for the reading, processing, and visualization of data obtained from the Union Biometrica Complex Object Parametric Analyzer and Sorter (COPAS) platform large-particle flow cytometers and a powerful suite of functions for the rapid processing and analysis of large high-throughput screening data sets. It combines the speed of dplyr with the elegance of ggplot2 to make analysis of COPAS data fast and painless. | 249 |
ggROC | ggROC: package for roc curve plot with ggplot2 | package for roc curve plot with ggplot2 | 229 |
granovaGG | granovaGG: Graphical Analysis of Variance Using ggplot2 | This collection of functions in granovaGG provides what we call elemental graphics for display of anova results. The term elemental derives from the fact that each function is aimed at construction of graphical displays that afford direct visualizations of data with respect to the fundamental questions that drive the particular anova methods. This package represents a modification of the original granova package; the key change is to use ggplot2, Hadley Wickham's package based on Grammar of Graphics concepts (due to Wilkinson). The main function is granovagg.1w (a graphic for one way anova); two other functions (granovagg.ds and granovagg.contr) are to construct graphics for dependent sample analyses and contrast-based analyses respectively. (The function granova.2w, which entails dynamic displays of data, is not currently part of granovaGG.) The granovaGG functions are to display data for any number of groups, regardless of their sizes (however, very large data sets or numbers of groups can be problematic). For granovagg.1w a specialized approach is used to construct data-based contrast vectors for which anova data are displayed. The result is that the graphics use a straight line to facilitate clear interpretations while being faithful to the standard effect test in anova. The graphic results are complementary to standard summary tables; indeed, numerical summary statistics are provided as side effects of the graphic constructions. granovagg.ds and granovagg.contr provide graphic displays and numerical outputs for a dependent sample and contrast-based analyses. The graphics based on these functions can be especially helpful for learning how the respective methods work to answer the basic question(s) that drive the analyses. This means they can be particularly helpful for students and non-statistician analysts. But these methods can be of assistance for work-a-day applications of many kinds, as they can help to identify outliers, clusters or patterns, as well as highlight the role of non-linear transformations of data. In the case of granovagg.1w and granovagg.ds several arguments are provided to facilitate flexibility in the construction of graphics that accommodate diverse features of data, according to their corresponding display requirements. See the help files for individual functions. | 224 |
PKreport | PKreport: A reporting pipeline for checking population pharmacokinetic model assumption | PKreport aims to 1) provide automatic pipeline for users to visualize data and models. It creates a flexible R framework with automatically generated R scripts to save time and cost for later usage; 2) implement an archive-oriented management tool for users to store, retrieve and modify figures. 3) offer powerful and convenient service to generate high-quality graphs based on two R packages: lattice and ggplot2. | 224 |
ggExtra | ggExtra: Collection of Functions and Layers to Enhance ggplot2 | Collection of functions and layers to enhance ggplot2. | 222 |
bdscale | bdscale: Remove Weekends and Holidays From ggplot2 Axes | Provides a continuous date scale, omitting weekends and holidays. | 213 |
ggswissmaps | ggswissmaps: Offers Various Swiss Maps as ggplot2 Objects | Offers various swiss maps as ggplot2 objects and gives the possibility to add layers of data on the maps. Data are publicly available from the swiss federal statistical office. | 202 |
MCMC.OTU | MCMC.OTU: Bayesian analysis of multivariate counts data | This package implements poisson-lognormal generalized linear mixed model analysis of multivariate counts data using MCMC, aiming to infer the changes in relative proportions of individual variables. The package was originally designed for sequence-based analysis of microbial communities ("metabarcoding", variables = operational taxonomic units, OTUs), but can be used for other types of multivariate counts, such as in ecological applications (variables = species). The results are summarized and plotted using ggplot2 functions. Includes functions to remove sample and variable outliers and reformat counts into normalized log-transformed values for correlation and principal component/coordinate analysis. Walkthrough and examples: http://www.bio.utexas.edu/research/matz_lab/matzlab/Methods_files/walkthroughExample_mcmcOTU_R.txt | 201 |
gapmap | gapmap: Functions for Drawing Gapped Cluster Heatmap with ggplot2 | The gap encodes the distance between clusters and improves interpretation of cluster heatmaps. The gaps can be of the same distance based on a height threshold to cut the dendrogram. Another option is to vary the size of gaps based on the distance between clusters. | 199 |
ggenealogy | ggenealogy: Visualization Tools for Genealogical Data | Methods for searching through genealogical data and displaying the results. Plotting algorithms assist with data exploration and publication-quality image generation. Uses the Grammar of Graphics. | 199 |
Rz | Rz: GUI Tool for Data Management like SPSS or Stata | R is very powerful but it lacks some of the functionalities found in Stata or SPSS to manage survey data. The 'memisc' package provides these (variable labels, value labels, definable missing values and so on), but to efficiently work these functions need a graphical interface to allow the user to get an overview of the data. This package provides such a graphical interface, similar in fashion to SPSS's Variable View and data managing system. It uses the 'memisc' package as its backend. Additionally, 'Rz' has a powerful plot assistant interface based on 'ggplot2'. | 192 |
PKgraph | PKgraph: Model diagnostics for population pharmacokinetic models | PKgraph provides a graphical user interface for population pharmacokinetic model diagnosis. It also provides an integrated and comprehensive platform for the analysis of pharmacokinetic data including exploratory data analysis, goodness of model fit, model validation and model comparison. Results from a variety of modeling fitting software, including NONMEM, Monolix, SAS and R, can be used. PKgraph is programmed in R, and uses the R packages lattice, ggplot2 for static graphics, and rggobi for interactive graphics. | 187 |
mapDK | mapDK: Maps of Denmark | Create static choropleth maps of Denmark using the ggplot2 package. | 179 |
orgR | orgR: Analyse Text Files Created by Emacs' Org mode | Provides functionality to process text files created by Emacs' Org mode, and decompose the content to the smallest components (headlines, body, tag, clock entries etc). Emacs is an extensible, customizable text editor and Org mode is for keeping notes, maintaining TODO lists, planning projects. Allows users to analyze org files as data frames in R, e.g., to convieniently group tasks by tag into project and calculate total working hours. Also provides some help functions like search.parent, gg.pie (visualise working hours in ggplot2) and tree.headlines (visualise headline stricture in tree format) to help user managing their complex org files. | 171 |
vdmR | vdmR: Visual Data Mining Tools for R | This provides web based visual data mining tools by adding interactive functions to ggplot2 graphics. Brushing and linking between the multiple plots is one of the main feature of this package. Currently scatter plot, histogram, parallel coordinate plot and choropleth map are supported. | 145 |
ggtern | ggtern: An Extension to 'ggplot2', for the Creation of Ternary Diagrams | Extends the functionality of ggplot2, providing the capability to plot ternary diagrams for (subset of) the ggplot2 geometries. Additionally, ggtern has implemented several NEW geometries which are unavailable to the standard ggplot2 release. For further examples and documentation, please proceed to the ggtern website. | 18 |

This leaves 30 packages.

### Notes

The results of this project will apparently be maintained in a GitHub repository owned by @kazutan, who started it.

Awesome!