Looks like a request for a code review, so I'll give it a try #rstatsj

Posted at 2015-04-22

Regarding this post by @uri:

It looks like a request for a code review (??), so I'll give it a try.

Before
screen.name <- html("https://twitter.com/teramonagi/lists/list/members") %>% 
  html_nodes(css = ".username") %>% 
  html_text() %>% 
  unique() %>% 
  grep(pattern = "^[[:punct:]][[:graph:]]", value = TRUE) %>% 
  substr(2, 100)
screen.name
After
screen.name <- html("https://twitter.com/teramonagi/lists/list/members") %>% 
  html_nodes(xpath = '//*[@id="stream-items-id"]/li/div/div[2]/div/a/span[1]') %>% 
  html_text %>%
  substring(2)
screen.name

If you scrape the right nodes in the first place, you don't need workarounds like grep.
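As a small illustration of why the cleanup shrinks: once the selector returns only the screen-name nodes, every element should have the form "@name", so dropping the first character is all that remains. The vector below is made-up data standing in for the result of html_text; it is not actual scraped output.

```r
# Hypothetical values standing in for what html_text returns
# when the selector matches only the screen-name spans.
scraped <- c("@alice", "@bob", "@carol")

# substring() has a very large default for `last`, so giving only the
# start position keeps everything from the 2nd character onward.
screen.name <- substring(scraped, 2)
screen.name
```

This is also why substr(2, 100) in the original can be replaced: substring needs no guess at an upper bound.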

Before
for(i in 1:length(screen.name)){
  html(paste("https://twitter.com/", screen.name[i], sep = "")) %>% 
  html_nodes(css = ".ProfileAvatar-image") %>% 
  html_attr("src") %>% 
  download.file(destfile = paste("img_hxm/", screen.name[i], "_", basename(.), sep = ""), method = "curl")
}
# To the boss: how would I do this with pforeach...?
## Failed attempt
# pforeach(i in 1:length(screen.name))({
#   html(paste("https://twitter.com/", screen.name[i], sep = "")) %>% 
#   html_nodes(css = ".ProfileAvatar-image") %>% 
#   html_attr("src") %>% 
#   download.file(destfile = paste("img_hxm/", basename(.), sep = ""), method = "curl")
# })
After
npforeach(id = screen.name)({
  cat(id, "\n")
  url <- sprintf("https://twitter.com/%s", id)
  img_url <- html(url) %>% 
    html_nodes(css = ".ProfileAvatar-image") %>% 
    html_attr("src")
  destfile <- file.path("img_hxm", sprintf("%s_%s", id, basename(img_url)))
  img_url %>% download.file(destfile = destfile, mode="wb")
  invisible()
})

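With pforeach, you write = instead of in.
For a job this small, I think parallel processing would cost more than it saves, so I'm using npforeach.
It's a matter of taste, but sprintf reads cleaner than paste.
If you really want to use paste, paste0 is the convenient choice.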
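To make the comparison concrete, here is a minimal base R sketch of the three ways to build the same string. The screen name and the image URL are hypothetical placeholders, not values from the actual list.

```r
id <- "alice"  # hypothetical screen name

# Three equivalent ways to build the same profile URL
url_paste   <- paste("https://twitter.com/", id, sep = "")
url_paste0  <- paste0("https://twitter.com/", id)
url_sprintf <- sprintf("https://twitter.com/%s", id)

# sprintf keeps the template readable when several values are interpolated;
# file.path joins the directory and file name with "/"
img_url  <- "https://example.com/images/avatar.png"  # hypothetical image URL
destfile <- file.path("img_hxm", sprintf("%s_%s", id, basename(img_url)))
destfile
```

All three URL variables hold the same string; the difference is only in how easy the template is to read.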

Before
prof.file <- list.files(path = "img_hxm")
pal <- extract_colours(url_img = paste("img_hxm/", prof.file[15], sep = ""))
pal
After
prof.file <- list.files("img_hxm", full.names = TRUE, pattern = sprintf("^%s", screen.name[15]))
pal <- extract_colours(url_img = prof.file)
pal

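list.files has an argument called full.names, which returns paths with the directory already prepended.
Also, the order of screen.name and the order of prof.file do not necessarily correspond, so use the pattern argument to select the file whose name starts with the target ID.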
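The ordering problem is easy to reproduce in a throwaway directory. The sketch below uses made-up IDs and file names; only list.files, its full.names and pattern arguments, and sprintf come from the code above.

```r
# Work in a temporary directory; the file names are made up for illustration
dir <- file.path(tempdir(), "img_demo")
dir.create(dir, showWarnings = FALSE)
file.create(file.path(dir, c("bob_b.png", "alice_a.png", "carol_c.png")))

# Hypothetical IDs, deliberately in a different order than the sorted files
ids <- c("carol", "alice", "bob")

# Without full.names only the base names come back, sorted alphabetically,
# so position 1 here is not the file for ids[1]
list.files(dir)

# full.names = TRUE prepends the directory; pattern picks the file for one ID
target <- list.files(dir, full.names = TRUE, pattern = sprintf("^%s", ids[1]))
basename(target)
```

Note that a prefix pattern like this would also match any other ID that begins with the same characters, so it assumes no ID in the list is a prefix of another.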

Before
data.frame(name = rep(screen.name[15], each = 5), value = rep(20, 5), pal) %>% 
ggplot(., aes(x ="", y = value, fill = pal)) + 
  geom_bar(stat = "identity") + coord_polar("y") +
  scale_fill_manual(values = pal) + 
  guides(fill = FALSE)
After
data <- data.frame(name = screen.name[15], value = 100/length(pal), pal)

ggplot(data, aes(x ="", y = value, fill = pal)) + 
  geom_bar(stat = "identity") + coord_polar("y") +
  scale_fill_manual(values = pal) + 
  guides(fill = FALSE)

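There's no need to bother with rep: data.frame recycles the shorter arguments to match the longest one.
It's a matter of taste, but I think it's better to keep the pipe and the ggplot call separate.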
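The recycling behaviour can be checked without rPlotter. In this sketch the palette is a hypothetical five-colour vector standing in for the output of extract_colours, and "alice" is a placeholder for screen.name[15].

```r
# Hypothetical palette standing in for the extract_colours result
pal <- c("#1B1B1B", "#4F6D7A", "#C0D6DF", "#DD6E42", "#E8DAB2")

# name and value have length 1 and are recycled to the length of pal
data <- data.frame(name = "alice", value = 100 / length(pal), pal)

nrow(data)       # one row per colour
sum(data$value)  # the slices add up to the whole pie
```

Computing value as 100 / length(pal) also means the code keeps working if the palette size changes, which the hard-coded rep(20, 5) does not.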

Installing rPlotter looked like a hassle, so I haven't actually run this code.

By the way, screen.name[15] appears to be dichika. Was that meant to be screen.name[14]?

Enjoy!
