
Handling data that looks tidy but has multiple values packed into one column, separated by a delimiter


These are just quick notes. I referred to the following:

https://notchained.hatenablog.com/entry/2016/07/01/001836
http://d.hatena.ne.jp/dichika/20141027/p1

Problem

When using blast2go or orthofinder, the output is in TSV format, but a single column can hold multiple values separated by a delimiter.

demo.txt
gene    mus                    homo           fly
atpA    mus1,mus3,mus8         homo2,homo3    fly1,fly2
cox1    mus2                   homo1,homo4    fly3,fly4
POL1    mus4,mus5,mus6,mus7    homo5          fly5

Sometimes I want to convert this into tidy data, or convert tidy data back into this format.

Solution

non-tidy → tidy

Because the number of values per cell varies, the processing is a little tedious.
There is probably a better way.

nontidy_tidy.R
library(tidyverse) 

OF_raw <-
  'demo.txt'  %>%
  read_tsv() 

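# Temporary vector of column names to hold the pieces split on the delimiter ("," here).
# Adjust `to=` to the maximum number of values packed into one cell; large values make this slow.
x <- seq(from=1, to=10000, by=1)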

mus_OF_tidy <-
OF_raw %>%
dplyr::select(gene, mus) %>%
tidyr::separate(mus, as.character(x), sep=',', remove=TRUE, convert=FALSE, extra='warn', fill='warn') %>%
gather(key, value, -gene) %>%
select(gene, value) %>%
rename(mus=value)%>%
drop_na()

write_tsv(mus_OF_tidy, "mus_OF_tidy.txt")
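
The note above suspects there is a better way. One candidate, offered as a sketch and not as the author's method, is `tidyr::separate_rows()`, which splits a delimited column directly into rows and so needs no pre-sized set of temporary columns. The demo table is rebuilt inline here with `tribble()` (an assumption made only so the example runs without demo.txt).

```r
library(tidyverse)

# The demo data from demo.txt, rebuilt inline so the sketch is self-contained
OF_raw <- tribble(
  ~gene,  ~mus,                  ~homo,         ~fly,
  "atpA", "mus1,mus3,mus8",      "homo2,homo3", "fly1,fly2",
  "cox1", "mus2",                "homo1,homo4", "fly3,fly4",
  "POL1", "mus4,mus5,mus6,mus7", "homo5",       "fly5"
)

# Split the comma-separated `mus` column into one row per value
mus_OF_tidy <-
  OF_raw %>%
  select(gene, mus) %>%
  separate_rows(mus, sep = ",") %>%
  drop_na()
```

In tidyr 1.3.0 and later, `separate_longer_delim(mus, delim = ",")` should be the newer equivalent of the `separate_rows()` call.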

Wrap it in a function

sep_function.R
# data: a data frame with ID and GO columns; Sep: the delimiter to split GO on.
my_sep <- function(data, Sep)
{
x <- seq(from=1, to=1000, by=1) 

data_tidy <-
  data %>%
  tidyr::separate(GO, as.character(x), sep=Sep, remove=TRUE, convert=FALSE, extra='warn', fill='warn') %>%
  gather(key, value, -ID) %>%
  select(ID, value) %>%
  rename(GO=value)%>%
  drop_na()

return(data_tidy)
}
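
The function above is tied to columns named `ID` and `GO`. As a rough sketch of how the column could be passed in as an argument instead, the hypothetical `my_sep_col()` below uses `tidyr::separate_rows()` with the `{{ }}` embrace operator. The function name, its arguments, and the toy input are illustrative assumptions, not part of the original notes.

```r
library(tidyverse)

# Hypothetical variant of my_sep(): `col` is the column to split, `sep` the delimiter
my_sep_col <- function(data, col, sep = ",") {
  data %>%
    separate_rows({{ col }}, sep = sep) %>%  # one row per delimited value
    drop_na({{ col }})                       # drop rows where `col` is missing
}

# Toy input reusing the demo values (not real blast2go or orthofinder output)
demo <- tibble(
  gene = c("atpA", "cox1"),
  mus  = c("mus1,mus3,mus8", "mus2")
)

demo_tidy <- my_sep_col(demo, mus, sep = ",")
```

Note that `sep` is interpreted by `separate_rows()` as a regular expression, so delimiters such as `|` would need escaping.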

tidy → non-tidy

This direction is a bit neater.

nontidy_tidy.R
library(tidyverse) 

OF_raw <-
  'mus_OF_tidy.txt'  %>%
  read_tsv() 

OF_raw %>%
  group_by(gene)%>%
  dplyr::summarise_all(str_flatten, collapse=",")
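
Since dplyr 1.0.0 the scoped verbs such as `summarise_all()` are superseded by `across()`. A minimal self-contained sketch of the same collapse step written that way follows; the input tibble is typed in from the demo values (an assumption, so the example runs without mus_OF_tidy.txt).

```r
library(tidyverse)

# Tidy input: one row per gene/value pair (values taken from the demo table)
mus_OF_tidy <- tibble(
  gene = c("atpA", "atpA", "atpA", "cox1", "POL1", "POL1", "POL1", "POL1"),
  mus  = c("mus1", "mus3", "mus8", "mus2", "mus4", "mus5", "mus6", "mus7")
)

# Collapse each gene's values back into one comma-separated string
mus_OF_nontidy <-
  mus_OF_tidy %>%
  group_by(gene) %>%
  summarise(across(everything(), ~ str_flatten(.x, collapse = ",")))
```

The output row order follows the sorted group keys, so it may differ from the order in demo.txt.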
