Selecting the rows with the minimum or maximum value from data with duplicate keys, using tidyverse

This is just a quick note. It is basic, but I forget it often.

Problem

With BLAST, a single query usually produces multiple hits. Sometimes you want to pull out, for each query, the hit with the smallest E-value.

Solution

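Group the rows by query with group_by(), then pick rows with filter(). Writing the filter condition with min() is all it takes, because inside a grouped data frame min() is evaluated per group.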
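To see the idea in isolation, here is a minimal sketch on a small made-up table. The `hits` tibble and its values are hypothetical and exist only for illustration; only `group_by()` and `filter()` come from the approach described above.

```r
library(dplyr)

# Hypothetical toy data: two queries with several hits each
hits <- tibble(
  qseqid = c("q1", "q1", "q1", "q2", "q2"),
  sseqid = c("s1", "s2", "s3", "s4", "s5"),
  evalue = c(1e-50, 1e-10, 1e-3, 1e-20, 1e-30)
)

top_hits <- hits %>%
  group_by(qseqid) %>%               # min() below is computed within each query
  filter(evalue == min(evalue)) %>%  # keep rows whose evalue equals the group minimum
  ungroup()                          # drop the grouping once it is no longer needed

print(top_hits)
```

In this toy table the surviving rows are s1 for q1 and s5 for q2, since those have the smallest E-value in their group.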

Assuming a BLAST result file written with -outfmt 6, the following does the job.

tophit.R

library(tidyverse)


# Read the file (-outfmt 6 output has no header row, so name the columns here)
blast_out <-
  read_tsv("blast.out.fmt6.txt", col_names = c(
    "qseqid",
    "sseqid",
    "pident",
    "length",
    "mismatch",
    "gapopen",
    "qstart",
    "qend",
    "sstart",
    "send",
    "evalue",
    "bitscore"
  ))

blast_out_tophit <-
  blast_out %>%
  group_by(qseqid) %>%                    # group by query
  filter(evalue == min(evalue)) %>%       # keep rows whose evalue equals the minimum for that query
  filter(bitscore == max(bitscore)) %>%   # of those, keep rows whose bitscore equals the maximum
  filter(pident > 80)                     # finally, keep only hits with more than 80% identity

That's about it.
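One thing to keep in mind: filter() with == keeps every row that ties for the best value, so a query can still end up with more than one row (E-values of 0 tie easily). The sketch below is one possible way to force exactly one row per query, assuming it does not matter which of the tied hits is kept. The `hits` tibble is hypothetical, and the arrange() plus slice_head() step is a swapped-in alternative, not part of the script above.

```r
library(dplyr)

# Hypothetical toy data: q1 has two hits tied on both evalue and bitscore
hits <- tibble(
  qseqid   = c("q1", "q1", "q1", "q2"),
  sseqid   = c("s1", "s2", "s3", "s4"),
  evalue   = c(0, 0, 1e-5, 1e-30),
  bitscore = c(500, 500, 60, 120)
)

# Same filter() approach as in the script: tied rows are all kept
tied <- hits %>%
  group_by(qseqid) %>%
  filter(evalue == min(evalue)) %>%
  filter(bitscore == max(bitscore)) %>%
  ungroup()

# Alternative: sort within each query (lowest evalue, then highest bitscore)
# and keep only the first row of each group
one_per_query <- hits %>%
  group_by(qseqid) %>%
  arrange(evalue, desc(bitscore), .by_group = TRUE) %>%
  slice_head(n = 1) %>%
  ungroup()

print(tied)
print(one_per_query)
```

Here `tied` has three rows (two for q1, one for q2), while `one_per_query` has exactly one row per query. If you would rather keep all tied hits for later inspection, the filter() version is the one to use.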
