RPKM and TPM

Posted at 2023-01-06

RPKM (reads per kilobase per million mapped reads) and TPM (transcripts per million) are both methods for normalizing count data measured with a next-generation sequencer (NGS).

Reference pages:

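Suppose the count table has transcripts along the rows and samples along the columns. RPKM normalizes each column first and then each row; TPM normalizes each row first and then each column.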

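Column-wise normalization corrects for differences in the total number of reads between samples. Row-wise normalization corrects for differences in gene length between transcripts.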
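Written out as formulas, the two orders of normalization look as follows. This is a sketch in notation introduced here for illustration, not taken from the article: $c_{ij}$ is the read count of transcript $i$ in sample $j$, and $L_i$ is the length of transcript $i$ in bases.

```latex
\mathrm{RPKM}_{ij}
  = \frac{c_{ij}}{\left(\sum_{k} c_{kj} / 10^{6}\right)\left(L_i / 10^{3}\right)}
  = 10^{9} \cdot \frac{c_{ij}}{L_i \sum_{k} c_{kj}}

\mathrm{TPM}_{ij}
  = 10^{6} \cdot \frac{c_{ij} / L_i}{\sum_{k} c_{kj} / L_k}
  = 10^{6} \cdot \frac{\mathrm{RPKM}_{ij}}{\sum_{k} \mathrm{RPKM}_{kj}}
```

Because the TPM denominator is summed after dividing by length, the TPM values of every sample add up to $10^{6}$. The RPKM values of a sample have no such fixed total.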

The Python code is shown below. Prepare the count data (a 2-D array, held as a pandas DataFrame) and the gene length data (a 1-D array).

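# INPUT:
#   count_df: count data as a pandas DataFrame (rows: transcripts, columns: samples)
#   len_sr: length data in bases, one value per row of count_df
#           (1-dim sequence: pd.Series, np.ndarray, etc.; a pd.Series is aligned on its index)
# OUTPUT:
#   rpkm_df: RPKM data
#   tpm_df: TPM data

# count -> RPKM
rpkm_df = count_df / count_df.sum()        # column-wise: divide by total reads per sample
rpkm_df = rpkm_df.divide(len_sr, axis=0)   # row-wise: divide by transcript length
rpkm_df *= 10**9                           # 10**3 (per kilobase) * 10**6 (per million reads)

# count -> TPM
tpm_df = count_df.divide(len_sr, axis=0)   # row-wise: divide by transcript length
tpm_df = tpm_df / tpm_df.sum()             # column-wise: divide by the per-sample total
tpm_df *= 10**6                            # per million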
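
As a quick check, the two conversions above can be run on a small table. The following is a minimal sketch, not part of the original code: the toy counts, the transcript and sample names, and the helper functions `counts_to_rpkm` and `counts_to_tpm` are all hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 3 transcripts x 2 samples (illustrative values only)
count_df = pd.DataFrame(
    {"sample_A": [10, 20, 70], "sample_B": [30, 30, 140]},
    index=["tx1", "tx2", "tx3"],
)
# Transcript lengths in bases, indexed by the same transcript IDs
len_sr = pd.Series([1000, 2000, 7000], index=["tx1", "tx2", "tx3"])


def counts_to_rpkm(count_df, len_sr):
    """Column-wise (per sample) first, then row-wise (per transcript)."""
    per_sample = count_df / count_df.sum()
    return per_sample.divide(len_sr, axis=0) * 10**9


def counts_to_tpm(count_df, len_sr):
    """Row-wise (per transcript) first, then column-wise (per sample)."""
    per_length = count_df.divide(len_sr, axis=0)
    return per_length / per_length.sum() * 10**6


rpkm_df = counts_to_rpkm(count_df, len_sr)
tpm_df = counts_to_tpm(count_df, len_sr)

# TPM can also be obtained by rescaling each RPKM column to sum to one million
tpm_from_rpkm = rpkm_df / rpkm_df.sum() * 10**6

print(tpm_df.sum())   # per-sample TPM totals
print(rpkm_df.sum())  # per-sample RPKM totals
```

In this toy table the TPM totals are the same for both samples, while the RPKM totals differ between samples, which is the practical consequence of the order of the two normalization steps.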