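Hello.
This is a bioinformatics post. I naively wondered whether applying NMF (non-negative matrix factorization) to the somatic mutation data published by TCGA would recover mutational signatures. I tried it, it worked reasonably well, so I am writing it up.
If you notice anything, I would be glad to hear about it in a comment or an edit request.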
(Can you tell which logos these are? The answer is at the very end.)
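[Background]
What is a mutational signature?
Gene mutations in cancer cells follow patterns
Cancer is often described as a disease of the genes. Cancer cells carry many gene mutations and are thought to acquire their characteristic properties through them: proliferation, evasion of the immune system, their own metabolism, and so on. Cells accumulate a variety of mutations on the way to becoming cancerous. Carcinogens such as tobacco and ultraviolet light are well-known causes of mutation, and the breakdown of the mechanisms that repair DNA damage is also thought to contribute.
For a normal cell to become a cancer cell, cells suited to their environment have to be selected over a long period of time.
The idea behind signatures is to examine the frequencies of the accumulated mutations and look for the characteristic pattern of each cause: a tobacco pattern for tobacco, an ultraviolet pattern for ultraviolet light.
Decomposing the mutation patterns
Somatic mutations in cancer cells arise from a combination of several causes. A technique called NMF (non-negative matrix factorization) is used to separate them.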
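To make "decomposing" a little more concrete, here is a minimal pure-Ruby sketch of NMF using multiplicative updates. It is only an illustration and not Rumale's implementation. The helper names (`matmul`, `frobenius_error`, `toy_nmf`) and the toy data are assumptions made up for this example. A count matrix V (samples × mutation types) is approximated by W × H, where the rows of H play the role of signatures and W holds how strongly each sample carries them.

```ruby
# Toy NMF with multiplicative updates; illustration only, not Rumale's code.
# V (samples x features) is approximated by W (samples x k) * H (k x features).

def matmul(a, b)
  bt = b.transpose
  a.map { |row| bt.map { |col| row.zip(col).sum { |x, y| x * y } } }
end

def frobenius_error(v, w, h)
  wh = matmul(w, h)
  total = v.each_with_index.sum do |row, i|
    row.each_with_index.sum { |x, j| (x - wh[i][j])**2 }
  end
  Math.sqrt(total)
end

def toy_nmf(v, k, iterations: 200, seed: 1, eps: 1e-9)
  rng = Random.new(seed)
  w = Array.new(v.size) { Array.new(k) { rng.rand + 0.1 } }
  h = Array.new(k) { Array.new(v.first.size) { rng.rand + 0.1 } }
  iterations.times do
    # H <- H * (W^T V) / (W^T W H), element-wise
    wt = w.transpose
    num = matmul(wt, v)
    den = matmul(matmul(wt, w), h)
    h = h.each_with_index.map do |row, i|
      row.each_with_index.map { |x, j| x * num[i][j] / (den[i][j] + eps) }
    end
    # W <- W * (V H^T) / (W H H^T), element-wise
    ht = h.transpose
    num = matmul(v, ht)
    den = matmul(w, matmul(h, ht))
    w = w.each_with_index.map do |row, i|
      row.each_with_index.map { |x, j| x * num[i][j] / (den[i][j] + eps) }
    end
  end
  [w, h]
end

# Made-up data: four "samples" that mix two hidden patterns.
sig_a = [4.0, 0.0, 1.0, 0.0]
sig_b = [0.0, 3.0, 0.0, 2.0]
v = [[1, 0], [0, 1], [2, 1], [1, 3]].map do |a, b|
  sig_a.zip(sig_b).map { |x, y| a * x + b * y }
end

w0, h0 = toy_nmf(v, 2, iterations: 0) # random starting point
w, h = toy_nmf(v, 2)
puts format("error before: %.4f, after: %.4f",
            frobenius_error(v, w0, h0), frobenius_error(v, w, h))
```

Because the updates only multiply non-negative numbers, W and H stay non-negative, which is what lets the rows of H be read as additive patterns.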
Single Base Substitution (SBS) Signatures
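Among mutational signatures, the Single Base Substitution (SBS) Signatures are the best known. There seem to be various other kinds of signatures, but I am not familiar with them.
This signature counts point mutations together with the bases immediately to their left and right.
- the pattern of the base substitution
- the base on the 5' side
- the base on the 3' side
Together these give 6 × 4 × 4 = 96 combinations. The frequency of each of the 96 combinations is counted, arranged into a matrix, and that is what defines a signature.
(From Wikipedia https://en.wikipedia.org/wiki/Mutational_signatures)
This alone is probably hard to follow, so a concrete example comes next.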
Looking at the COSMIC mutational signatures
https://cancer.sanger.ac.uk/cosmic/signatures/SBS/SBS4.tt
The figure above shows Signature 4. Signature 4 is the mutational signature of benzo[a]pyrene, one of the harmful substances in tobacco.
The bar in each column shows the frequency of a mutation. The light blue bars on the left show the frequency of C→A mutations. Black is C→G, red is C→T, gray is T→A, yellow-green is T→C, and pink is T→G. Did you notice that there are only six point mutation patterns? For example, there is no G→T pattern. The reason is that, seen from the complementary strand, a G→T mutation is the same event as a C→A mutation. G→T is therefore counted as C→A.
These two are the same thing:
A G>T C →
| | | |
T C>A G ←
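The strand equivalence in the diagram can be sketched in a few lines of Ruby. The names `revcomp` and `to_pyrimidine_context` are hypothetical helpers for this illustration and do not appear in the notebooks below. The idea is to rewrite every mutation so that the mutated (middle) base is C or T, taking the reverse complement when it is not.

```ruby
COMPLEMENT = { "A" => "T", "T" => "A", "C" => "G", "G" => "C" }.freeze

# Reverse complement of a short DNA string.
def revcomp(seq)
  seq.reverse.chars.map { |base| COMPLEMENT.fetch(base) }.join
end

# Returns [trinucleotide, alt] with the middle base expressed as C or T.
def to_pyrimidine_context(trinucleotide, alt)
  return [trinucleotide, alt] if %w[C T].include?(trinucleotide[1])

  [revcomp(trinucleotide), COMPLEMENT.fetch(alt)]
end

p to_pyrimidine_context("AGC", "T") # the G>T example from the diagram
p to_pyrimidine_context("GCT", "A") # already written as C>A, returned unchanged
```

Both calls describe the same event, so they end up in the same column of the 96-column count matrix.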
Let's zoom in on the C→A part.
There are four possibilities (A, C, G, T) on the 5' side and four (A, C, G, T) on the 3' side, for a total of 16 mutation patterns. CCA becoming CAA is frequent, while TCG becoming TAG does not seem to be very frequent.
To get an overall picture, here are the 30 signatures (Version 2) listed in COSMIC.
(For the latest Version 3, please see the COSMIC website.)
[Goal]
This time I build a matrix of base substitution counts from the TCGA somatic mutation data and run NMF (non-negative matrix factorization) with Rumale, a machine learning library. If even some of the COSMIC signatures can be reproduced, I will call it a success.
The mutation data are the exome data made publicly available on cBioPortal. TCGA mutation data can be fetched just by calling the cBioPortal API. (Since this is not whole-genome data, I expect lower accuracy and some bias.)
[About the methods]
Here are links to my past Qiita entries and other pages on the individual methods.
How to get data from cBioPortal, the TCGA database, in Ruby
The data are fetched from cBioPortal with daru-apiclient, a very thin tool I wrote myself.
It fetches the data with httparty and passes them to Daru. See the page below for details.
Machine learning on cancer genome data with Ruby
https://github.com/kojix2/daru-apiclient
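About NMF
I do not understand the math at all, and I do not understand what the algorithm does internally.
I am working by feel, relying on clear articles written by others such as the one below.
A loose understanding of non-negative matrix factorization (NMF)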
How to run NMF in Ruby
For how to run NMF in Ruby, see this article.
Trying NMF (non-negative matrix factorization) on a face dataset with Ruby
How to run t-SNE in Ruby
[Ruby] Reducing and visualizing MNIST with t-SNE
Install numo-linalg when you run Rumale
If you run NMF or t-SNE with Rumale, I recommend installing numo-linalg. Rumale runs fine without it, but the difference in computation time is overwhelming, especially on CPUs with many cores. numo-linalg is surprisingly hard to install. I think the standard approach is to build OpenBLAS from source and pass the gem install -- --with-blas-dir
option. Depending on your environment, however, installing OpenBLAS from a package and simply running gem install
may be enough.
Running numo-linalg on a Mac
How to use Plotly in Ruby
This is a very minor approach, but I use iruby-plotly.
To put it mildly, it is incredibly convenient. Useful things are useful even when they are not popular.
Making violin plots in Ruby (Plotly)
https://github.com/zach-capalbo/iruby-plotly
How to get the base sequence at an arbitrary position in the genome
I use samtools, following the article below by @percipere.
Getting the base sequence at an arbitrary position in the genome
From here on, the work is split into two Jupyter notebooks: "Data preparation" and "Clustering & visualization".
[Data preparation]
Table of contents
Preparation
- Prepare a savepoint function
- Load the libraries
Data retrieval
- Prepare the connection to TCGA
- Get the TCGA study IDs
- Get the mutation data
Preprocessing
- Combine the mutation data into a single Daru::DataFrame
- Save the mutation data
- Extract mutations that are SNPs and have a known chromosome
- Save the SNP data
Samtools
- Check that samtools works
- Before looking up the neighboring bases, confirm that the ncbiBuild version is 37
- Identify the flanking bases with samtools
- Check the data structure of mutations_snp_trinucleotide
- Save the trinucleotide data
Building the mutation frequency catalog
- Generate the mutation patterns of the signature
- Function that returns the reverse complement
- Generate the mutation patterns of the signature (opposite strand)
- Generate the index labels
- Count mutation frequencies and build the catalog
- Convert the catalog to NArray
- Save the catalog data
Running NMF
- Check how to use Rumale's NMF
- Run NMF with Rumale and save the models
0. Prepare a savepoint function
## This time, objects are saved in the Signature directory
require 'fileutils'
FileUtils.mkdir_p("Signature")
def sync(str, obj)
  begin
    if obj.nil?
      # No object given: load the previously saved one
      Marshal.load(File.binread("Signature/#{str}.dat"))
    else
      # Object given: save it and return it unchanged
      File.binwrite("Signature/#{str}.dat", Marshal.dump(obj))
      obj
    end
  rescue
    # If saving fails, fall back to the previously saved copy
    Marshal.load(File.binread("Signature/#{str}.dat"))
  end
end
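The savepoint idea (pass an object to save it, pass nil to load it back) can be tried in isolation with the sketch below. `checkpoint` is a hypothetical variant written for this illustration, not the `sync` function above: it takes the directory as an argument so it can run against a temporary directory, and it leaves out the rescue fallback.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: saves obj when it is given, loads the saved copy when it is nil.
def checkpoint(dir, name, obj = nil)
  path = File.join(dir, "#{name}.dat")
  if obj.nil?
    Marshal.load(File.binread(path))
  else
    FileUtils.mkdir_p(dir)
    File.binwrite(path, Marshal.dump(obj))
    obj
  end
end

dir = Dir.mktmpdir("signature_demo")
saved = checkpoint(dir, "counts", { "ACA>A" => 3 })
restored = checkpoint(dir, "counts") # e.g. after restarting the notebook kernel
puts restored == saved
```

Marshal is convenient for notebook work, but it should only be used to load files you wrote yourself.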
1. Load the libraries
require 'daru/apiclient' # fetches the TCGA data
require 'numo/linalg' # OpenBLAS
require 'rumale' # machine learning
require 'parallel' # parallel computation
require 'iruby-plotly' # visualization
require 'awesome_print' # prettier output
require 'rainbow' # prettier output
# Number of cores used by Parallel
# Set to 15 because this was run on a Ryzen 7 machine
PCORE = 15
15
2. Prepare the connection to TCGA
c = Daru::APIClient.new "https://www.cbioportal.org/api"
<Daru::APIClient:0x00005588fd84ff40 @c=#<#:0x00005588fd84ee38>>
3. Get the TCGA study IDs
study_ids = c.get("/molecular-profiles").filter_rows{|r| r["molecularAlterationType"] == "MUTATION_EXTENDED"}
.filter_rows{|r| r["molecularProfileId"].include? "tcga"}["molecularProfileId"]
.to_a.find_all{|i| i.include? "_tcga_mutations"}
.map{|i| i.gsub("_mutations","")}
["acc_tcga", "blca_tcga", "brca_tcga", "cesc_tcga", "chol_tcga", "coadread_tcga", "dlbc_tcga", "esca_tcga", "gbm_tcga", "hnsc_tcga", "kich_tcga", "kirc_tcga", "kirp_tcga", "laml_tcga", "lgg_tcga", "lihc_tcga", "luad_tcga", "lusc_tcga", "meso_tcga", "ov_tcga", "paad_tcga", "pcpg_tcga", "prad_tcga", "sarc_tcga", "skcm_tcga", "stad_tcga", "tgct_tcga", "thca_tcga", "thym_tcga", "ucec_tcga", "ucs_tcga", "uvm_tcga"]
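What the filter chain above does can be seen on a plain Ruby array. The rows below are made-up stand-ins for the API response (only the two columns used by the filter), and the sketch uses `end_with?` and `delete_suffix` in place of the `include?` and `gsub` calls above, so it is an approximation of the same selection rather than the same code.

```ruby
# Illustrative rows only; the real response has many more columns and rows.
profiles = [
  { "molecularProfileId" => "luad_tcga_mutations",
    "molecularAlterationType" => "MUTATION_EXTENDED" },
  { "molecularProfileId" => "luad_tcga_rna_seq_v2_mrna",
    "molecularAlterationType" => "MRNA_EXPRESSION" },
  { "molecularProfileId" => "luad_tcga_pan_can_atlas_2018_mutations",
    "molecularAlterationType" => "MUTATION_EXTENDED" },
  { "molecularProfileId" => "skcm_tcga_mutations",
    "molecularAlterationType" => "MUTATION_EXTENDED" }
]

study_ids = profiles
  .select { |row| row["molecularAlterationType"] == "MUTATION_EXTENDED" }
  .map { |row| row["molecularProfileId"] }
  .select { |id| id.end_with?("_tcga_mutations") } # keep only "<study>_tcga_mutations"
  .map { |id| id.delete_suffix("_mutations") }     # strip the suffix to get the study ID

p study_ids
```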
4. Get the mutation data
Here the mutation data are fetched for each TCGA study.
mutations = study_ids.map.with_index do |id, index|
puts [id, index]
df = c.get("/molecular-profiles/#{id}_mutations/mutations", query: {sampleListId: "#{id}_sequenced", projection: "DETAILED"})
df2 = df["gene","startPosition","endPosition","referenceAllele","variantAllele","sampleId", "variantType", "ncbiBuild", "studyId"]
end
0
5. Combine the mutation data into a single Daru::DataFrame
Daru has problems with speed and reliability, but I gathered everything into a Daru::DataFrame because it keeps the flow of processing easy to follow.
mutations2 = mutations.inject(&:concat)
# Check Daru's behavior, just in case
if mutations.map(&:size).sum == mutations2.size
  puts "OK"
  mutations = mutations2
  mutations2 = nil # crude, but it frees memory
else
  raise "row count changed after concat"
end
# Check the first rows
mutations.head(5)
# Total number of rows
mutations.size
1118839
6. Save the mutation data
mutations = sync("mutations", mutations)
7. Extract mutations that are SNPs and have a known chromosome
# Select SNPs only
mutations_snp = mutations.where(mutations.variantType.eq "SNP")
# Select only mutations whose chromosome is known
# (keep_row_if is slow, so filter_rows is used instead)
mutations_snp = mutations_snp.filter_rows{|r| r["gene"]["chromosome"] != nil}
# Check the first rows
mutations_snp.head(5)
8. Save the SNP data
mutations_snp = sync("mutations_snp", mutations_snp)
Clear the mutations variable
mutations = nil
Use samtools to identify the bases on both sides of each mutation
9. Check that samtools works
- Set up samtools beforehand by following @percipere's article
- Getting the base sequence at an arbitrary position in the genome
puts `samtools faidx Homo_sapiens.GRCh37.dna.primary_assembly.fa.gz 7:150696110-150696118`
7:150696110-150696118
ATCCCCCAG
10. Before looking up the neighboring bases, confirm that the ncbiBuild version is 37
- Besides GRCh37, some entries just say 37, but that is probably acceptable.
mutations_snp["ncbiBuild"].value_counts
11. Identify the flanking bases with samtools
# In IRuby, printing during a Parallel computation raises an error, so prepare a logger
require 'logger'
File.open("Signature/samtools.log",'w'){|file| file = nil} # empty the log file
logger = Logger.new("Signature/samtools.log")
start_time = Time.now
sample_ids = mutations_snp["sampleId"].to_a.uniq
mutations_snp_trinucleotide = Parallel.map(sample_ids, in_processes: PCORE, progress: "Samtools") do |sample_id|
  # This step may not be very efficient
  mutations_per_sample_df = mutations_snp.where(mutations_snp["sampleId"].eq sample_id)
  # Convert to a Ruby array so that Parallel can make use of multiple cores.
  # It also works without the final to_h; each row is then a Daru::Vector instead of a Hash
  mutations_per_sample = mutations_per_sample_df.each_row.to_a.map(&:to_h)
  mutations_per_sample.map do |mutation|
    chromosome = mutation["gene"]["chromosome"]
    start_position = mutation["startPosition"]
    end_position = mutation["endPosition"]
    raise "start_position != end_position" if start_position != end_position
    reference_allele = mutation["referenceAllele"]
    variant_allele = mutation["variantAllele"]
    # Fetch the mutated base plus one base on each side
    trinucleotide = `samtools faidx Homo_sapiens.GRCh37.dna.primary_assembly.fa.gz \\
      #{chromosome}:#{start_position-1}-#{end_position+1}`
      .split("\n")[1]
    # Log cases where the GRCh37 lookup does not match the TCGA reference allele
    # (the logger is used because printing inside Parallel fails in IRuby)
    unless trinucleotide.nil?
      if trinucleotide[1] != reference_allele
        logger.info "#{mutation["studyId"]} tri[1]:#{trinucleotide[1]} != ra:#{reference_allele}"
      end
    end
    [trinucleotide, reference_allele, variant_allele]
  end
end
puts time: (Time.now - start_time)
Samtools: |====================================================================|
{:time=>1287.954323263}
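The two small pieces of logic inside the loop, building the region string and picking the sequence out of the FASTA output, can be checked without samtools. The helper names below are hypothetical and the FASTA text is constructed by hand. The sketch assumes 1-based coordinates and a header line followed by sequence lines, which is what the earlier `samtools faidx` check suggests.

```ruby
# Region covering the base before and after a 1-based position.
def trinucleotide_region(chromosome, position)
  "#{chromosome}:#{position - 1}-#{position + 1}"
end

# Takes FASTA text (header line, then sequence lines) and returns the sequence,
# or nil when there is no sequence line.
def sequence_from_faidx(output)
  lines = output.to_s.lines.map(&:chomp)
  return nil if lines.size < 2

  lines[1..].join
end

region = trinucleotide_region("7", 150696111)
puts region
puts sequence_from_faidx(">#{region}\nATC\n") # hand-written stand-in for samtools output
```

Returning nil for empty output mirrors the nil check in the loop above, so a failed lookup can be logged or skipped instead of raising.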
12. Check the data structure of mutations_snp_trinucleotide
puts mutations_snp_trinucleotide.class
puts "%d samples" % mutations_snp_trinucleotide.size
puts mutations_snp_trinucleotide[0]
13. Save the trinucleotide data
mutations_snp_trinucleotide = sync("mutations_snp_trinucleotide", mutations_snp_trinucleotide)
14. Generate the mutation patterns of the signature
substitution_class = [['C' , 'A'],
['C' , 'G'],
['C' , 'T'],
['T' , 'A'],
['T' , 'C'],
['T' , 'G']]
ACGT = %w(A C G T)
mutation_types = []
substitution_class.each do |u,v|
ACGT.each do |j|
ACGT.each do |n|
mutation_types << ["" << j << u << n, v]
end
end
end
mutation_types = mutation_types.flatten(0)
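As a cross-check on the 6 × 4 × 4 = 96 count, the same list can be produced with `flat_map` and `Array#product`. This is a sketch of an equivalent construction, not a replacement for the code above. The constant names are my own, and the ordering is intended to match the nested loops (5' base in the outer loop, 3' base in the inner loop).

```ruby
SUBSTITUTIONS = [%w[C A], %w[C G], %w[C T], %w[T A], %w[T C], %w[T G]].freeze
BASES = %w[A C G T].freeze

# Each entry is [trinucleotide, variant base], e.g. ["ACA", "A"] for A[C>A]A.
mutation_types = SUBSTITUTIONS.flat_map do |ref, alt|
  BASES.product(BASES).map { |five, three| ["#{five}#{ref}#{three}", alt] }
end

puts mutation_types.size
p mutation_types.first
p mutation_types.last
```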
15. Function that returns the reverse complement
def reverse_complement(str)
  return nil if str == nil
  complement = {"A" => "T", "T" => "A", "C" => "G", "G" => "C"}
  str.reverse.chars.map do |s|
    complement[s]
  end.join
end
## Quick check
reverse_complement("AAACTG")
"CAGTTT"
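For comparison, the same operation can be written with `String#tr`. This is an alternative sketch under a different name. One behavioral difference: characters other than A, C, G, T (for example N) are kept as they are here, whereas the hash-based version above drops them.

```ruby
# Reverse complement using character translation.
def reverse_complement_tr(str)
  return nil if str.nil?

  str.reverse.tr("ACGT", "TGCA")
end

puts reverse_complement_tr("AAACTG")
```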
16. Generate the mutation patterns of the signature (opposite strand)
mutation_types_reverse = mutation_types.map do |pattern|
  pattern.map(&method(:reverse_complement))
end
17. Generate the index labels
These are used for things like graph axis labels.
signature_index = mutation_types.map{|i| i.join " > "}
File.binwrite("Signature/signature_index.dat", Marshal.dump(signature_index))
18. Count mutation frequencies and build the catalog
- Entries that could not be counted are printed in red
mutations_snp_trinucleotide.sum(&:size)
catalog = mutations_snp_trinucleotide.map do |mutations_per_sample|
  mutation_count = Array.new(96, 0)
  mutations_per_sample.each do |mutation|
    # [trinucleotide, variant allele]
    m = mutation.values_at(0,2)
    # Look on the pyrimidine strand first, then on the opposite strand
    index = mutation_types.index(m) || mutation_types_reverse.index(m)
    if index.nil?
      # Show entries that could not be processed
      print m.inspect.red
      next
    end
    mutation_count[index] += 1
  end
  mutation_count
end
0
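The counting step can be tried on a handful of made-up mutations. The sketch below is self-contained and uses a Hash from pattern to column index in place of the two `Array#index` scans above. That is a design choice of this example, not the notebook's approach. All names and the sample data are assumptions for illustration.

```ruby
BASES = %w[A C G T].freeze
SUBSTITUTIONS = [%w[C A], %w[C G], %w[C T], %w[T A], %w[T C], %w[T G]].freeze
TYPES = SUBSTITUTIONS.flat_map do |ref, alt|
  BASES.product(BASES).map { |five, three| ["#{five}#{ref}#{three}", alt] }
end

def revcomp(seq)
  seq.reverse.tr("ACGT", "TGCA")
end

# Both strand representations of a pattern point at the same column.
INDEX = {}
TYPES.each_with_index do |(tri, alt), i|
  INDEX[[tri, alt]] = i
  INDEX[[revcomp(tri), revcomp(alt)]] = i
end

# mutations: array of [trinucleotide, reference allele, variant allele]
def count_catalog_row(mutations)
  counts = Array.new(96, 0)
  mutations.each do |tri, _ref, alt|
    i = INDEX[[tri, alt]]
    counts[i] += 1 if i # lookups that failed (nil) or contain N are skipped
  end
  counts
end

# Made-up sample: A[C>A]A, its opposite-strand form T[G>T]T, T[T>G]T, and a failed lookup.
sample = [["ACA", "C", "A"], ["TGT", "G", "T"], ["TTT", "T", "G"], [nil, "C", "T"]]
row = count_catalog_row(sample)
puts row.sum
puts row[0]
```

With roughly a million mutations, a constant-time Hash lookup avoids scanning a 96-element array twice per mutation, although the original approach is perfectly workable.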
19. Convert to NArray
catalog = Numo::DFloat[*catalog]
puts catalog.inspect
20. Save the catalog data
catalog = sync("catalog", catalog)
21. Check how to use Rumale's NMF
ri Rumale::Decomposition::NMF.new
22. Run NMF with Rumale and save the models
- NMF only finds a local optimum, so I decided to run it 500 times.
- The number of components is set to 30.
- Each model is saved.
- I run 500 trials here with other tools and other analyses in mind, but only 100 of them are actually used below.
FileUtils.mkdir_p("Signature/nmf")
Parallel.each(1..500, in_processes: PCORE, progress: "NMF") do |seed|
decomposer = Rumale::Decomposition::NMF.new(n_components: 30, max_iter: 500, random_seed: seed)
decomposer.fit_transform(catalog)
File.binwrite("Signature/nmf/decomposer_#{seed}.dat", Marshal.dump(decomposer))
end
NMF: |=========================================================================|
1..500
That is the end of the data preparation part. Now, at last, on to "Clustering & visualization".
[Clustering & visualization]
Table of contents
Preparation
- Prepare a savepoint function
- Load the libraries
Loading the data
- Load the signature index labels
- Load the NMF results as an NArray
Visualizing
- Show a heatmap of the NMF results
- Standardize
- Cluster with GaussianMixture
- Redraw the heatmap
t-SNE
- Check how to use Rumale's t-SNE
- Run t-SNE
- Save the t-SNE results
- Visualize the t-SNE results
- Visualize with the clustering taken into account
Showing the signatures
- Visualize the signature candidates
Other
- Write a CSV file
1. Prepare a savepoint function
## This time, objects are saved in the Signature directory
require 'fileutils'
FileUtils.mkdir_p("Signature")
def sync(str, obj)
  begin
    if obj.nil?
      Marshal.load(File.binread("Signature/#{str}.dat"))
    else
      File.binwrite("Signature/#{str}.dat", Marshal.dump(obj))
      obj
    end
  rescue
    Marshal.load(File.binread("Signature/#{str}.dat"))
  end
end
2. Load the libraries
require 'numo/narray' # matrix computation
require 'numo/linalg' # OpenBLAS, important for speeding up t-SNE
require 'iruby-plotly' # visualization
require 'rumale' # machine learning
3. Load the signature index labels
signature_index = Marshal.load(File.binread("Signature/signature_index.dat"))
4. Load the NMF results as an NArray
decomposer = (1..100).map do |i|
Marshal.load(File.binread("Signature/nmf/decomposer_#{i}.dat"))
end
components = Numo::DFloat.vstack(decomposer.map(&:components))
puts components.inspect
5. Show a heatmap of the NMF results
IRuby.plot(
  {x: signature_index,
   z: components.to_a,
   type: "heatmap"},
  {title: "TCGA signature candidates",
   xaxis: {tickangle: 300,
           tickfont: {size: 6}},
   height: 600,
   width: 1000})
6. Standardize
normalizer = Rumale::Preprocessing::StandardScaler.new
samples_std = normalizer.fit_transform(components.transpose)
samples_std = samples_std.transpose
puts samples_std.inspect
puts samples_std.var(axis: 1).minmax
[0.9999999999999953, 1.0000000000000049]
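The transpose, scale, transpose sequence standardizes each row, that is, each signature candidate across its 96 mutation types. The arithmetic for a single row looks roughly like the sketch below. `standardize` is a hypothetical helper, and dividing by n − 1 (sample standard deviation) is an assumption of this example. The convention used by the library may differ.

```ruby
# Z-score one row: subtract the mean, divide by the sample standard deviation.
def standardize(row)
  return row.map { 0.0 } if row.size < 2

  mean = row.sum / row.size.to_f
  variance = row.sum { |x| (x - mean)**2 } / (row.size - 1) # n - 1 is an assumption
  sd = Math.sqrt(variance)
  return row.map { 0.0 } if sd.zero?

  row.map { |x| (x - mean) / sd }
end

z = standardize([0.0, 2.0, 4.0, 10.0])
z_mean = z.sum / z.size
z_var = z.sum { |x| (x - z_mean)**2 } / (z.size - 1)
puts format("mean %.3f, variance %.3f", z_mean, z_var)
```

After this step every candidate has the same scale, so the clustering that follows compares the shape of the 96-bar profile and not its overall height.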
7. Cluster with GaussianMixture
- KMeans tends to split the samples into equally sized groups, so it fails on the real data.
- In a simulation where several signatures appear about equally often, KMeans also works well.
- I do not really understand what this algorithm does internally either…
gmm = Rumale::Clustering::GaussianMixture.new(n_clusters: 30, random_seed: 5)
pca = Rumale::Decomposition::PCA.new(n_components: 50)
samples_std_pca = pca.fit_transform(samples_std)
cluster_ids = gmm.fit_predict(samples_std_pca)
puts cluster_ids.inspect
Numo::Int32#shape=[3000]
[5, 9, 21, 29, 12, 7, 14, 24, 20, 27, 2, 11, 20, 25, 18, 29, 3, 18, 22, 7, ...]
8. Redraw the heatmap
IRuby.plot(
  {x: signature_index,
   z: samples_std[cluster_ids.sort_index, true].to_a,
   type: "heatmap"},
  {title: "TCGA signature candidates, sorted by cluster",
   xaxis: {tickangle: 300,
           tickfont: {size: 6}},
   height: 600,
   width: 1000}
)
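Indexing the rows with `cluster_ids.sort_index` reorders them so that members of the same cluster sit next to each other in the heatmap. A plain-Ruby sketch of the same idea, with made-up cluster IDs and row names:

```ruby
cluster_ids = [2, 0, 1, 0, 2]
rows = %w[r0 r1 r2 r3 r4]

# Indices that would sort cluster_ids; ties keep their original order.
order = cluster_ids.each_with_index.sort_by { |id, i| [id, i] }.map { |_id, i| i }
sorted_rows = rows.values_at(*order)

p order
p sorted_rows
```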
9. Check how to use Rumale's t-SNE
ri Rumale::Manifold::TSNE.new
⏰ t-SNE takes time, so keep a record of the elapsed time for each of several runs
times = []
10. Run t-SNE
start_time = Time.now
tsne = Rumale::Manifold::TSNE.new(n_components: 2, perplexity: 40.0, max_iter: 500, random_seed: 1)
tsne_representations = tsne.fit_transform(samples_std_pca)
times << Time.now - start_time
11. Save the t-SNE results
tsne_representations = sync("tsne_representations", tsne_representations)
0
12. Visualize the t-SNE results
x = tsne_representations[true, 0]
y = tsne_representations[true, 1]
IRuby.plot(x: x.to_a,
           y: y.to_a,
           mode: 'markers',
           type: "scatter",
           title: "t-SNE visualization of the signature candidates obtained with NMF",
           width: 800,
           height: 600)
13. Visualize with the clustering taken into account
data = cluster_ids.to_a.uniq.sort.map do |l|
{
x: x[cluster_ids.eq(l)].to_a,
y: y[cluster_ids.eq(l)].to_a,
type: "scatter",
mode: 'markers',
name: "sig#{l}"
}
end
IRuby.plot(data, {title: "t-SNE visualization of the signature candidates obtained with NMF", height: 600, width: 800})
14. Visualize the signature candidates
data = 30.times.map do |l|
s2 = components[cluster_ids.eq(l).where, true]
s = s2.mean(0)
color = %w(red orange green blue purple brown).map{|i| [i] * 16}.flatten
{y: s.to_a,
x: signature_index,
xaxis: "x#{l+1}",
yaxis: "y#{l+1}",
marker: {color: color},
type: "bar"}
end
layout = {
grid: {rows: 10, columns: 3,
pattern: 'independent'}}
(1..30).each do |i|
layout["xaxis#{i}".to_sym] = {showticklabels: false}
end
IRuby.plot(data, layout)
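Each bar chart above is the mean of all NMF components assigned to one cluster. The averaging step can be sketched in plain Ruby as follows. `cluster_means` and the tiny data set are made up for illustration.

```ruby
# Returns { cluster_id => column-wise mean of the rows in that cluster }.
def cluster_means(rows, cluster_ids)
  grouped = rows.zip(cluster_ids).group_by { |_row, id| id }
  grouped.sort_by { |id, _pairs| id }.to_h do |id, pairs|
    members = pairs.map(&:first)
    [id, members.transpose.map { |col| col.sum / col.size.to_f }]
  end
end

rows = [[1.0, 3.0], [3.0, 5.0], [10.0, 0.0]]
ids = [0, 0, 1]
means = cluster_means(rows, ids)
p means
```

Averaging many NMF runs that landed in the same cluster is what turns 3000 noisy components into a smaller set of candidate signatures.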
15. Write a CSV file
- I did everything in Ruby this time, but tools such as Orange let you do this kind of analysis conveniently in a GUI.
- The results are written to a CSV file so that other tools can work with them too.
require 'csv'
CSV.open("Signature/nmf/signatures.csv", "w") do |csv|
csv << signature_index
samples_std.to_a.each do |ar|
csv << ar
end
end
0
Results
Thank you for scrolling all the way down here.
This time I took the simple approach of clustering with the GaussianMixture algorithm. As for the results:
I think Signature1, Signature2, Signature6, Signature7, Signature10, Signature13, Signature14, Signature17, Signature21 and Signature28 were detected. Signature4 is a little doubtful, but I will count it as detected.
Sig1 and Sig10 were each detected several times. These duplicated signatures sit in similar positions in the t-SNE plot. Candidates 8 and 13 contain few samples, so they may be local optima that NMF picked up by mistake.
Candidates 12 and 21 were detected a fair number of times, yet COSMIC has no similar signature registered. They are probably some kind of mistake or artifact, but I cannot tell. I was worried whether exon-only mutations would give any result at all, but more signatures were detected than I expected.
Signatures that were detected with some success this time
COSMIC signature | Associated carcinogenic factor | Main cancer types |
---|---|---|
Signature1 | Deamination of 5-methylcytosine, aging | Widely observed |
Signature2 | APOBEC enzyme activity | Widely seen, mainly in adenocarcinomas |
Signature4 | Smoking | Lung cancer, head and neck cancer, liver cancer, etc. |
Signature6 | Defective DNA mismatch repair | Malignant lymphoma, etc. |
Signature7 | Ultraviolet light | Melanoma |
Signature10 | POLE enzyme abnormality | Colorectal cancer, uterine cancer, etc. |
Signature13 | APOBEC enzyme activity | Widely seen, mainly in adenocarcinomas |
Signature14 | POLE mutation and defective mismatch repair | Malignant lymphoma, etc. |
Signature17 | Unknown | Esophageal cancer, stomach cancer, etc. |
Signature21 | Unknown | Stomach cancer, etc. |
Signature28 | Unknown | Colorectal cancer, uterine cancer, etc. |
Bonus [A closer look at the results with Orange]
Orange is a tool developed at the University of Ljubljana in Slovenia that lets you do data analysis and machine learning in a GUI. It is a truly wonderful tool: you can run analyses by clicking, without learning Jupyter or raw Python. NMF also seems to be available through an add-on, so apart from calling the API, everything shown on this page could apparently be done in Orange. Here I redo the second half, "Clustering & visualization", in Orange.
You define a workflow like the one in the screenshot and run it. Sigh... it is far too convenient. (ToT)
When you select the samples you want to inspect in the t-SNE window on the right, the signature is drawn automatically in the window below.
What is drawn now appears to be Signature4.
Looking more closely, the small group next to it turned out to be close to Signature8.
Overall, the influence of Signature10 is very strong. Candidates close to Signature6 and Signature23 exist, but my impression is that Signature10 is so dominant that they got mixed with it and were not detected correctly.
A very small number of patterns close to Signature20 may also have been detected. However, only two or four candidates looked somewhat like Signature20, so this may be nothing more than coincidence.
I had thought candidate 13 contained few samples, but it seems to be detected reproducibly enough that it cannot be dismissed as mere noise. Hmm, what could it be? Is a mistake of mine hiding somewhere?
Orange does not get much attention. Why is that? Perhaps it does not fit the corporate strategy of the big IT platform companies, which is to gather information in their data centers by offering analysis APIs. The people who judge whether a tool is good or bad are machine learning engineers, and they are high-value-added workers who are exposed to competition every day. To an engineer, Orange may look like a useless tool with only half-baked features. Or, conversely, because it lets you run, with a few button clicks and typed numbers, parts of a process that engineers used to implement with hand-crafted code, it may look like a tool that undermines the value of the machine learning craftsman, and that may provoke resistance. Not to worry. That kind of psychology is common everywhere, not only in the world of engineers.
Still, I have a slight personal feeling that the data analysis tools used by so-called "non-engineers" will turn out to be something like Excel + Orange, not Python or Julia.
Bonus [How the illustrations were made]
Get a paper towel ready. 1
Frankly, if you have proper drawing paper, that is better.
Draw the sketch with a ballpoint pen and color it with colored pencils.
Scan the image with a scanner.
* A scanner also picks up the wrinkles, so a camera may be better.
Open the image in GIMP, run through level adjustment and the like, then apply linear invert
and adjust the colors with Hue-Chroma.
That is almost the finished picture. After that, add a few filters you like, such as symmetric nearest neighbor,
to taste.
Can you tell which logos these are?
They are the logos of the tools and sites that helped me write this article.
From left to right: Ruby, Rumale, Plotly, cBioPortal, COSMIC, Orange, GIMP.
That is all for this article.
1. Comfort Service Towel 200 https://pro.crecia.co.jp/product/search/index.php/item?id=11007