The UCSC Genome Browser's table data can also be queried directly through its public MySQL server:
Host: genome-mysql.cse.ucsc.edu
Port: 3306
User: genomep
Pass: password
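These settings can be used to check whether an assembly database contains a given table, such as `ensGene` for Ensembl genes or `refGene` for RefSeq genes. Below is a minimal Ruby sketch of that check.

It assumes the mysql2 gem, which the text above does not mention. The helper only needs an object with a `query(sql)` method, so the gem is not loaded in the block itself. The names `UCSC_MYSQL` and `table_in_assembly?` are hypothetical.

```ruby
# Connection settings for UCSC's public MySQL server (values from the list above).
UCSC_MYSQL = {
  host: 'genome-mysql.cse.ucsc.edu',
  port: 3306,
  username: 'genomep',
  password: 'password'
}.freeze

# Returns true if the assembly database `db` (e.g. "hg38") contains a table named `table`.
# Assumption: `client` is a Mysql2::Client, or any object whose #query(sql)
# returns an Enumerable of result rows.
def table_in_assembly?(client, db, table)
  # Identifiers cannot be sent as bound values, so allow only letters, digits and "_".
  [db, table].each do |name|
    raise ArgumentError, "unsafe identifier: #{name.inspect}" unless name.match?(/\A\w+\z/)
  end
  # In a LIKE pattern "_" matches any single character; escape it so it matches literally.
  pattern = table.gsub('_') { '\_' }
  client.query("SHOW TABLES FROM `#{db}` LIKE '#{pattern}'").any?
end
```

With the gem installed, usage might look like `client = Mysql2::Client.new(**UCSC_MYSQL)` followed by `table_in_assembly?(client, 'gorGor3', 'ensGene')`. Naming the assembly in the `FROM` clause means one connection can check many assemblies without selecting a default database.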
- The database name is a genome assembly name (e.g. hg38). Finding the right assembly name is often tedious, especially when you need an assembly with a specific kind of data.
- For example, for gorilla, gorGor5 has only RefSeq genes, while gorGor3 has only Ensembl genes. This is a problem if you specifically want Ensembl gene data.
- For this purpose, it helps to have a table of [clade, organism, assembly] triplets.
- The following script builds that table by walking the Table Browser's clade, organism and assembly menus. Please do not run it frequently, because it sends one request per clade and per organism to UCSC's public server.
ucsc_dbs.rb
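```ruby
#!/usr/bin/env ruby
# Prints every [clade, organism, assembly] triplet offered by the UCSC Table Browser.
# It sends one request per clade and per organism, so please run it rarely.
require 'mechanize'

UCSC = 'https://genome.ucsc.edu'
TABLES = UCSC + '/cgi-bin/hgTables'

# Returns the option values of the <select> named `name` in the Table Browser form.
def select_values(agent, name)
  agent.page.form_with(action: '/cgi-bin/hgTables').field_with(name: name).options.map(&:value)
end

agent = Mechanize.new
agent.get(TABLES)
hgsid = agent.page.form_with(action: '/cgi-bin/hgTables').field_with(name: 'hgsid').value

select_values(agent, 'clade').each do |clade|
  # Choosing a clade makes the server refill the organism menu.
  agent.get(TABLES, { 'hgsid' => hgsid, 'clade' => clade })
  select_values(agent, 'org').each do |organism|
    # Choosing an organism refills the assembly (db) menu; Mechanize URL-encodes the parameters.
    agent.get(TABLES, { 'hgsid' => hgsid, 'clade' => clade, 'org' => organism })
    select_values(agent, 'db').each do |db|
      # Optional (disabled): also select the 'genes' group for this assembly.
      # agent.get(TABLES, { 'hgsid' => hgsid, 'clade' => clade, 'org' => organism, 'db' => db, 'hgta_group' => 'genes' })
      puts [clade, organism, db].inspect + ','
    end
    sleep 1 # pause between organisms to go easy on the public server
  end
end
```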
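You can save the script's output, for example with `ruby ucsc_dbs.rb > ucsc_dbs.txt` (a file name chosen here for illustration). The saved list can then be searched offline instead of rerunning the script.

Below is a minimal sketch of that search. Each printed line looks like `["mammal", "Human", "hg38"],`, so wrapping all the lines in brackets gives a JSON array. This assumes the clade, organism and assembly strings contain no characters that `String#inspect` escapes differently from JSON; that holds for plain ASCII names. `parse_triplets` and `assemblies_for` are hypothetical helper names.

```ruby
require 'json'

# Parses saved ucsc_dbs.rb output into an array of [clade, organism, assembly] triplets.
# Assumption: the strings are plain names, so Ruby's #inspect output is also valid JSON.
def parse_triplets(text)
  body = text.strip.chomp(',') # drop the trailing comma after the last triplet
  return [] if body.empty?
  JSON.parse("[#{body}]")
end

# Returns the assembly names listed for `organism`, in the order the script printed them.
def assemblies_for(triplets, organism)
  triplets.select { |_clade, org, _db| org == organism }.map { |_clade, _org, db| db }
end
```

For example, `assemblies_for(parse_triplets(File.read('ucsc_dbs.txt')), 'Human')` would list the human assemblies, assuming `Human` is the Table Browser's organism value. Each returned name can then be used as the MySQL database name.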