QCon Tokyo 2014
Detail Report #4/8
# The Reality of Hadoop Usage at Facebook
Joydeep Sen Sarma: Qubole (India)
He gave a clear explanation of how Hadoop is used at Facebook, what factors made it work well, and where the problems were. He also explained the advantages of Qubole, the company he founded himself. For someone like me who is about to move into a full-scale operations phase, this talk was full of content worth learning from.
Background
- 2007: joined Facebook, 3 years on Hadoop
- Mentor / PM for the Hadoop Fair Scheduler
- Hadoop / Hive (ETL development)
- Founder of Qubole
Hadoop at FB
Use Cases
- Reporting (SQL)
- Complex ETL
- Simple summaries (counters)
- Index generation (MapReduce / Java)
- Data mining (C++)
- Ad-hoc queries (SQL)
- Data-driven applications (PYMK, "People You May Know")
- Friend suggestions (needs an offline batch; a toy sketch follows this list)
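As a rough illustration of the PYMK-style offline batch, here is a toy friends-of-friends scorer in Python. The graph data and ranking are invented for the example; the real pipeline ran as large offline Hadoop jobs.

```python
from collections import Counter

# Toy "People You May Know": rank candidates by number of mutual friends.
# The friendship graph is a hypothetical in-memory dict; at Facebook scale
# this kind of computation was done as an offline Hadoop batch job.
friends = {
    "alice": {"bob", "carol"},
    "bob":   {"alice", "dave"},
    "carol": {"alice", "dave", "erin"},
    "dave":  {"bob", "carol"},
    "erin":  {"carol"},
}

def suggest(user, graph):
    """Return candidate friends ranked by mutual-friend count."""
    counts = Counter()
    for friend in graph[user]:
        for fof in graph[friend]:
            if fof != user and fof not in graph[user]:
                counts[fof] += 1
    return counts.most_common()

print(suggest("alice", friends))  # [('dave', 2), ('erin', 1)]
```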
Success Factors
Success Factors: Hadoop
- Scales linearly, centrally managed
- Operationally simple (15 people for 25 PB)
- Can do almost anything (though not always the best tool for each job)
Success Factors: Hive
(This screenshot is from Qubole, but at Facebook too there was a web interface that made each developer's SQL easy to see.)
(Apparently the newer, more eager people joining FB were the ones most likely to write huge queries.)
Success Factors: Scribe
- Log data using Scribe from any application
- Simple to add attributes to user page views
- JSON-encoded logging allows schema evolution
Scribe is a server for aggregating streaming log data. It is designed to scale to a very large number of nodes and be robust to network and node failures. There is a scribe server running on every node in the system, configured to aggregate messages and send them to a central scribe server (or servers) in larger groups. If the central scribe server isn’t available the local scribe server writes the messages to a file on local disk and sends them when the central server recovers.
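To make the "JSON-encoded logging allows schema evolution" point concrete, here is a minimal sketch. This is not Facebook's actual log format and the field names are invented: each page-view event is one JSON line, and adding a new attribute does not break readers that only know the old fields.

```python
import json

# Hypothetical page-view events, one JSON object per log line.
# The older event has no "ab_test" field; the newer one adds it.
log_lines = [
    json.dumps({"user_id": 42, "page": "/home", "ts": 1400000000}),
    json.dumps({"user_id": 43, "page": "/profile", "ts": 1400000005,
                "ab_test": "new_feed"}),
]

# A reader written before "ab_test" existed keeps working: unknown keys are
# ignored and missing keys get a default. This is the schema-evolution
# property the talk attributes to JSON-encoded Scribe logs.
for line in log_lines:
    event = json.loads(line)
    print(event["user_id"], event["page"], event.get("ab_test", "none"))
```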
Issues
Issue #1: Hadoop scheduling
- "Have you no Hadoop etiquette?" (2007)
(reducer count capped in response)
(Using too much affects everyone)
(Every new engineer tends to write jobs that need that many machines)
(Letting things scale is what keeps everyone happy)
Issue #2: Hive latency
- Queries take at least 30s
- Hive is too slow for BI tools like Tableau
- Difficult to write/test complex SQL applications
- Not possible to do drill-downs into detail data
Issue #3: Real-time metrics
- Nightly or hourly aggregations are not enough
- Need real-time
- Driven by advertiser requirements
- HDFS is not great for storing long-tail summaries
(like month-over-month metrics)
Issue #4: Cluster too big
- Single HDFS runs into space/power/network limits
- Hard to do queries across data centers
Issue #5: Not a fit for all applications
- In-memory grid is better for graph applications
Solutions
Hadoop Scheduling: Isolation
- Hadoop Fair Scheduler
- Fix preemption and speculation
- Separate clusters for production pipelines
- platinum, gold, silver
- Prevent memory over-consumption
- slaves, JobTracker, NameNode
(Put limits on the resources each role can use; a toy fair-share sketch follows below.)
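The following is only a toy illustration of the fair-share idea behind the Hadoop Fair Scheduler; the pool names and numbers are invented and this is not Hadoop's actual algorithm. Capacity is split across pools in proportion to their weights, so one team's giant job cannot starve everyone else.

```python
# Toy fair-share calculation: split cluster slots across pools by weight.
# Pool names and weights are hypothetical, mirroring the platinum/gold/silver
# tiers mentioned in the talk; the real scheduler also handles minimum shares,
# preemption, and per-job scheduling, all omitted here.
def fair_shares(total_slots, pool_weights):
    total_weight = sum(pool_weights.values())
    return {pool: total_slots * w / total_weight
            for pool, w in pool_weights.items()}

pools = {"platinum": 4, "gold": 2, "silver": 1}
print(fair_shares(700, pools))
# {'platinum': 400.0, 'gold': 200.0, 'silver': 100.0}
```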
Hadoop Scheduling: Corona
- Overlaps with Apache YARN
- Separates cluster resource allocation from MapReduce task scheduling
- Pushes tasks to slaves: reduced latency and idle time
- Highly concurrent design: optimistic locking, fast, scales to thousands of nodes
Fast Queries: Presto
- Pipelined execution model
- In memory
- Cannot plug in custom code
(In MapReduce the reduce has to wait until every map task finishes, but Presto processes data as a pipeline, so apparently it can be built so that nothing has to sit waiting; see the sketch below.)
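A minimal sketch of the difference described above (the data and operators are invented): a MapReduce-style run materializes all map output before the aggregation starts, while a pipelined run streams rows through the stages with Python generators, which is the spirit of Presto's execution model.

```python
# Contrast a barrier-style run (materialize everything, then aggregate)
# with a pipelined run (each row flows through the stages as it is produced).
rows = range(1, 1_000_001)

def mapper(r):
    return r * 2

# MapReduce-style: the "reduce" cannot start until every mapped row exists.
mapped = [mapper(r) for r in rows]      # full intermediate result in memory
barrier_total = sum(mapped)

# Pipelined (Presto-style): stages are chained generators, so there is no
# intermediate materialization and no waiting for the previous stage to finish.
pipelined_total = sum(mapper(r) for r in rows)

assert barrier_total == pipelined_total
print(pipelined_total)
```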
Presto vs Hive
Other FB Projects
Calligraphus (Scribe ++)
Puma
- Real-time processing
- HBase for storage
Apache Giraph
- In-Memory graph computations
Scaling Issues: the Cloud!
The issues raised earlier can be solved by using cloud services. (Well, I figured that's where this was going...)
Use the queue to increase or decrease the number of map/reduce nodes
Scaling up is easy, but scaling down is hard because critical data may be sitting on the nodes that were added!
AWS S3 is much easier to use than HDFS
- Also: GCE and Azure Storage
- Use HDFS as a cache (Qubole columnar cache)
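As a rough sketch of the "local cache in front of S3" idea (the bucket, keys, and paths are hypothetical, and this is not Qubole's implementation): check a local cache directory first and fall back to downloading from S3 only on a miss. It assumes boto3 is installed and AWS credentials are configured.

```python
import os
import boto3  # assumes AWS credentials are configured

CACHE_DIR = "/tmp/s3cache"   # stand-in for the HDFS-backed cache
s3 = boto3.client("s3")

def read_cached(bucket, key):
    """Return the local path of an S3 object, downloading it only on a cache miss."""
    local_path = os.path.join(CACHE_DIR, bucket, key.replace("/", "_"))
    if not os.path.exists(local_path):
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        s3.download_file(bucket, key, local_path)   # miss: fetch from S3
    return local_path                               # hit: reuse the local copy

# Hypothetical usage:
# path = read_cached("my-log-bucket", "events/2014/05/part-0000.json")
```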
Qubole appeal time!
Hive Latency: Qubole
- Small files
(Empirically, 5 MB up to a few hundred MB per file is said to work well)
- Prefetching S3 files (a rough sketch follows after this list)
- Direct writes to S3
- Multi-tenant Hive server
- JVM reuse
- Columnar cache
(e.g., parse the JSON first, then store it in S3)
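A minimal sketch of the prefetching idea from the list above (the file list and the processing step are invented, and this is not Qubole's code): while the current file is being processed, the next ones are already being fetched in background threads, hiding most of the S3 latency.

```python
from concurrent.futures import ThreadPoolExecutor
import time

# Hypothetical "S3 downloads": each fetch takes a while, so overlapping the
# fetch of file N+1 with the processing of file N hides most of the latency.
def fetch(name):
    time.sleep(0.1)              # stand-in for an S3 GET
    return f"contents of {name}"

def process(data):
    time.sleep(0.1)              # stand-in for query work on the data

files = [f"part-{i:04d}" for i in range(8)]

with ThreadPoolExecutor(max_workers=2) as pool:
    # Kick off all fetches up front; results are consumed in order, but each
    # future may already be done by the time the previous file is processed.
    futures = [pool.submit(fetch, f) for f in files]
    for fut in futures:
        process(fut.result())
```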