
QCon 2014: How Hadoop Is Used at Facebook

Posted at 2014-05-01

QCon Tokyo 2014
QCon 2014 My Report Top

Detail Report #4/8

#How Hadoop Is Used at Facebook
Joydeep Sen Sarma: Qubole (India)

He gave a clear explanation of how Hadoop was used at Facebook, which factors made it work well, and where the problems were. He also pitched the advantages of Qubole, the company he founded. For me, just about to move into full-scale production, this talk had a great deal worth taking notes on.

Background

  • Worked on Hadoop at Facebook for 3 years, starting in 2007
  • Mentor / PM of the Hadoop fair scheduler
  • Hadoop / Hive (ETL development)
  • Qubole: founder

Hadoop at Facebook

(Slide photo: DSC_9715.jpg)

Use Case

  • Reporting (SQL)
  • Complex ETL
  • Simple summaries (counters)
  • Index generation (MapReduce / Java)
  • Data mining (C++)
  • Ad-hoc queries (SQL)
  • Data-driven applications (PYMK, "People You May Know")
  • Friend suggestions, etc. (need offline batch)

Success Factors

Success Factors: Hadoop

  • Scales linearly, centrally managed
  • Operationally simple (15 people for 25 PB)
  • Can do almost anything (though not best at everything)

Success Factors: Hive

  • SQL is easy (add scripts for map/reduce; see the TRANSFORM sketch below)
  • Browser-based interface

(Slide photo: DSC_9716.jpg)

(The photo shows Qubole, but at Facebook too, a web interface apparently made each developer's SQL easy to see.)
(Apparently it is the eager newcomers to FB who are most prone to writing enormous queries.)
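
As a concrete illustration of "add scripts for map/reduce": Hive's TRANSFORM clause streams query rows through an arbitrary script. A minimal sketch; the `page_views` table and `parse_agent.py` script are hypothetical, not from the talk:

```sql
-- Ship a (hypothetical) script to every task, then stream rows through it.
ADD FILE parse_agent.py;

SELECT TRANSFORM (user_id, user_agent)      -- rows go to the script on stdin
       USING 'python parse_agent.py'        -- any executable works here
       AS (user_id STRING, browser STRING)  -- columns read back from stdout
FROM page_views;
```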

Success Factors: Scribe

  • Log data using Scribe from any application
  • Simple to add attributes to user page views
  • JSON-encoded logging allows schema evolution (see the query sketch below)

(Slide photo: DSC_9717.jpg)

Scribe is a server for aggregating streaming log data. It is designed to scale to a very large number of nodes and be robust to network and node failures. There is a scribe server running on every node in the system, configured to aggregate messages and send them to a central scribe server (or servers) in larger groups. If the central scribe server isn’t available the local scribe server writes the messages to a file on local disk and sends them when the central server recovers.
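
The schema-evolution point is easy to see from the query side: if each Scribe message lands in Hive as a single JSON string column, fields are pulled out by path, so producers can add new attributes without breaking existing queries. A minimal sketch with a hypothetical `raw_logs` table:

```sql
-- Hypothetical table raw_logs(json_str STRING) fed from Scribe output.
-- get_json_object returns NULL for fields an older producer never wrote,
-- so adding attributes on the logging side does not break this query.
SELECT get_json_object(json_str, '$.user_id')  AS user_id,
       get_json_object(json_str, '$.page')     AS page,
       get_json_object(json_str, '$.referrer') AS referrer  -- newer field; NULL on old rows
FROM raw_logs;
```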

Issues

Issue #1: Hadoop scheduling

  • "Have you no Hadoop etiquette?" (2007)
    (reducer count capped in response)

(If one person uses too much, everyone feels it.)
(Every new engineer tends to write jobs that demand that many machines.)
(Making it scale keeps everyone happy.)
(Slide photo: DSC_9718.jpg)

Issue #2: Hive latency

  • Queries take at least 30 s
  • Hive is too slow for BI tools like Tableau
  • Difficult to write/test complex SQL applications
  • Not possible to do drill-downs into detailed data

(Slide photo: DSC_9719.jpg)

Issue #3: Real-time metrics

  • Nightly or hourly aggregations are not enough
  • Need real-time
  • Driven by advertiser requirements
  • HDFS not great for storing long-tail summaries
    (like month-over-month metrics)

Issue #4: Cluster too big

  • Single HDFS runs into space/power/network limits
  • Hard to do queries across data centers

Issue #5: Not a fit for all applications

  • In-memory grid better for graph applications

Solutions

Hadoop Scheduling - Isolation

  • Hadoop fair scheduler (see the pool sketch below)
    • fix preemption and speculation
  • Separate clusters for production pipelines
    • platinum, gold, silver
  • Prevent memory over-consumption
    • slaves, JobTracker, NameNode
      (cap the resources allotted to each role)
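
As a sketch of how a job lands in one of those tiers under the Hadoop 1.x fair scheduler, a Hive session can name its pool before running a query. The tier names and the exact property here are assumptions based on the stock fair scheduler, not details from the talk:

```sql
-- Route this session's jobs to a named fair-scheduler pool
-- (property from the Hadoop 1.x fair scheduler; tiers are illustrative).
SET mapred.fairscheduler.pool=gold;

SELECT dt, COUNT(*)   -- hypothetical production aggregation
FROM page_views
GROUP BY dt;
```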

Hadoop Scheduling - Corona

  • Overlaps with Apache YARN
  • Separates cluster resource allocation from MapReduce task scheduling
  • Pushes tasks to slaves: reduced latency and idle time
  • Highly concurrent design: optimistic locking, fast, and scales to thousands of nodes

Fast Queries - Presto

  • Pipelined execution model
  • In memory
  • Cannot plug in custom code
    (In MapReduce, the reducers can be left waiting until every map task finishes; Presto processes in a pipeline, so apparently it can be built without that waiting. See the example query below.)

(Slide photo: DSC_9722.jpg)
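
This is what addresses the interactive drill-downs that Issue #2 ruled out. The SQL itself looks the same as Hive's; the difference is that a pipelined, in-memory engine returns an ad-hoc query like the following (table and columns hypothetical) in seconds rather than minutes:

```sql
-- Hypothetical ad-hoc drill-down: which pages drove a given day's traffic?
SELECT page, COUNT(*) AS views
FROM page_views
WHERE dt = '2014-04-30'
GROUP BY page
ORDER BY views DESC
LIMIT 20;
```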

Presto vs Hive

(Slide photo: DSC_9723.jpg)

Other FB Projects

Calligraphus (Scribe ++)

Puma

  • Real-time processing
  • HBase for storage

Apache Giraph

  • In-Memory graph computations

Scaling Issues: Cloud!

The issues above can be solved by using a cloud service. (Well, naturally...)
Use the job queue to grow and shrink the number of map/reduce nodes.
Scaling up is easy, but scaling down is hard: critical data may live on the nodes that were added!
(Slide photo: DSC_9724.jpg)

AWS S3 is much easier to use than HDFS

  • Also: GCE and Azure Storage
  • Use HDFS as a cache (Qubole columnar cache)
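
One concrete way that ease shows up: Hive can point a table straight at an S3 path and treat the object store as the warehouse. A minimal sketch (bucket and schema are hypothetical; s3n:// was the usual Hadoop scheme at the time):

```sql
-- Hive external table backed by S3 instead of HDFS.
-- Dropping the table leaves the S3 objects in place.
CREATE EXTERNAL TABLE page_views (
  user_id STRING,
  page    STRING
)
PARTITIONED BY (dt STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 's3n://example-bucket/warehouse/page_views/';
```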

Qubole pitch time!

Hive latency - Qubole

  • Small files
    (empirically, 5 MB up to a few hundred MB per file works well; see the merge-settings sketch below)

  • Prefetching S3 files

  • Direct writes to S3

  • Multi-tenant Hive server

  • JVM reuse

  • Columnar cache
    (e.g., parse the JSON first, then store the result in S3)
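
On the small-files point, stock Hive already exposes knobs that merge tiny output files up toward that range; a sketch (the size targets are assumptions consistent with the "few hundred MB" guidance, not Qubole's actual defaults):

```sql
-- Merge small output files at the end of a job (stock Hive settings).
SET hive.merge.mapfiles=true;                -- merge outputs of map-only jobs
SET hive.merge.mapredfiles=true;             -- merge outputs of map-reduce jobs
SET hive.merge.size.per.task=256000000;      -- aim for ~256 MB merged files
SET hive.merge.smallfiles.avgsize=16000000;  -- merge when avg output file < ~16 MB
```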

(Slide photo: DSC_9725.jpg)
