
AWS Summit - Serverless Analytics with AWS Glue and AWS Lake Formation


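AWS Glue Overview

  • Fully managed, serverless ETL service
  • A service for developers and data scientists
  • 35+ features
  • Data cataloging
    • Automatic crawling
    • Apache Hive Metastore compatible
    • Integration with analytics services
  • Serverless engines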
    • Apache Spark
    • Python shell
    • Batch jobs
    • Interactive?
  • Auto Scaling
    • Schedule

Data Discovery

  • Performance
    • 90 million per day

Science

  • Apache Spark
    • No provisioning or management required
    • Auto Scaling
    • On demand
  • Apache Spark Core: RDD
  • Data Frame
    • Core data structure of Spark SQL
    • Well suited to SQL-like analysis
  • Dynamic Frame
    • The schema is recorded with every record; no upfront schema is required
    • Runs many flows in a single pass
    • Glue Parquet Writer
    • Standard Parquet writer
    • Performance
    • Configuration: 10 DPUs, Apache Spark 2.
    • Workload
      • JSON -> Parquet
    • DynamicFrame: 78 s
    • DataFrame: 195 s
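
The "schema recorded with every record" idea in the Dynamic Frame notes can be illustrated without Glue itself. The following is a minimal plain-Python sketch, not the awsglue API: `infer_record_schema` and `merge_schemas` are hypothetical helper names, and the sample records are made up. It derives a schema from each record and merges them, keeping conflicting types side by side instead of failing on the mismatch.

```python
import json


def infer_record_schema(record):
    """Return {field: type name} for one record (hypothetical helper)."""
    return {key: type(value).__name__ for key, value in record.items()}


def merge_schemas(schemas):
    """Union per-record schemas; a field seen with several types keeps all of them."""
    merged = {}
    for schema in schemas:
        for field, type_name in schema.items():
            merged.setdefault(field, set()).add(type_name)
    return merged


# Made-up JSON lines: "id" is an int in one record and a str in the other,
# and "score" only appears in the second record.
lines = [
    '{"id": 1, "name": "a"}',
    '{"id": "2", "name": "b", "score": 0.5}',
]
records = [json.loads(line) for line in lines]
schema = merge_schemas(infer_record_schema(r) for r in records)
print(schema)
```

No schema is declared before reading; the ambiguity on `id` only shows up in the merged result, where it can be resolved later in the pipeline.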

AWS Glue Execution Model

  • Driver -> multiple executors
  • Continuous logging
  • Apache Spark log messages are filtered out
  • Progress bar
  • Job metrics
    • Based on Apache Spark metrics
    • Driver / executor
      • 30 s summary
      • Real time in CloudWatch
  • Memory monitoring
    • DataFrame: many small files create too many tasks and use too much memory
    • DynamicFrame: automatically groups small files into tasks
    • Worker types
      • default
      • G.1x
      • G.2x
  • Python shell
    • SQL-based analysis
    • Medium-sized ML
    • Python 2.7 / 3.6 supported
      • boto3, awscli, numpy, scipy, pandas, ... preinstalled
    • Spin-up: under 20 s
    • Network address supported
    • Size: 1 DPU or 1/16 DPU
  • Python shell filtering
    • Cost: $0.6
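
The small-file grouping mentioned under memory monitoring can be pictured as a simple packing step. This is a hedged sketch only: `group_files` is a hypothetical function, and the 128 MB target and the file list are assumptions chosen for illustration, not Glue's actual algorithm or defaults.

```python
def group_files(file_sizes, target_bytes):
    """Greedily pack (name, size) pairs into groups of at most target_bytes.

    A single file larger than target_bytes ends up in a group of its own.
    """
    groups, current, current_size = [], [], 0
    for name, size in file_sizes:
        # Close the current group if adding this file would exceed the target.
        if current and current_size + size > target_bytes:
            groups.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        groups.append(current)
    return groups


MB = 1024 * 1024
# Made-up input: 1,000 files of 1 MB each.
files = [(f"part-{i:04d}.json", 1 * MB) for i in range(1000)]
groups = group_files(files, target_bytes=128 * MB)
print(len(files), "files ->", len(groups), "tasks")
```

With one task per file the driver would have to track 1,000 tasks; after grouping it tracks one task per group, which is the memory saving the notes attribute to DynamicFrame.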

Auto Scaling

  • event-based
    • Lambda
  • scheduled events
  • entity
    • glue
    • job
    • trigger
  • event
    • schedule
    • event
    • external
  • control
    • ...
    • workflow feature
  • authoring DAG
  • workflow rerun
  • monitoring
    • Updates
  • network
    • Reverse DNS support
    • VPC endpoint support for Glue
  • Job, trigger -> resource tagging
  • notifications
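
A workflow built from jobs and triggers forms a DAG, and a valid run order is any topological order of that graph. The sketch below uses only the Python standard library (`graphlib`, available from Python 3.9) and is not the Glue workflow API; the job names and the `jobs_to_rerun` helper are hypothetical, and the rerun rule (rerun the failed job plus everything downstream) is an assumption made for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each key runs only after every job in its set has finished.
workflow = {
    "crawl_raw": set(),
    "json_to_parquet": {"crawl_raw"},
    "crawl_parquet": {"json_to_parquet"},
    "build_report": {"crawl_parquet"},
    "notify": {"build_report"},
}

run_order = list(TopologicalSorter(workflow).static_order())
print(run_order)


def jobs_to_rerun(workflow, failed_job):
    """Return the failed job plus everything downstream of it, in run order."""
    order = list(TopologicalSorter(workflow).static_order())
    affected = {failed_job}
    for job in order:
        # A job is affected if any of its upstream jobs is affected.
        if workflow[job] & affected:
            affected.add(job)
    return [job for job in order if job in affected]


print(jobs_to_rerun(workflow, "crawl_parquet"))
```

Declaring the dependencies as data keeps scheduling, monitoring, and reruns working from the same graph.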

AWS Lake Formation

  • Securely build and manage a data lake

  • Sample of steps required

    • Find resources
    • Create S3 locations
    • Configure access policies
    • Map tables to Amazon S3 locations
    • ETL jobs
    • Create metadata access policies
    • Configure access from analytics services
    • Rinse and repeat for other
    • Manual | Error-prone | ??
  • Collecting and cleansing

  • Store data securely

  • Security

    • SQL-style GRANT / REVOKE permissions
    • EMR-Spark, Athena, Redshift, Glue
  • Collection

    • ML transforms for fuzzy record matching
    • Blueprints: CloudTrail / ALB
  • Data discovery

  • Wraps AWS Glue

  • Compliance

    • HIPAA BAA
    • ISO
    • PCI
    • ???
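
The SQL-style GRANT / REVOKE permissions listed under Security can be modeled in a few lines. This is a toy in-memory sketch to show the shape of the idea, not the Lake Formation API: `PermissionTable`, the principals, the table name, and the action names are all hypothetical.

```python
from collections import defaultdict


class PermissionTable:
    """Toy in-memory model of SQL-style GRANT / REVOKE on tables."""

    def __init__(self):
        # (principal, table) -> set of granted actions
        self._grants = defaultdict(set)

    def grant(self, principal, table, *actions):
        self._grants[(principal, table)].update(actions)

    def revoke(self, principal, table, *actions):
        self._grants[(principal, table)].difference_update(actions)

    def is_allowed(self, principal, table, action):
        # Deny by default: anything not explicitly granted is refused.
        return action in self._grants.get((principal, table), set())


perms = PermissionTable()
perms.grant("analyst", "sales", "SELECT")
perms.grant("etl_job", "sales", "SELECT", "INSERT")
perms.revoke("etl_job", "sales", "INSERT")

print(perms.is_allowed("analyst", "sales", "SELECT"))
print(perms.is_allowed("etl_job", "sales", "INSERT"))
```

Keeping one central table of grants is what lets several query engines consult the same rules instead of each holding its own copy of the access policy.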
wwalpha
【AWS | GCP | Azure】Solution Architect | DevOps Engineer | React Specialist