How to call a PySpark job from Lambda on AWS EMR

Posted at 2018-11-30

I struggled to invoke PySpark with boto3

  • It seems that passing command-runner.jar as the step's Jar, with the spark-submit command line in Args, is all it takes
    import boto3
    emr = boto3.client('emr', region_name='ap-northeast-1')

    spark_args = ['spark-submit', 'pyspark_app.py']

    response = emr.add_job_flow_steps(
        # JobFlowId (cluster ID) returned when the cluster is created
        JobFlowId='j-XXXXXXXXXXXXXX',
        Steps=[
            {
                'Name': 'sample batch job',
                'ActionOnFailure': 'CANCEL_AND_WAIT',
                'HadoopJarStep': {
                    'Properties': [],
                    'Jar': 'command-runner.jar',
                    'Args': spark_args
                }
            },
        ]
    )
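
The snippet above does not show the Lambda side, so here is a minimal sketch of how the same call might be wrapped in a handler. The names `build_spark_step`, the `EMR_CLUSTER_ID` environment variable, and the `script` / `args` event keys are assumptions made for illustration and are not part of the original post. The step definition is built in a separate function so it can be checked without touching AWS.

```python
import os


def build_spark_step(script_path, script_args=None, name='sample batch job'):
    """Build one EMR step that runs spark-submit through command-runner.jar."""
    args = ['spark-submit', script_path] + list(script_args or [])
    return {
        'Name': name,
        'ActionOnFailure': 'CANCEL_AND_WAIT',
        'HadoopJarStep': {
            'Jar': 'command-runner.jar',
            'Args': args,
        },
    }


def lambda_handler(event, context):
    # Imported here so build_spark_step can be used without boto3 installed.
    import boto3

    emr = boto3.client('emr', region_name='ap-northeast-1')

    # 'script' and 'args' are hypothetical event keys chosen for this sketch.
    step = build_spark_step(
        event.get('script', 'pyspark_app.py'),
        event.get('args'),
    )

    response = emr.add_job_flow_steps(
        # EMR_CLUSTER_ID is a hypothetical environment variable holding the JobFlowId.
        JobFlowId=os.environ['EMR_CLUSTER_ID'],
        Steps=[step],
    )
    # add_job_flow_steps returns the IDs of the steps it queued.
    return {'StepIds': response['StepIds']}
```

The Lambda execution role would also need permission to call `elasticmapreduce:AddJobFlowSteps` on the target cluster. The handler only queues the step and returns; it does not wait for the Spark job to finish.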