
Japanese translation of the AWS Certified Big Data - Specialty sample questions

Last updated at 2019-10-22, posted at 2019-10-14

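I'm thinking of taking this exam, so I'm going to look through the sample questions.
There is no Japanese version of them yet, though, so I'll translate them into Japanese myself.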

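Exam overview

The AWS Certified Big Data - Specialty certification is intended for individuals who perform complex big data analyses and have at least two years of experience using AWS technology.
Abilities validated by the certification
・Implement core AWS big data services according to basic architecture best practices
・Design and maintain big data
・Leverage tools to automate data analysis

Download the exam guide

Looks like Redshift, EMR, and Kinesis come up a lot.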

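Sample questions

First things first, here are the sample questions (https://d1.awsstatic.com/training-and-certification/docs-bigdata-spec/BD-S%20Sample%20Questions%20for%20Web.pdf).
There are 10 questions, all with answers.
By the way, I call this a Japanese translation, but my English is next to nonexistent, so please bear that in mind.
Basically I ran everything through Google Translate and then touched it up until it was roughly understandable.
If anything looks off, point it out and I'll fix it.

Google Translate is seriously good, though. About 90% of it needs no fixing, and it mostly gets the meaning across.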

Q1

A company needs to deploy a data lake solution for their data scientists in which all company data is
accessible and stored in a central S3 bucket. The company segregates the data by business unit, using
specific prefixes. Scientists can only access the data from their own business unit. The company needs a
single sign-on identity and management solution based on Microsoft Active Directory (AD) to manage
access to the data in Amazon S3.
Which method meets these requirements?

A) Use AWS IAM Federation functions and specify the associated role based on the users' groups in AD.
B) Create bucket policies that only allow access to the authorized prefixes based on the users' group name
in Active Directory.
C) Deploy the AD Synchronization service to create AWS IAM users and groups based on AD information.
D) Use Amazon S3 API integration with AD to impersonate the users on access in a transparent manner.

answer

A - Identity Federation allows organizations to associate temporary credentials to users authenticated through an external identity provider such as Microsoft Active Directory (AD). These temporary credentials are linked to AWS IAM roles that grant access to the S3 bucket. Option B does not work because bucket policies are linked to IAM principals and cannot recognize AD attributes. Option C does not work because AD Synchronization will not sync directly with AWS IAM, and custom synchronization would not result in Amazon S3 being able to see group information. D isn't possible because there is no feature to integrate Amazon S3 directly with external identity providers.

ある企業は、データサイエンティスト向けにデータレイクソリューションを展開する必要があります。このソリューションでは、すべての企業データにアクセスでき、データは中央のS3バケットに保存されます。会社は、特定のプレフィックスを使用してビジネスユニットごとにデータを分離しています。データサイエンティストは自分のビジネスユニットのデータにのみアクセスできます。会社では、Amazon S3のデータへのアクセスを管理するために、Microsoft Active Directory（AD）に基づくシングルサインオンのID管理ソリューションが必要です。
これらの要件を満たす方法はどれですか？

A）AWS IAM Federation機能を使用し、ADのユーザーのグループに基づいて関連するロールを指定します。
B）Active Directoryのユーザーのグループ名に基づいて、承認されたプレフィックスへのアクセスのみを許可するバケットポリシーを作成します。
C）AD同期サービスをデプロイして、AD情報に基づいてAWS IAMユーザーとグループを作成します。
D）Amazon S3 APIとADの統合を使用して、透過的にユーザーのアクセスを偽装します。

answer

A - IDフェデレーションにより、組織はMicrosoft Active Directory（AD）などの外部IDプロバイダーを通じて認証されたユーザーに一時的な認証情報を関連付けることができます。これらの一時的な認証情報は、S3バケットへのアクセスを許可するAWS IAMロールにリンクされています。バケットポリシーはIAMプリンシパルにリンクされており、AD属性を認識できないため、オプションBは機能しません。AD同期はAWS IAMと直接同期せず、カスタム同期を行ってもAmazon S3からグループ情報を参照できないため、オプションCは機能しません。Amazon S3を外部IDプロバイダーと直接統合する機能はないため、Dは不可能です。
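
To make answer A a little more concrete, here is a minimal sketch of the kind of IAM policy each federated role might carry so that a business unit can only reach its own prefix. The bucket name, the AD group names, and the group-to-prefix mapping are all made up for illustration; the sample question does not specify them.

```python
import json


def build_prefix_policy(bucket: str, prefix: str) -> dict:
    """Return an IAM policy document that limits S3 read access to one prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Allow listing only the keys under the business unit's prefix.
                "Sid": "ListOwnPrefix",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {
                # Allow reading objects under that prefix and nothing else.
                "Sid": "ReadOwnPrefix",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
        ],
    }


# Hypothetical mapping from AD group name to business-unit prefix.
AD_GROUP_TO_PREFIX = {
    "AWS-DataLake-Finance": "finance",
    "AWS-DataLake-Marketing": "marketing",
}

# One policy per AD group; each would be attached to the IAM role
# that members of that group assume after federated sign-in.
policies = {
    group: build_prefix_policy("example-data-lake", prefix)
    for group, prefix in AD_GROUP_TO_PREFIX.items()
}
print(json.dumps(policies["AWS-DataLake-Finance"], indent=2))
```

The point of the sketch is that the AD group only selects the role; the prefix restriction itself lives in the role's IAM policy, not in anything Amazon S3 learns about AD.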

Q2

An administrator has a 500-GB file in Amazon S3. The administrator runs a nightly COPY command into
a 10-node Amazon Redshift cluster. The administrator wants to prepare the data to optimize performance
of the COPY command.
How should the administrator prepare the data?

A) Compress the file using gz compression.
B) Split the file into 500 smaller files.
C) Convert the file format to AVRO.
D) Split the file into 10 files of equal size.

answer

B - The critical aspect of this question is running the COPY command with the maximum amount of parallelism. The two options that will increase parallelism are B and D. Option D will load one file per node in parallel, which will increase performance, but option B will have a greater effect because it will allow Amazon Redshift to load multiple files per instance in parallel (COPY can process [one file per slice on each node](http://docs.aws.amazon.com/redshift/latest/dg/c_high_level_system_architecture.html)). Compressing the files (option A) is a recommended practice and will also increase performance, but not to the same extent as loading multiple files in parallel.

管理者がAmazon S3に500GBのファイルを持っています。 管理者は、10ノードのAmazon Redshiftクラスターに対して夜間にCOPYコマンドを実行します。 管理者は、COPYコマンドのパフォーマンスを最適化するためにデータを準備したいと考えています。
管理者はどのようにデータを準備する必要がありますか？

A）gz圧縮を使用してファイルを圧縮します。
B）ファイルを500個の小さなファイルに分割します。
C）ファイル形式をAVROに変換します。
D）ファイルを同じサイズの10個のファイルに分割します。

answer

B - この質問の重要な点は、最大限の並列処理でCOPYコマンドを実行することです。並列性を高める2つのオプションはBとDです。オプションDはノードごとに1つのファイルを並列にロードし、パフォーマンスを向上させますが、オプションBはAmazon Redshiftがインスタンスごとに複数のファイルを並列にロードできるため、より大きな効果があります（COPYは[各ノードのスライスごとに1ファイル](http://docs.aws.amazon.com/redshift/latest/dg/c_high_level_system_architecture.html)を処理できます）。ファイルを圧縮する（オプションA）ことは推奨される方法であり、パフォーマンスも向上しますが、複数のファイルを並列にロードする場合ほどの効果はありません。
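
As a rough illustration of option B, the sketch below splits one local file into a requested number of roughly equal parts, cutting only at line boundaries so no record is torn in half. The function name, the part-naming scheme, and the tiny demo file are assumptions for the example; a real 500-GB file would more likely be split by whatever tool produces it, and uploading the parts to S3 is left out.

```python
import math
import tempfile
from pathlib import Path


def split_file(src: Path, out_dir: Path, parts: int) -> list[Path]:
    """Split src into about `parts` files of similar size, at line boundaries."""
    out_dir.mkdir(parents=True, exist_ok=True)
    target = math.ceil(src.stat().st_size / parts)  # bytes per part
    paths: list[Path] = []
    out = None
    written = 0
    with src.open("rb") as f:
        for line in f:
            # Start a new part once the current one has reached the target size.
            if out is None or written >= target:
                if out is not None:
                    out.close()
                path = out_dir / f"{src.name}.part{len(paths):04d}"
                paths.append(path)
                out = path.open("wb")
                written = 0
            out.write(line)
            written += len(line)
    if out is not None:
        out.close()
    return paths


# Demo on a small throwaway file: 100 rows split into 10 parts.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "data.csv"
    src.write_bytes(b"".join(b"row-%03d\n" % i for i in range(100)))
    parts = split_file(src, Path(tmp) / "parts", 10)
    part_names = [p.name for p in parts]
    rejoined = b"".join(p.read_bytes() for p in parts)
    original = src.read_bytes()

print(len(part_names))
```

Pointing COPY at the common key prefix of the uploaded parts then lets each slice pick up its own files in parallel, which is the effect the answer describes.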

Q3

A customer needs to load a 550-GB data file into an Amazon Redshift cluster from Amazon S3, using
the COPY command. The input file has both known and unknown issues that will probably cause the load
process to fail. The customer needs the most efficient way to detect load errors without performing any
cleanup if the load process fails.
Which technique should the customer use?

A) Split the input file into 50-GB blocks and load them separately.
B) Use COPY with NOLOAD parameter.
C) Write a script to delete the data from the tables in case of errors.
D) Compress the input file before running COPY.

answer

B - From the AWS Documentation for NOLOAD: NOLOAD checks the integrity of all of the data without loading it into the database. The NOLOAD option displays any errors that would occur if you had attempted to load the data. All other options will require subsequent processing on the cluster which will consume resources.

顧客は、COPYコマンドを使用して、550GBのデータファイルをAmazon S3からAmazon Redshiftクラスターにロードする必要があります。 入力ファイルには、ロードプロセスが失敗する可能性がある既知の問題と不明な問題の両方があります。 ロードプロセスが失敗した場合、クリーンアップを実行せずにロードエラーを検出する最も効率的な方法が必要です。
お客様はどの手法を使用する必要がありますか？

A）入力ファイルを50GBブロックに分割し、個別にロードします。
B）NOLOADパラメーターを指定してCOPYを使用します。
C）エラーが発生した場合にテーブルからデータを削除するスクリプトを作成します。
D）COPYを実行する前に入力ファイルを圧縮します。

answer

B - NOLOADのAWSドキュメントから：NOLOADは、データベースにロードせずにすべてのデータの整合性をチェックします。 NOLOADオプションは、データをロードしようとした場合に発生するエラーを表示します。他のすべてのオプションでは、リソースを消費するクラスターでの後続の処理が必要になります。
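
For reference, a small sketch of what the two COPY statements could look like, one as a validation-only pass with NOLOAD and one as the real load. The table name, S3 path, and IAM role ARN are placeholders, and the helper only builds the SQL text; running it against a cluster is out of scope here.

```python
def build_copy_sql(table: str, s3_uri: str, iam_role: str, dry_run: bool = False) -> str:
    """Build a Redshift COPY statement; dry_run adds NOLOAD to validate only."""
    clauses = [
        f"COPY {table}",
        f"FROM '{s3_uri}'",
        f"IAM_ROLE '{iam_role}'",
    ]
    if dry_run:
        # NOLOAD checks the data files without loading any rows.
        clauses.append("NOLOAD")
    return "\n".join(clauses) + ";"


# Placeholder identifiers, not taken from the sample question.
TABLE = "sales_staging"
S3_URI = "s3://example-bucket/input/data_"
ROLE = "arn:aws:iam::123456789012:role/ExampleRedshiftRole"

check_sql = build_copy_sql(TABLE, S3_URI, ROLE, dry_run=True)
load_sql = build_copy_sql(TABLE, S3_URI, ROLE)
print(check_sql)
```

Because the NOLOAD pass writes nothing to the table, a failure leaves nothing to clean up, which is exactly the requirement in the question.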

Q4

An organization needs a data store to handle the following data types and access patterns:
- Key-value access pattern
- Complex SQL queries and transactions
- Consistent reads
- Fixed schema
Which data store should the organization choose?

A) Amazon S3
B) Amazon Kinesis
C) Amazon DynamoDB
D) Amazon RDS

answer

D - Amazon RDS handles all these requirements, and although Amazon RDS is not typically thought of as optimized for key-value based access, a schema with a good primary key selection can provide this functionality. Amazon S3 provides no fixed schema and does not have consistent read after PUT support. Amazon Kinesis supports streaming data that is consistent as of a given sequence number but doesn't provide key/value access. Finally, although Amazon DynamoDB provides key/value access and consistent reads, it does not support SQL-based queries.

組織には、次のデータ型とアクセスパターンを処理するデータストアが必要です。
- Key-Valueアクセスパターン
- 複雑なSQLクエリとトランザクション
- 一貫した読み取り
- 固定スキーマ
組織はどのデータストアを選択する必要がありますか？

A）Amazon S3
B）Amazon Kinesis
C）Amazon DynamoDB
D）Amazon RDS

answer

D - Amazon RDSはこれらすべての要件を満たします。Amazon RDSは通常、キーバリュー型のアクセスに最適化されているとは考えられていませんが、適切なプライマリキーを選んだスキーマであればこの機能を提供できます。Amazon S3は固定スキーマを提供せず、PUT後の一貫した読み取りをサポートしていません。Amazon Kinesisは、特定のシーケンス番号の時点で一貫性のあるストリーミングデータをサポートしますが、キーバリュー型のアクセスは提供しません。最後に、Amazon DynamoDBはキーバリュー型のアクセスと一貫した読み取りを提供しますが、SQLベースのクエリをサポートしません。

Q5

A web application emits multiple types of events to Amazon Kinesis Streams for operational reporting. Critical events must be captured immediately before processing can continue, but informational events do not need to delay processing. 
What is the most appropriate solution to record these different types of events?

A) Log all events using the Kinesis Producer Library. 
B) Log critical events using the Kinesis Producer Library, and log informational events using the PutRecords API method. 
C) Log critical events using the PutRecords API method, and log informational events using the Kinesis Producer Library. 
D) Log all events using the PutRecords API method.

answer

C – The core of this question is how to send event messages to Kinesis synchronously vs. asynchronously. The critical events must be sent synchronously, and the informational events can be sent asynchronously. The Kinesis Producer Library (KPL) implements an asynchronous send function, so it can be used for the informational messages. PutRecords is a synchronous send function, so it must be used for the critical events.

ウェブアプリケーションは、運用レポートのためにAmazon Kinesis Streamsに複数のタイプのイベントを送信します。 重要なイベントは、処理を続行する前に直ちに記録されなければなりませんが、情報イベントのために処理を遅らせる必要はありません。
これらの異なる種類のイベントを記録するための最も適切なソリューションは何ですか？

A）Kinesis Producer Libraryを使用してすべてのイベントを記録します。
B）Kinesis Producer Libraryを使用して重要なイベントを記録し、PutRecords APIメソッドを使用して情報イベントを記録します。
C）PutRecords APIメソッドを使用して重要なイベントを記録し、Kinesis Producer Libraryを使用して情報イベントを記録します。
D）PutRecords APIメソッドを使用してすべてのイベントを記録します。

answer

C – この質問の核心は、イベントメッセージをKinesisに同期的または非同期的に送信する方法です。重要なイベントは同期的に送信する必要があり、情報イベントは非同期的に送信できます。 Kinesis Producer Library（KPL）は非同期送信機能を実装しているため、情報メッセージに使用できます。 PutRecordsは同期送信関数であるため、重要なイベントに使用する必要があります。
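
The synchronous-versus-asynchronous split in answer C can be sketched without any AWS dependency. In the example below, `send` is a hypothetical stand-in for a blocking PutRecords call, and the background queue only imitates the idea behind the Kinesis Producer Library; it is not the KPL's actual API, and it does no batching or retrying.

```python
import queue
import threading


class EventLogger:
    """Send critical events synchronously and informational events in the background."""

    def __init__(self, send):
        # `send` is a hypothetical callable standing in for a PutRecords call.
        self._send = send
        self._buffer = queue.Queue()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def log_critical(self, event):
        # Blocks until the send returns, so the caller knows the event was recorded.
        self._send([event])

    def log_info(self, event):
        # Returns immediately; the worker thread delivers the event later.
        self._buffer.put(event)

    def _drain(self):
        while True:
            event = self._buffer.get()
            try:
                self._send([event])
            finally:
                self._buffer.task_done()

    def flush(self):
        # Wait until every buffered event has been handed to `send`.
        self._buffer.join()


sent = []
logger = EventLogger(sent.extend)

logger.log_critical("payment-failed")
after_critical = list(sent)  # the critical event is already recorded here

logger.log_info("page-view")
logger.flush()
print(sent)
```

The design choice mirrors the answer: the caller pays the latency of the send only for events it cannot afford to lose track of.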

Q6

An administrator decides to use the Amazon Machine Learning service to classify social media posts that mention your company into two categories: posts that require a response and posts that do not. The training dataset of 10,000 posts contains the details of each post, including the timestamp, author, and full text of the post. You are missing the target labels that are required for training. 
Which two options will create valid target label data? 

A) Ask the social media handling team to review each post and provide the label. 
B) Use the sentiment analysis NLP library to determine whether a post requires a response. 
C) Use the Amazon Mechanical Turk web service to publish Human Intelligence Tasks that ask Turk workers to label the posts. 
D) Using the a priori probability distribution of the two classes, use Monte-Carlo simulation to generate the labels.

answer

A, C - You need accurate data to train the service and get accurate results from future data. The options described in B and D would end up training an ML model using the output from a different machine learning model and therefore would significantly increase the possible error rate. It is extremely important to have a very low error rate (if any!) in your training set, and therefore human-validated or assured labels are essential.

管理者は、Amazon Machine Learningサービスを使用して、会社に言及するソーシャルメディアの投稿を、応答が必要な投稿と不要な投稿の2つのカテゴリに分類することにしました。 10,000件の投稿のトレーニングデータセットには、投稿のタイムスタンプ、作成者、全文など、各投稿の詳細が含まれています。 トレーニングに必要なターゲットラベルがありません。
有効なターゲットラベルデータを作成する2つのオプションはどれですか？

A）ソーシャルメディア処理チームに各投稿を確認してラベルを提供するよう依頼してください。
B）センチメント分析NLPライブラリを使用して、投稿に応答が必要かどうかを判断します。
C）Amazon Mechanical Turk Webサービスを使用して、Turkワーカーに投稿のラベル付けを依頼するヒューマンインテリジェンスタスクを発行します。
D）2つのクラスの事前確率分布を使用して、モンテカルロシミュレーションを使用してラベルを生成します。

answer

A、C - サービスをトレーニングし、将来のデータから正確な結果を得るには、正確なデータが必要です。BとDのオプションは、別の機械学習モデルの出力を使ってMLモデルをトレーニングすることになり、起こりうるエラー率が大幅に増加します。トレーニングセットのエラー率を（エラーがあるとしても）非常に低く抑えることが極めて重要であるため、人間が検証または保証したラベルが不可欠です。

Q7

A mobile application collects data that must be stored in multiple Availability Zones within five minutes of being captured in the app. 
What architecture securely meets these requirements? 

A) The mobile app should write to an S3 bucket that allows anonymous PutObject calls. 
B) The mobile app should authenticate with an Amazon Cognito identity that is authorized to write to an Amazon Kinesis Firehose with an Amazon S3 destination. 
C) The mobile app should authenticate with an embedded IAM access key that is authorized to write to an Amazon Kinesis Firehose with an Amazon S3 destination. 
D) The mobile app should call a REST-based service that stores data on Amazon EBS. Deploy the service on multiple EC2 instances across two Availability Zones.

answer

B – It is essential when writing mobile applications that you consider the security of both how the application authenticates and how it stores credentials. Option A uses an anonymous Put, which may allow other apps to write counterfeit data; Option B is the right answer, because using Amazon Cognito gives you the ability to securely authenticate pools of users on any type of device at scale. Option C would put credentials directly into the application, which is strongly discouraged because applications can be decompiled which can compromise the keys. Option D does not meet our availability requirements: although the EC2 instances are running in different Availability Zones, the EBS volumes attached to each instance only store data in a single Availability Zone.

モバイルアプリケーションは、アプリにキャプチャされてから5分以内に複数のアベイラビリティーゾーンに保存する必要があるデータを収集します。
どのアーキテクチャがこれらの要件を安全に満たしていますか？

A）モバイルアプリは、匿名のPutObject呼び出しを許可するS3バケットに書き込む必要があります。
B）モバイルアプリは、Amazon S3宛先でAmazon Kinesis Firehoseへの書き込みが許可されているAmazon Cognito IDで認証する必要があります。
C）モバイルアプリは、Amazon S3宛先を使用してAmazon Kinesis Firehoseへの書き込みを許可された埋め込みIAMアクセスキーで認証する必要があります。
D）モバイルアプリは、Amazon EBSにデータを保存するRESTベースのサービスを呼び出す必要があります。 2つのアベイラビリティーゾーンにまたがる複数のEC2インスタンスにサービスをデプロイします。

answer

B – モバイルアプリケーションを作成する場合、アプリケーションの認証方法と認証情報の保存方法の両方のセキュリティを考慮することが不可欠です。オプションAは匿名のPutを使用するため、他のアプリが偽造データを書き込める可能性があります。オプションBが正解です。Amazon Cognitoを使用すると、あらゆる種類のデバイス上のユーザープールを大規模かつ安全に認証できるためです。オプションCは認証情報をアプリケーションに直接埋め込みますが、アプリケーションは逆コンパイルでき、キーが漏洩する可能性があるため、強く非推奨とされています。オプションDは可用性要件を満たしていません。EC2インスタンスは異なるアベイラビリティーゾーンで実行されていますが、各インスタンスに接続されたEBSボリュームは単一のアベイラビリティーゾーンにのみデータを保存します。

Q8

A data engineer needs to collect data from multiple Amazon Redshift clusters within a business and consolidate the data into a single central data warehouse. Data must be encrypted at all times while at rest or in flight. 
What is the most scalable way to build this data collection process? 

A) Run an ETL process that connects to the source clusters using SSL to issue a SELECT query for new data, and then write to the target data warehouse using an INSERT command over another SSL secured connection. 
B) Use AWS KMS data key to run an UNLOAD ENCRYPTED command that stores the data in an unencrypted S3 bucket; run a COPY command to move the data into the target cluster. 
C) Run an UNLOAD command that stores the data in an S3 bucket encrypted with an AWS KMS data key; run a COPY command to move the data into the target cluster. 
D) Connect to the source cluster over an SSL client connection, and write data records to Amazon Kinesis Firehose to load into your target data warehouse.

answer

B - The most scalable solutions are the UNLOAD/COPY solutions because they will work in parallel, which eliminates A and D as answers. Option C is incorrect because the data would not be encrypted in flight, and you cannot encrypt an entire bucket with a KMS key. Option B meets the encryption requirements: the UNLOAD ENCRYPTED command automatically stores the data encrypted using client-side encryption and uses HTTPS to encrypt the data during the transfer to S3.

データエンジニアは、ビジネス内の複数のAmazon Redshiftクラスターからデータを収集し、そのデータを単一の中央データウェアハウスに統合する必要があります。 データは、保存中も転送中も常に暗号化する必要があります。
このデータ収集プロセスを構築する最もスケーラブルな方法は何ですか？

A）SSLを使用してソースクラスターに接続するETLプロセスを実行して新しいデータのSELECTクエリを発行し、別のSSLセキュア接続でINSERTコマンドを使用してターゲットデータウェアハウスに書き込みます。
B）AWS KMSデータキーを使用して、暗号化されていないS3バケットにデータを保存するUNLOAD ENCRYPTEDコマンドを実行します。 COPYコマンドを実行して、データをターゲットクラスターに移動します。
C）AWS KMSデータキーで暗号化されたS3バケットにデータを保存するUNLOADコマンドを実行します。 COPYコマンドを実行して、データをターゲットクラスターに移動します。
D）SSLクライアント接続を介してソースクラスターに接続し、Amazon Kinesis Firehoseにデータレコードを書き込んでターゲットデータウェアハウスにロードします。

answer

B - 最もスケーラブルなソリューションは、並列に動作するUNLOAD/COPYを使う方法です。これにより、AとDは解答から除外されます。オプションCは、転送中にデータが暗号化されず、またバケット全体をKMSキーで暗号化することはできないため、正しくありません。オプションBは暗号化要件を満たします。UNLOAD ENCRYPTEDコマンドは、クライアント側の暗号化を使用して暗号化したデータを自動的に保存し、S3への転送中はHTTPSを使用してデータを暗号化します。
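
Here is a hedged sketch of what the UNLOAD statement in option B might look like, assuming the `KMS_KEY_ID` and `ENCRYPTED` parameters described in the Redshift UNLOAD documentation. The query, S3 prefix, role ARN, and key ID are placeholders, and the helper only assembles the SQL text.

```python
def build_unload_sql(select_sql: str, s3_prefix: str, iam_role: str, kms_key_id: str) -> str:
    """Build a Redshift UNLOAD statement that writes encrypted files to S3."""
    # The query is passed as a quoted string, so single quotes inside it are doubled.
    escaped = select_sql.replace("'", "''")
    return "\n".join(
        [
            f"UNLOAD ('{escaped}')",
            f"TO '{s3_prefix}'",
            f"IAM_ROLE '{iam_role}'",
            f"KMS_KEY_ID '{kms_key_id}'",
            "ENCRYPTED;",
        ]
    )


# Placeholder values, not taken from the sample question.
unload_sql = build_unload_sql(
    "SELECT * FROM sales WHERE sold_at >= '2019-10-01'",
    "s3://example-bucket/unload/sales_",
    "arn:aws:iam::123456789012:role/ExampleRedshiftRole",
    "1234abcd-12ab-34cd-56ef-1234567890ab",
)
print(unload_sql)
```

Each source cluster would run a statement like this against the shared bucket, and the central cluster would then COPY from the same prefix.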

Q9

A company logs data from its application in large files and runs regular analytics of these logs to support internal reporting for three months after the logs are generated. After three months, the logs are infrequently accessed for up to a year. The company also has a regulatory control requirement to store application logs for seven years. 
Which course of action should the company take to achieve these requirements in the most cost-efficient way? 

A) Store the files in S3 Glacier with a Deny Delete vault lock policy for archives less than seven years old and a vault access policy that restricts read access to the analytics IAM group and write access to the log writer service role. 
B) Store the files in S3 Standard with a lifecycle policy to transition the storage class to Standard - IA after three months. After a year, transition the files to Glacier and add a Deny Delete vault lock policy for archives less than seven years old. 
C) Store the files in S3 Standard with lifecycle policies to transition the storage class to Standard – IA after three months and delete them after a year. Simultaneously store the files in Amazon Glacier with a Deny Delete vault lock policy for archives less than seven years old. 
D) Store the files in S3 Standard with a lifecycle policy to remove them after a year. Simultaneously store the files in Amazon S3 Glacier with a Deny Delete vault lock policy for archives less than seven years old.

answer

C – There are two aspects to this question: setting up a lifecycle policy to ensure that objects are stored in the most cost-effective storage, and ensuring that the regulatory control is met. The lifecycle policy will store the objects on S3 Standard during the three months of active use, and then move the objects to S3 Standard – IA when access will be infrequent. That narrows the possible answer set to B and C. The Deny Delete vault lock policy will ensure that the regulatory policy is met, but that policy must be applied over the entire lifecycle of the object, not just after it is moved to Glacier after the first year. Option C has the Deny Delete vault lock applied over the entire lifecycle of the object and is the right answer.

企業は、アプリケーションからのデータを大きなファイルに記録し、ログが生成されてから3か月間、内部レポートのためにこれらのログを定期的に分析します。3か月後、ログは最長1年間、まれにアクセスされます。同社には、アプリケーションログを7年間保存するという規制上の要件もあります。
最も費用対効果の高い方法でこれらの要件を達成するために、会社はどのような行動を取るべきでしょうか？

A）S3 Glacierにファイルを保存します。7年未満のアーカイブに[削除拒否]のボールトロックポリシーを追加し、分析IAMグループへの読み取りアクセスとログライターサービスロールへの書き込みアクセスを制限するボールトアクセスポリシーを使用します。
B）3か月後にストレージクラスをStandard - IAに移行するライフサイクルポリシーを使用して、S3 Standardにファイルを保存します。 1年後、ファイルをGlacierに移行し、7年未満のアーカイブに[削除拒否]のボールトロックポリシーを追加します。
C）3か月後にストレージクラスをStandard – IAに移行し、1年後に削除するライフサイクルポリシーを使用して、S3 Standardにファイルを保存します。 同時に、7年未満のアーカイブに対する[削除拒否]のボールトロックポリシーを使用して、Amazon Glacierにもファイルを保存します。
D）1年後に削除するライフサイクルポリシーを使用して、S3 Standardにファイルを保存します。 同時に、7年未満のアーカイブに対する[削除拒否]のボールトロックポリシーを使用して、Amazon S3 Glacierにもファイルを保存します。

answer

C – この質問には2つの側面があります。ライフサイクルポリシーを設定して、オブジェクトが最も費用効果の高いストレージに保存されるようにすることと、規制管理が満たされるようにすることです。ライフサイクルポリシーは、アクティブに使用される3か月間はオブジェクトをS3 Standardに保存し、アクセスがまれになるとオブジェクトをS3 Standard – IAに移動します。これにより、解答の候補はBとCに絞り込まれます。[削除拒否]のボールトロックポリシーは規制ポリシーが満たされることを保証しますが、そのポリシーは、最初の1年が過ぎてオブジェクトがGlacierに移動された後だけでなく、オブジェクトのライフサイクル全体に適用される必要があります。オプションCには、オブジェクトのライフサイクル全体にわたって[削除拒否]のボールトロックが適用されており、これが正解です。
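
The two halves of answer C can be written down as plain data. The sketch below shows an S3 lifecycle configuration (Standard - IA after 90 days, delete after 365) and a Glacier vault lock policy that denies deleting archives younger than seven years. The `logs/` prefix, rule ID, vault ARN, and the choice to count seven years as 2,555 days are assumptions for illustration.

```python
import json

RETENTION_DAYS = 7 * 365  # assumption: seven years counted as 2,555 days

# S3 side: keep logs in Standard for three months, then Standard - IA,
# then delete them from S3 after one year.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "logs-ia-then-expire",
            "Filter": {"Prefix": "logs/"},  # hypothetical key prefix
            "Status": "Enabled",
            "Transitions": [{"Days": 90, "StorageClass": "STANDARD_IA"}],
            "Expiration": {"Days": 365},
        }
    ]
}


def build_vault_lock_policy(vault_arn: str, retention_days: int) -> dict:
    """Deny archive deletion until the archive is at least retention_days old."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "deny-delete-before-retention",
                "Principal": "*",
                "Effect": "Deny",
                "Action": "glacier:DeleteArchive",
                "Resource": vault_arn,
                "Condition": {
                    "NumericLessThan": {"glacier:ArchiveAgeInDays": str(retention_days)}
                },
            }
        ],
    }


# Placeholder vault ARN.
vault_lock_policy = build_vault_lock_policy(
    "arn:aws:glacier:us-west-2:123456789012:vaults/example-log-vault",
    RETENTION_DAYS,
)
print(json.dumps(vault_lock_policy, indent=2))
```

Because the logs are written to the Glacier vault at the same time as to S3, the deny rule covers every archive from day one, which is what separates option C from option B.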

Q10

A data engineer needs to architect a data warehouse for an online retail company to store historic purchases. The data engineer needs to use Amazon Redshift. To comply with PCI:DSS and meet corporate data protection standards, the data engineer must ensure that data is encrypted at rest and that the keys are managed by a corporate on-premises HSM. 
Which approach meets these requirements in the most cost-effective manner? 

A) Create a VPC, and then establish a VPN connection between the VPC and the on-premises network. Launch the Amazon Redshift cluster in the VPC, and configure it to use your corporate HSM. 
B) Use the AWS CloudHSM service to establish a trust relationship between the CloudHSM and the corporate HSM over a Direct Connect connection. Configure Amazon Redshift to use the CloudHSM device. 
C) Configure the AWS Key Management Service to point to the corporate HSM device, and then launch the Amazon Redshift cluster with the KMS managing the encryption keys. 
D) Use AWS Import/Export to import the corporate HSM device into the AWS Region where the Amazon Redshift cluster will launch, and configure Redshift to use the imported HSM.

answer

A - Amazon Redshift can use an on-premises HSM for key management over the VPN, which ensures that the encryption keys are locally managed. Option B is possible: CloudHSM can cluster to an on-premises HSM. But then key management could be performed on either the on-premises HSM or CloudHSM, and that doesn’t meet the design goal. Option C does not describe a valid feature of KMS and, even if it were possible, would violate the requirement that the corporate HSM manage the keys. Option D is not possible because you cannot put hardware into an AWS Region.

データエンジニアは、オンライン小売企業が購入履歴を保存するためのデータウェアハウスを設計する必要があります。データエンジニアはAmazon Redshiftを使用する必要があります。 PCI：DSSに準拠し、企業のデータ保護基準を満たすために、データエンジニアは、データが保管時に暗号化され、キーが企業のオンプレミスHSMによって管理されることを確認する必要があります。
どのアプローチが最も費用対効果の高い方法でこれらの要件を満たしていますか？

A）VPCを作成し、VPCとオンプレミスネットワークの間にVPN接続を確立します。 VPCでAmazon Redshiftクラスターを起動し、企業HSMを使用するように設定します。
B）AWS CloudHSMサービスを使用して、Direct Connect接続を介してCloudHSMと企業のHSMの間に信頼関係を確立します。 CloudHSMデバイスを使用するようにAmazon Redshiftを構成します。
C）AWS Key Management Serviceが企業HSMデバイスを指すように設定し、暗号化キーを管理するKMSでAmazon Redshiftクラスターを起動します。
D）AWS Import / Exportを使用して、企業のHSMデバイスをAmazon Redshiftクラスターが起動するAWSリージョンにインポートし、インポートしたHSMを使用するようにRedshiftを設定します。

answer

A - Amazon Redshiftは、VPN経由のキー管理にオンプレミスHSMを使用できます。これにより、暗号化キーがローカルで管理されます。オプションBは可能です。CloudHSMはオンプレミスHSMとクラスターを構成できます。ただしその場合、キー管理はオンプレミスHSMとCloudHSMのどちらでも実行できてしまい、設計目標を満たしません。オプションCはKMSの有効な機能を説明しておらず、仮に可能だったとしても、企業HSMがキーを管理するという要件に違反します。ハードウェアをAWSリージョンに持ち込むことはできないため、オプションDは不可能です。

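That's it for now. I'll tune the translations later, after I've worked through the questions myself.
But I have no idea about things like the Amazon Mechanical Turk web service, the sentiment analysis NLP library, or Monte Carlo simulation. Did I even get those right?
Also, it doesn't feel all that specialized. A lot of it seems to overlap with the earlier exams. I guess I'll just take it.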
