How to work around distcp errors on Dataproc's Hadoop

Posted at 2015-11-12

Running distcp from S3 to GCS on Dataproc's Hadoop fails,
either with a VerifyError or because the job trips YARN's virtual memory usage check.

To avoid these errors, specify the following script in the cluster's initialization-actions.

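#!/bin/bash

# The bundled BigQuery connector jar triggers a VerifyError when distcp runs, so remove it.
echo "Remove bigquery-connector-*.jar because VerifyError occurs when distcp is executed."
rm /usr/lib/hadoop/lib/bigquery-connector-0.7.2-hadoop2.jar

# Disable YARN's virtual memory check by inserting a property just before </configuration>.
echo "Add property to yarn-site.xml."
perl -i -pe 's#</configuration>#
  <property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
  </property>
</configuration>#m' /etc/hadoop/conf/yarn-site.xml

# Only worker nodes run a NodeManager; the master has nothing to restart.
echo "Restart hadoop-yarn-nodemanager"
ROLE=$(/usr/share/google/get_metadata_value attributes/role)
if [[ "${ROLE}" != "Master" ]]; then
  # Kill the running NodeManager; the script relies on the service coming
  # back up afterwards and reading the updated yarn-site.xml.
  pid=$(cat /var/run/hadoop-yarn/yarn-yarn-nodemanager.pid)
  echo "kill $pid"
  kill "$pid"
else
  echo "Master node: nothing to restart"
fi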

Upload the script to GCS and pass it when creating the cluster, like this:

% gcloud beta dataproc clusters create <CLUSTER NAME> --zone <ZONE> --initialization-actions gs://<BUCKET>/<PATH TO FILE>
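
Once the cluster is up, it can be worth checking on a node that the initialization action actually took effect. The sketch below is an assumption on my part rather than something from the original setup: `check_init_action` is a hypothetical function, and its default paths are simply the ones the script above touches.

```shell
#!/bin/bash

# Hypothetical check (not from the original article): report whether the two
# changes made by the initialization action are visible on this node.
check_init_action() {
  local lib_dir="${1:-/usr/lib/hadoop/lib}"
  local yarn_site="${2:-/etc/hadoop/conf/yarn-site.xml}"
  local status=0

  # The BigQuery connector jar should no longer be in Hadoop's lib directory.
  if ls "$lib_dir"/bigquery-connector-*.jar >/dev/null 2>&1; then
    echo "NG: bigquery-connector jar still present in $lib_dir"
    status=1
  fi

  # The line following the property name should be <value>false</value>.
  if ! grep -A1 -F 'yarn.nodemanager.vmem-check-enabled' "$yarn_site" 2>/dev/null \
      | grep -q '<value>false</value>'; then
    echo "NG: vmem check is not disabled in $yarn_site"
    status=1
  fi

  if [ "$status" -eq 0 ]; then
    echo "OK"
  fi
  return "$status"
}

# Usage on a cluster node (left commented out so that sourcing this file has no side effects):
# check_init_action
```

The function only reads files, so running it on a live node should be harmless; passing two arguments points it at a different lib directory and config file.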

With this in place, distcp from S3 to GCS works!

% hadoop distcp 's3n://<SOURCE BUCKET>/...' 'gs://<DESTINATION BUCKET>/...'

That's all.
