Environment
- Windows 10 Pro (host OS)
- VMware Workstation 16 Player 16.2.3
- Ubuntu 22.04 LTS (guest OS)
- Hadoop 3.3.3
- openjdk version 11.0.15
Verification
1. Run the following command.
hadoop jar share/hadoop/tools/lib/hadoop-streaming-3.3.3.jar -files /var/tmp/mapper.py,/var/tmp/reducer.py -mapper /var/tmp/mapper.py -reducer /var/tmp/reducer.py -input /user/hadoop/input.txt -output /user/hadoop/pythonOutput
/var/tmp/mapper.py
#!/usr/bin/env python
import sys
for l in sys.stdin:
    for word in l.strip().split():
        print('{0}\t1'.format(word))
/var/tmp/reducer.py
#!/usr/bin/env python
from collections import defaultdict
from operator import itemgetter
import sys
wordcount_dict = defaultdict(int)
for line in sys.stdin:
    word, count = line.strip().split('\t')
    wordcount_dict[word] += int(count)

for word, count in sorted(wordcount_dict.items(), key=itemgetter(0)):
    print('{0}\t{1}'.format(word, count))
Contents of /user/hadoop/input.txt:
hadoop fs -cat /user/hadoop/input.txt
a b b c c c
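Since the job is a plain word count, the mapper and reducer logic can be sanity-checked without a cluster. Below is a minimal pure-Python sketch of the same map → sort → reduce pipeline; the function names are illustrative and not part of the original scripts.

```python
import io
from collections import defaultdict
from operator import itemgetter

def run_mapper(stream):
    # Same logic as mapper.py: emit one "word\t1" record per token.
    out = []
    for line in stream:
        for word in line.strip().split():
            out.append('{0}\t1'.format(word))
    return out

def run_reducer(lines):
    # Same logic as reducer.py: sum the counts per word, sort by word.
    counts = defaultdict(int)
    for line in lines:
        word, count = line.strip().split('\t')
        counts[word] += int(count)
    return ['{0}\t{1}'.format(w, c)
            for w, c in sorted(counts.items(), key=itemgetter(0))]

mapped = run_mapper(io.StringIO('a b b c c c\n'))
# Hadoop sorts mapper output by key before it reaches the reducer,
# which the sorted() call stands in for here.
for line in run_reducer(sorted(mapped)):
    print(line)
# a	1
# b	2
# c	3
```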
2. The following error occurs.
...
2022-06-09 00:17:52,964 INFO mapreduce.Job: Task Id : attempt_1654688981443_0018_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 127
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:326)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:539)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:466)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:350)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:178)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:172)
...
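Exit code 127 is the shell's standard "command not found" status. The sketch below shows where the number comes from and checks which interpreters the environment actually provides; the failing command name is hypothetical, chosen only to reproduce the same exit code.

```python
import shutil
import subprocess

# On a stock Ubuntu 22.04 install, python3 is present but a plain "python"
# binary is not (unless python-is-python3 is installed), so the scripts'
# "#!/usr/bin/env python" shebang fails when Hadoop invokes them.
print('python  ->', shutil.which('python'))
print('python3 ->', shutil.which('python3'))

# Running a nonexistent command through the shell yields the same status
# that PipeMapRed reports for the failed subprocess.
result = subprocess.run('no-such-interpreter-xyz', shell=True,
                        capture_output=True)
print('exit code:', result.returncode)  # 127
```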
3. Fix the command and re-run.
Exit code 127 means the mapper subprocess could not be started: Ubuntu 22.04 ships python3 but no plain python binary, so the scripts' #!/usr/bin/env python shebang cannot be resolved. Specifying python3 explicitly in -mapper and -reducer avoids this. The old output directory must also be removed first, since Hadoop refuses to overwrite it.
hadoop fs -rm -r /user/hadoop/pythonOutput
hadoop jar share/hadoop/tools/lib/hadoop-streaming-3.3.3.jar -files /var/tmp/mapper.py,/var/tmp/reducer.py -mapper "python3 /var/tmp/mapper.py" -reducer "python3 /var/tmp/reducer.py" -input /user/hadoop/input.txt -output /user/hadoop/pythonOutput
4. Check the result.
hadoop fs -cat /user/hadoop/pythonOutput/*
a 1
b 2
c 3
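As an alternative to wrapping the scripts in "python3 ...", the shebang itself can be pointed at python3 and the scripts made executable, so they can again be passed to -mapper/-reducer directly. This is a sketch demonstrated on a throwaway file; the temp path is illustrative, and the same edit would be applied to /var/tmp/mapper.py and /var/tmp/reducer.py.

```python
import os
import stat
import tempfile

# Create a demo script with the problematic shebang (stand-in for mapper.py).
path = os.path.join(tempfile.mkdtemp(), 'mapper_demo.py')
with open(path, 'w') as f:
    f.write('#!/usr/bin/env python\nimport sys\n')

# Rewrite only the shebang line to target python3.
with open(path) as f:
    src = f.read()
with open(path, 'w') as f:
    f.write(src.replace('#!/usr/bin/env python\n',
                        '#!/usr/bin/env python3\n', 1))

# Set the owner execute bit, equivalent to chmod u+x.
os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)

print(open(path).readline().rstrip())  # #!/usr/bin/env python3
```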