Purpose
Install Hadoop on Ubuntu for development and testing.
Prerequisites
Ubuntu 12.04
$ sudo apt-get install openjdk-7-jdk
$ sudo apt-get install ssh
Java is required to run Hadoop.
Share SSH keys between the Hadoop servers so they can reach each other without passwords.
Add every server's hostname to the hosts file on each node.
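The hosts entries can look like the fragment below; the hostnames and addresses are placeholders, not values from this guide. SSH keys are typically shared by running ssh-keygen once on the master and then ssh-copy-id for each slave hostname.

```
# /etc/hosts on every node (example addresses -- replace with your own)
192.168.0.10  namenode
192.168.0.11  datanode1
192.168.0.12  datanode2
```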
Oracle JDK Installation
$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java7-installer   # or oracle-java6-installer / oracle-java8-installer
$ java -version
java version "1.7.0_55"
Java(TM) SE Runtime Environment (build 1.7.0_55-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.55-b03, mixed mode)
Configuring
Hadoop download
download
http://www.apache.org/dyn/closer.cgi/hadoop/common/
Download and set it up under a regular (non-root) user account.
$ wget {download url}
$ tar zxvf hadoop-2.4.0.tar.gz
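It is convenient (though optional) to put the Hadoop binaries on the PATH. A sketch for ~/.bashrc, assuming the tarball was unpacked to ~/hadoop-2.4.0:

```shell
# ~/.bashrc -- adjust HADOOP_HOME to wherever the tarball was extracted
export HADOOP_HOME=$HOME/hadoop-2.4.0
# bin holds client commands (hadoop, hdfs), sbin holds the start/stop scripts
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
```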
Hadoop configuration
etc/hadoop/core-site.xml, etc/hadoop/hdfs-site.xml, etc/hadoop/yarn-site.xml, etc/hadoop/mapred-site.xml
The configuration files live in the ${HADOOP_HOME}/etc/hadoop directory.
etc/hadoop/hadoop-env.sh
Java and Hadoop environment settings:
# JAVA HOME
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
# HADOOP HOME
export HADOOP_HOME=/home/{account}/hadoop
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"
# Needed so a 64-bit JVM can find the native libraries
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HADOOP_HOME/lib"
Even with these settings, a 64-bit JDK may still produce the following warning, which is reported to be harmless:
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
To get rid of the warning, recompile the Hadoop native libraries against a 64-bit JDK.
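Whether the native libraries were actually picked up can be checked with hadoop checknative (available since Hadoop 2.4.0):

```
$ hadoop checknative -a
```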
etc/hadoop/core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://<NameNode Hostname>:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/{account}/tmp/hadoop</value>
</property>
</configuration>
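hadoop.tmp.dir is not always created automatically; creating it up front avoids startup errors. The path matches the value configured above ({account} is the placeholder used throughout this guide):

```
$ mkdir -p /home/{account}/tmp/hadoop
```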
etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>dfs.blocksize</name>
<value>268435456</value>
</property>
</configuration>
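dfs.blocksize is given in bytes; the 268435456 above is 256 MB, as a quick sanity check shows:

```shell
# dfs.blocksize is in bytes: 256 MB = 256 * 1024 * 1024
blocksize=$((256 * 1024 * 1024))
echo "$blocksize"   # 268435456
```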
etc/hadoop/mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<description>The runtime framework for executing MapReduce jobs. Can be one of local, classic or yarn.</description>
</property>
<property>
<name>mapreduce.map.memory.mb</name>
<value>1536</value>
</property>
<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx1024M</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>3072</value>
</property>
<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx2560M</value>
</property>
<property>
<name>mapreduce.task.io.sort.mb</name>
<value>512</value>
</property>
<property>
<name>mapreduce.task.io.sort.factor</name>
<value>100</value>
</property>
<property>
<name>mapreduce.reduce.shuffle.parallelcopies</name>
<value>50</value>
</property>
</configuration>
etc/hadoop/yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
<description>Shuffle service that needs to be set for MapReduce to run.</description>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value><Hostname>:8025</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value><Hostname>:8030</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value><Hostname>:8035</value>
</property>
</configuration>
etc/hadoop/slaves
<node hostname1>
<node hostname2>
<node hostname3>
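Before the first startup, the HDFS NameNode must be formatted once. Run this on the NameNode host only, and never again afterwards, as reformatting wipes the HDFS metadata:

```
$ ${HADOOP_HOME}/bin/hdfs namenode -format
```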
Startup
start
${HADOOP_HOME}/sbin/start-dfs.sh
${HADOOP_HOME}/sbin/start-yarn.sh
shutdown
${HADOOP_HOME}/sbin/stop-dfs.sh
${HADOOP_HOME}/sbin/stop-yarn.sh
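After startup, the running daemons can be listed with jps (it ships with the JDK). On the master you would roughly expect NameNode, SecondaryNameNode, and ResourceManager; on each slave, DataNode and NodeManager:

```
$ jps
```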
References
http://hadoop.apache.org/docs/r2.4.0/hadoop-project-dist/hadoop-common/ClusterSetup.html
http://www.alexjf.net/blog/distributed-systems/hadoop-yarn-installation-definitive-guide
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/