
Hadoop 2.4.0 Installation on Ubuntu (Fully-Distributed)

by igooo 2014. 4. 28.

Purpose

Install Hadoop on Ubuntu for development and testing.



Prerequisites

Ubuntu 12.04

$ sudo apt-get install openjdk-7-jdk

$ sudo apt-get install ssh


Java is required to run Hadoop.

Share SSH keys so the Hadoop servers can log in to each other without password prompts.

Add every server's hostname to the hosts file on each node.
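The SSH and hosts steps above can be sketched as follows (hostnames such as slave1 and the {account} user are placeholders for your own cluster):

```shell
# Create the .ssh directory and an RSA key pair with an empty
# passphrase, if one does not exist yet.
mkdir -p "$HOME/.ssh"
chmod 700 "$HOME/.ssh"
[ -f "$HOME/.ssh/id_rsa" ] || ssh-keygen -t rsa -P "" -f "$HOME/.ssh/id_rsa" -q

# Copy the public key to every node listed in etc/hadoop/slaves,
# e.g. (placeholder hostnames; run against your real nodes):
#   ssh-copy-id {account}@slave1
#   ssh-copy-id {account}@slave2

# Every node also needs hostname entries in /etc/hosts, for example:
#   192.168.0.10  master
#   192.168.0.11  slave1
#   192.168.0.12  slave2
```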



Oracle JDK Installation

$ sudo add-apt-repository ppa:webupd8team/java

$ sudo apt-get update

$ sudo apt-get install oracle-java7-installer

(Use oracle-java6-installer or oracle-java8-installer for other versions.)

$ java -version

java version "1.7.0_55"
Java(TM) SE Runtime Environment (build 1.7.0_55-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.55-b03, mixed mode)




Configuring

Hadoop download

Download from a mirror listed at http://www.apache.org/dyn/closer.cgi/hadoop/common/

Set everything up under a regular user account.

$ wget {download url}

$ tar zxvf hadoop-2.4.0.tar.gz


Hadoop configuration

etc/hadoop/core-site.xml, etc/hadoop/hdfs-site.xml, etc/hadoop/yarn-site.xml, etc/hadoop/mapred-site.xml

The configuration files live in the ${HADOOP_HOME}/etc/hadoop directory.


etc/hadoop/hadoop-env.sh

Java and Hadoop environment settings

# JAVA HOME

export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64

# HADOOP HOME

export HADOOP_HOME=/home/{account}/hadoop

export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"


# Add the following settings to use the native libraries with a 64-bit JVM

export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native

export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HADOOP_HOME/lib"


Even with these settings, the following warning may still appear when running on a 64-bit JDK; it has been reported to be harmless:

WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

To get rid of the warning, Hadoop's native libraries must be recompiled from source against a 64-bit JDK.


etc/hadoop/core-site.xml

<configuration>
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://<NameNode Hostname>:9000</value>
        </property>
        <property>
                <name>hadoop.tmp.dir</name>
                <value>/home/{account}/tmp/hadoop</value>
        </property>
</configuration>
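It does no harm to create the hadoop.tmp.dir directory up front on each node and confirm the Hadoop user can write to it (a sketch; $HOME stands in for the /home/{account} placeholder above):

```shell
# Create the directory configured as hadoop.tmp.dir and make sure
# the Hadoop user owns and can write to it.
mkdir -p "$HOME/tmp/hadoop"
chmod 755 "$HOME/tmp/hadoop"
```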


etc/hadoop/hdfs-site.xml

<configuration>
        <property>
                <name>dfs.replication</name>
                <value>3</value>
        </property>
        <property>
                <name>dfs.permissions</name>
                <value>false</value>
        </property>
        <property>
                <name>dfs.blocksize</name>
                <value>268435456</value>
        </property>
</configuration>
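The dfs.blocksize value above is simply 256 MB expressed in bytes:

```shell
# 256 MB in bytes: 256 * 1024 * 1024
echo $((256 * 1024 * 1024))   # prints 268435456
```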


etc/hadoop/mapred-site.xml

<configuration>
        <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
                <description>The runtime framework for executing MapReduce jobs. Can be one of local, classic or yarn.</description>
        </property>
        <property>
                <name>mapreduce.map.memory.mb</name>
                <value>1536</value>
        </property>
        <property>
                <name>mapreduce.map.java.opts</name>
                <value>-Xmx1024M</value>
        </property>
        <property>
                <name>mapreduce.reduce.memory.mb</name>
                <value>3072</value>
        </property>
        <property>
                <name>mapreduce.reduce.java.opts</name>
                <value>-Xmx2560M</value>
        </property>
        <property>
                <name>mapreduce.task.io.sort.mb</name>
                <value>512</value>
        </property>
        <property>
                <name>mapreduce.task.io.sort.factor</name>
                <value>100</value>
        </property>
        <property>
                <name>mapreduce.reduce.shuffle.parallelcopies</name>
                <value>50</value>
        </property>
</configuration>
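As a rule of thumb, each -Xmx heap size in *.java.opts should be smaller than its matching *.memory.mb container size, leaving headroom for non-heap JVM memory. A quick sanity check of these values (the 3072 MB reduce container is an assumed value mirroring the map pattern; adjust to your own configuration):

```shell
# Containers (mapreduce.*.memory.mb) vs. JVM heaps (-Xmx in *.java.opts).
# reduce_container=3072 is an assumption, not taken from this post's config.
map_container=1536;    map_heap=1024
reduce_container=3072; reduce_heap=2560

[ "$map_heap"    -lt "$map_container" ]    && echo "map heap fits"
[ "$reduce_heap" -lt "$reduce_container" ] && echo "reduce heap fits"
```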


etc/hadoop/yarn-site.xml

<configuration>

<!-- Site specific YARN configuration properties -->

        <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
                <description>Shuffle service that needs to be set for MapReduce to run.</description>
        </property>
        <property>
                <name>yarn.resourcemanager.resource-tracker.address</name>
                <value><Hostname>:8025</value>
        </property>
        <property>
                <name>yarn.resourcemanager.scheduler.address</name>
                <value><Hostname>:8030</value>
        </property>
        <property>
                <name>yarn.resourcemanager.address</name>
                <value><Hostname>:8035</value>
        </property>
</configuration>


etc/hadoop/slaves

<node hostname1>

<node hostname2>

<node hostname3>



Startup

start

Before the very first start, format HDFS on the NameNode:

${HADOOP_HOME}/bin/hdfs namenode -format

${HADOOP_HOME}/sbin/start-dfs.sh

${HADOOP_HOME}/sbin/start-yarn.sh


shutdown

${HADOOP_HOME}/sbin/stop-dfs.sh

${HADOOP_HOME}/sbin/stop-yarn.sh



References

http://hidka.tistory.com/220

http://hadoop.apache.org/docs/r2.4.0/hadoop-project-dist/hadoop-common/ClusterSetup.html

http://www.alexjf.net/blog/distributed-systems/hadoop-yarn-installation-definitive-guide

http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/