본문 바로가기
데이터베이스

오랜만에 하둡 설치기

by 흰색남자 2022. 12. 19.

빅데이터 취업을 위해 오랜만에 하둡을 다시 설치해 보았다.

 

환경

우분투 20.04

하둡 3.3.1

 

master : namenode

slave-1 : secondary namenode, datanode 1

slave-2 : datanode 2

 

sudo apt install -y net-tools ssh vim tree openjdk-8-jdk // 자주 사용하는것 설치

wget https://archive.apache.org/dist/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz

tar -xzf hadoop-3.3.1.tar.gz

 

vi .bashrc

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HADOOP_HOME=/home/hadoop/hadoop
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:
export HADOOP_STREAMING=$HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-3.3.1.jar
source ~/.bashrc

환경 변수 세팅 후 저장

 

/etc/hosts 설정

127.0.0.1       localhost
127.0.1.1       kjh-VirtualBox

192.168.56.101 master
192.168.56.102 slave-1
192.168.56.103 slave-2

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

 

 

 

============================================================================================

$HADOOP_HOME/etc/hadoop/yarn-site.xml     설정

<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>

# log 출력
  <property>
    <name>yarn.log-aggregation-enables</name>
    <value>true</value>
  </property>

  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>

============================================================================================

 

$HADOOP_HOME/etc/hadoop/core-site.xml     설정

<configuration>
 <property>
        <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
 </property>
</configuration>

 

============================================================================================

$HADOOP_HOME/etc/hadoop/mapred-site.xml     설정

<configuration>
<property>
        <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
<property>
        <name>yarn.app.mapreduce.am.env</name>
        <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
</property>
<property>
        <name>mapreduce.map.env</name>
        <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
</property>
<property>
        <name>mapreduce.reduce.env</name>
        <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
</property>
</configuration>

============================================================================================

 

$HADOOP_HOME/etc/hadoop/hdfs-site.xml     설정

<configuration>
        <property>
    <name>dfs.namenode.http-address</name>
    <value>0.0.0.0:9870</value>
</property>

 

<property>
        <name>dfs.namenode.name.dir</name>
    <value>/home/kjh/hadoop-3.3.1/data/namenode</value>
</property>

<property>
        <name>dfs.datanode.data.dir</name>
    <value>/home/kjh/hadoop-3.3.1/data/datanode</value>
</property>

<property>
        <name>dfs.replication</name>
    <value>1</value>
</property>

<property>
        <name>dfs.namenode.secondary.http-address</name>
    <value>slave-1:50090</value>
</property>
</configuration>

 

============================================================================================

$HADOOP_HOME/etc/hadoop/workers     설정

slave-1
slave-2

 

 

 

slave 복제 후 /etc/hosts 기반으로 ip 세팅

 

 

master에서만 실행

cd $HADOOP_HOME/sbin

 

하둡 실행

./start-all.sh

 

하둡 종료

./stop-all.sh

 

 

예제 준비

 bin/hdfs dfs -mkdir /user

 bin/hdfs dfs -mkdir /user/workspace

 bin/hdfs dfs -mkdir /user/workspace/wordcount

 bin/hdfs dfs -put etc/hadoop/*.xml /user/workspace/wordcount

 

예제 돌리기

 bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.1.jar grep /user/workspace/wordcount /user/worksapce/wordcount_result 'dfs[a-z.]+'

 

 

결과 확인

hdfs dfs -cat  /user/worksapce/wordcount_result2/*