Install Hadoop on Ubuntu

The best article on Hadoop installation that I found is http://codesfusion.blogspot.ru/2013/10/setup-hadoop-2x-220-on-ubuntu.html

I adapted it slightly for my configuration: Ubuntu 14.04, Hadoop 2.2.0 single node, JDK 6. These are not the latest versions of Java and Hadoop, but I had to reproduce an existing production system.

My steps are:

Install prerequisites

JDK 6 and ssh:

$ sudo apt-get install python-software-properties
$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java6-installer
$ sudo apt-get install ssh

Create user

$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hduser

Set up ssh

$ su - hduser
$ ssh-keygen -t rsa -P ""
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ exit
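
If ssh still asks for a password after these steps, one common cause is permissions: sshd in its default strict mode ignores authorized_keys when the folder or the file is writable by other users. A minimal sketch of a fix, assuming the default ~/.ssh location; the function name fix_ssh_perms is my own and not part of the original article:

```shell
# Hypothetical helper (name is an assumption): tighten permissions on an
# SSH folder so that sshd accepts the key stored in authorized_keys.
fix_ssh_perms() {
    ssh_dir="${1:-$HOME/.ssh}"
    # Only the owner may read, write or enter the folder
    chmod 700 "$ssh_dir" || return 1
    # Only the owner may read or write the key list
    if [ -f "$ssh_dir/authorized_keys" ]; then
        chmod 600 "$ssh_dir/authorized_keys"
    fi
}
```

Run fix_ssh_perms as hduser, then try ssh localhost; it should log in without a password prompt.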

Download and unpack Hadoop

Download the Hadoop 2.2.0 release from http://archive.apache.org/dist/hadoop/core/hadoop-2.2.0/hadoop-2.2.0.tar.gz
and unpack it to the installation folder. I prefer /opt/hadoop
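
The archive stores everything under a top-level hadoop-2.2.0 folder, so a plain unpack into /opt/hadoop would give /opt/hadoop/hadoop-2.2.0/bin instead of /opt/hadoop/bin. A possible way to avoid that is sketched below; the function name unpack_hadoop is hypothetical, and I assume the tarball has already been downloaded:

```shell
# Hypothetical helper (name is an assumption): unpack a Hadoop tarball so
# that its contents land directly in the target folder, not in a nested
# hadoop-2.2.0/ subfolder.
unpack_hadoop() {
    tarball="$1"
    dest="$2"
    mkdir -p "$dest" || return 1
    # --strip-components=1 drops the top-level folder stored in the archive
    tar -xzf "$tarball" -C "$dest" --strip-components=1
}
```

For example, unpack_hadoop hadoop-2.2.0.tar.gz /opt/hadoop. Writing to /opt normally needs root privileges, so run it from a root shell or pick a folder you own.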

Set folder owner

$ sudo chown -R hduser:hadoop /opt/hadoop

Set environment variables

Set up the environment variables in the .bashrc of hduser

Edit the ~/.bashrc file
$ sudo nano /home/hduser/.bashrc

Paste the environment variables at the end

#Hadoop variables 
export JAVA_HOME=/usr/lib/jvm/java-6-oracle/
export HADOOP_INSTALL=/opt/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
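
To confirm that the variables really reached the shell, a small check like the following may help. It is only a sketch; the function name missing_hadoop_vars is my own, and the list of names is simply the one from the block above:

```shell
# Hypothetical helper (name is an assumption): print the names of the
# Hadoop-related variables that are unset or empty in the current shell.
# Prints nothing when all of them are set.
missing_hadoop_vars() {
    for name in JAVA_HOME HADOOP_INSTALL HADOOP_MAPRED_HOME \
                HADOOP_COMMON_HOME HADOOP_HDFS_HOME YARN_HOME; do
        # Look up the value of the variable whose name is stored in $name
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name"
        fi
    done
}
```

Open a new shell as hduser (so that .bashrc is read again) and call missing_hadoop_vars; empty output means every variable is set.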

 

After the above steps, the command /opt/hadoop/bin/hadoop version should display the Hadoop version when run in a new shell as hduser

Configure Hadoop

Go to the /opt/hadoop/etc/hadoop folder and edit the following files

hadoop-env.sh

Add JAVA_HOME to the end of the file
#modify JAVA_HOME
export JAVA_HOME=/usr/lib/jvm/java-6-oracle/

core-site.xml

Paste the following between the <configuration> tags
<property>
   <name>fs.default.name</name>
   <value>hdfs://localhost:9000</value>
</property>

 

yarn-site.xml

Paste the following between the <configuration> tags
<property>
   <name>yarn.nodemanager.aux-services</name>
   <value>mapreduce_shuffle</value>
</property>
<property>
   <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
   <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>

  

mapred-site.xml

Rename mapred-site.xml.template to mapred-site.xml, then edit mapred-site.xml
Paste the following between the <configuration> tags
<property>
   <name>mapreduce.framework.name</name>
   <value>yarn</value>
</property>
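
The rename can also be done from the command line. The sketch below copies the template instead of renaming it, so the original template stays in place, and it never overwrites a mapred-site.xml that already exists. The function name ensure_mapred_site is hypothetical:

```shell
# Hypothetical helper (name is an assumption): create mapred-site.xml from
# the shipped template unless it already exists, so that an edited file is
# never overwritten. Note that this copies rather than renames.
ensure_mapred_site() {
    conf_dir="${1:-/opt/hadoop/etc/hadoop}"
    if [ ! -f "$conf_dir/mapred-site.xml" ]; then
        cp "$conf_dir/mapred-site.xml.template" "$conf_dir/mapred-site.xml"
    fi
}
```

Called without arguments, it works on /opt/hadoop/etc/hadoop, the configuration folder used in this post.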

Execute the following commands to create the folders for the namenode and datanode data
$ su hduser
$ cd ~
$ mkdir -p mydata/hdfs/namenode
$ mkdir -p mydata/hdfs/datanode
$ cd /opt/hadoop/etc/hadoop

 

hdfs-site.xml

Paste the following between the <configuration> tags
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hduser/mydata/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hduser/mydata/hdfs/datanode</value>
</property>

 

Format namenode

Format the namenode as hduser with the command
$ /opt/hadoop/bin/hdfs namenode -format
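
Formatting is a one-time step: running it again on a namenode that is already in use wipes the HDFS metadata. A formatted name folder contains a current/VERSION file, so a simple guard can be sketched as follows. The function name namenode_is_formatted is my own:

```shell
# Hypothetical guard (name is an assumption): succeed only if the given
# name folder already holds a formatted namenode, which is recognised
# here by the presence of current/VERSION.
namenode_is_formatted() {
    [ -f "$1/current/VERSION" ]
}

# Possible usage with the folder configured in hdfs-site.xml:
#   if namenode_is_formatted /home/hduser/mydata/hdfs/namenode; then
#       echo "namenode already formatted, skipping"
#   else
#       /opt/hadoop/bin/hdfs namenode -format
#   fi
```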

Start Hadoop

$ start-dfs.sh
...

$ start-yarn.sh
...

$ jps

If everything is successful, you should see the following services running

DataNode
ResourceManager
Jps
NodeManager
NameNode
SecondaryNameNode
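
Comparing the list by eye is easy to get wrong, so the check can be scripted. The sketch below reads jps output and reports which of the five daemons are absent; the function name missing_daemons is hypothetical. It matches the whole name, because NameNode is also a substring of SecondaryNameNode:

```shell
# Hypothetical helper (name is an assumption): read `jps` output on stdin
# and print every expected daemon that is not listed. Prints nothing when
# all five daemons are running.
missing_daemons() {
    jps_output="$(cat)"
    for daemon in NameNode SecondaryNameNode DataNode ResourceManager NodeManager; do
        # jps prints "<pid> <name>"; compare the name with the whole second field
        if ! printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            echo "$daemon"
        fi
    done
}
```

Use it as jps | missing_daemons; empty output means all daemons are up.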

Check installation

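Run the example. The pi job takes two arguments, the number of maps and the number of samples per map; the values below are just small examples

$ hadoop jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 2 5

And check its progress in the browser at http://localhost:8088/cluster/apps

Open the Hadoop web interface

Paste into the browser: http://localhost:8088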
