The best article of Hadoop installation I found is http://codesfusion.blogspot.ru/2013/10/setup-hadoop-2x-220-on-ubuntu.html
I adopted it a little for my configuration: Ubuntu 14.04, Hadoop 2.2.0 single node, JDK 6. Not latest versions of Java and Hadoop but I had to reproduce existing production system.
My steps are:
Install prerequisites
JDK 6 and ssh:
$ sudo apt-get install oracle-java6-installer
$ sudo apt-get install python-software-properties
$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java6-installer
$ sudo apt-get install ssh
Create user
$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hadoop
Set up ssh
$ su - hduser
$ ssh-keygen -t rsa -P ""
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ exit
Download and unpack Hadoop
Download Hadoop 2.2.0 release http://archive.apache.org/dist/hadoop/core/hadoop-2.2.0/hadoop-2.2.0.tar.gz
and unpack to installation folder. I prefer /opt/hadoop
Set folder owner
$ sudo chown -R hduser:hadoop /opt/hadoop
Set environment variables
Set up environment variables in .bashrc
Edit ~.bashrc file
$ sudo nano /home/hduser/.bashrc
Paster environment variables to the end
export HADOOP_INSTALL=/opt/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
After above steps the command /opt/hadoop/bin/hadoop version should display the version
Configure hadoop
core-site.xml
</property>
Paste following between <configuration>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
#Paste following between <configuration>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
$ su hduser
$ cd ~
$ mkdir -p mydata/hdfs/namenode
$ mkdir -p mydata/hdfs/datanode
$ cd /opt/hadoop/etc/hadoop
hdfs-site.xml
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hduser/mydata/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hduser/mydata/hdfs/datanode</value>
</property>
Format namenode
Format namenode with command
$ /opt/hadoop/bin/hdfs namenode -format
Start Hadoop
$ start-dfs.sh
...
$ start-yarn.sh
...
$ jps
If everything is successful, you should see following services running
DataNode
ResourceManager
Jps
NodeManager
NameNode
SecondaryNameNode
Check installation
Run example
$ /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 2
And check it’s progress in browser http://localhost:8088/cluster/apps
Open Hadoop web interface
Paste in browser: http://localhost:8088
Pingback: Install Apache Spark on Ubuntu | Dmitry Pukhov