Tag Archives: Hadoop

File could only be replicated to 0 nodes instead of minReplication (=1)

I am working on a newly configured cluster. I can browse HDFS, but any write attempt fails with the error in the title above.

I can also create and view tables in Hive, but any insert attempt fails with the same error.

Accessing AWS S3 from on-premises Hadoop

Add AWS libraries to the classpath

The hadoop-aws-*.jar library is not on the classpath by default, but it ships in the $HADOOP_HOME/tools/lib folder. To fix this, edit $HADOOP_HOME/etc/hadoop/hadoop-env.sh and add the following line:
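
The line itself is cut off in this excerpt. As a rough sketch only, a line of this kind usually appends the tools directory to HADOOP_CLASSPATH. Both the TOOLS_LIB variable name and the directory below are assumptions: in stock Apache Hadoop 2.x tarballs the tools jars normally sit under share/hadoop/tools/lib, so check where hadoop-aws-*.jar actually lives in your install and adjust the path.

```shell
# Assumption: hadoop-aws-*.jar and the AWS SDK jar it needs live in this directory.
# TOOLS_LIB is a hypothetical helper variable; change it if your layout differs.
TOOLS_LIB="$HADOOP_HOME/share/hadoop/tools/lib"

# Append the wildcard entry, keeping any classpath that is already set.
# The quotes keep "*" literal so that Java expands it, not the shell.
export HADOOP_CLASSPATH="${HADOOP_CLASSPATH:+$HADOOP_CLASSPATH:}$TOOLS_LIB/*"
```

Once hadoop-env.sh has been edited, running `hadoop classpath` should show the new entry.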


Install Hadoop on Ubuntu

The best article on Hadoop installation I have found is http://codesfusion.blogspot.ru/2013/10/setup-hadoop-2x-220-on-ubuntu.html

I adapted it slightly for my configuration: Ubuntu 14.04, Hadoop 2.2.0 single node, JDK 6. These are not the latest versions of Java and Hadoop, but I had to reproduce an existing production system.

My steps are:

Install prerequisites

JDK 6 and ssh:

$ sudo apt-get install python-software-properties
$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java6-installer
$ sudo apt-get install ssh
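
Before moving on, it can help to confirm that both prerequisites ended up on the PATH. The following is a small sketch of my own, not taken from the linked article; `check_tool` is a hypothetical helper name.

```shell
# Report whether a command is available on the PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: missing"
  fi
}

# The two prerequisites installed above.
check_tool java
check_tool ssh
```

If either tool is reported as missing, rerun the corresponding install command before continuing.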
