Category Archives: Cloud computing

Accessing AWS S3 from on-premises Hadoop

Add AWS libraries to the classpath

The hadoop-aws-*.jar library is not on the classpath by default, but it ships in the $HADOOP_HOME/tools/lib folder. To fix this, edit $HADOOP_HOME/etc/hadoop/ and add the following line:
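
The line itself is not shown in this excerpt. As a rough sketch only, and not necessarily the author's exact line, one common approach is to append the tools library folder to HADOOP_CLASSPATH. The variable name TOOLS_LIB is hypothetical, and the folder simply follows the path given above (in many Hadoop distributions it is $HADOOP_HOME/share/hadoop/tools/lib instead, so check your installation first).

```shell
# Assumption: the hadoop-aws jar and its dependencies live in this folder.
TOOLS_LIB="${HADOOP_HOME}/tools/lib"
# Append to any existing classpath instead of overwriting it.
export HADOOP_CLASSPATH="${HADOOP_CLASSPATH:+${HADOOP_CLASSPATH}:}${TOOLS_LIB}/*"
```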


Install ZooKeeper on Linux

Installing ZooKeeper for a dev environment is quick and easy.

  1. Download ZooKeeper and extract it to some location, for example /opt/zookeeper/
  2. Create a simple zoo.cfg file, e.g. by copying the sample config /opt/zookeeper/conf/zoo_sample.cfg to /opt/zookeeper/conf/zoo.cfg
  3. Start zookeeper
    /opt/zookeeper/bin/ start
  4. Stop zookeeper
    /opt/zookeeper/bin/ stop
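
Step 2 above can be sketched as a small helper. This is only an illustration: the function name create_zoo_cfg is hypothetical, and it defaults to the /opt/zookeeper location used in the steps.

```shell
#!/usr/bin/env bash
# Copy the sample config to zoo.cfg unless a zoo.cfg already exists.
# $1 is the ZooKeeper install folder (defaults to /opt/zookeeper).
create_zoo_cfg() {
  local zk_home="${1:-/opt/zookeeper}"
  local sample="$zk_home/conf/zoo_sample.cfg"
  local target="$zk_home/conf/zoo.cfg"
  if [ ! -f "$sample" ]; then
    echo "sample config not found: $sample" >&2
    return 1
  fi
  # Keep an existing zoo.cfg untouched.
  [ -f "$target" ] || cp "$sample" "$target"
}
```

The control script name is cut off in the steps above. Standard ZooKeeper distributions ship bin/zkServer.sh, so assuming that script, a typical use would be: create_zoo_cfg /opt/zookeeper && /opt/zookeeper/bin/zkServer.sh start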

HBase shell on AWS EMR cluster quickstart

How to start the HBase client on AWS EMR and query an external HBase database

  1. Create an EMR cluster with the HBase application enabled, either manually or using a command like this:

  2. Establish an SSH connection to the cluster

  3. To work with an external database, set the ZooKeeper quorum in /etc/hbase/conf/hbase-site.xml

  4. Start the HBase shell

  5. In the shell, run queries like these:
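
For the ZooKeeper quorum step, the setting is the hbase.zookeeper.quorum property. Below is a minimal sketch of a helper that prints the property element to paste inside <configuration> in hbase-site.xml. The function name hbase_quorum_property and the host names in the usage note are hypothetical.

```shell
#!/usr/bin/env bash
# Print an hbase-site.xml <property> element for the ZooKeeper quorum.
# $1 is a comma-separated list of ZooKeeper hosts of the external HBase.
hbase_quorum_property() {
  local quorum="$1"
  printf '<property>\n  <name>hbase.zookeeper.quorum</name>\n  <value>%s</value>\n</property>\n' "$quorum"
}
```

For example, hbase_quorum_property "zk1.example.com,zk2.example.com" prints the element for two hosts. Once the file is saved, running hbase shell starts the client, where commands such as list or scan 'mytable', {LIMIT => 5} (table name assumed) can be used to check the connection.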


Clear Apache Storm cluster remotely

My bash scripts for clearing a Storm and ZooKeeper cluster remotely over SSH.

Main idea:

  1. Connect to every ZooKeeper server over SSH, stop ZooKeeper, then delete its data folder.
  2. Connect to every Storm node over SSH, kill the Storm processes, and delete the data folder.
  3. Connect to the ZooKeeper servers again and start them.
  4. Connect to the Storm nodes again and start them.
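
The main idea can be sketched roughly as follows. This is not the original script: every host name, path and command in the variables below is an assumption to be replaced with your own values. Setting SSH=echo gives a dry run that only prints what would be executed.

```shell
#!/usr/bin/env bash
# Hypothetical hosts, paths and commands: adjust all of these to your cluster.
ZK_HOSTS="zk1.example.com zk2.example.com"
STORM_HOSTS="storm1.example.com storm2.example.com"
ZK_CTL="/opt/zookeeper/bin/zkServer.sh"   # assumed ZooKeeper control script
ZK_DATA_DIR="/var/zookeeper/data"         # assumed dataDir from zoo.cfg
STORM_DATA_DIR="/var/storm/data"          # assumed Storm local dir
# The [s] keeps pkill from matching the remote shell that runs this command.
STORM_STOP_CMD="pkill -f '[s]torm.daemon'"
STORM_START_CMD="nohup /opt/storm/bin/storm supervisor >/dev/null 2>&1 &"

# Remote runner; set SSH=echo to print the commands instead of running them.
SSH="${SSH:-ssh}"

# Run one command on every host of a space-separated list.
run_on_hosts() {
  local cmd="$1" hosts="$2" host
  for host in $hosts; do
    $SSH "$host" "$cmd"
  done
}

clear_cluster() {
  # Only version-2 is removed, so the myid file in dataDir survives.
  run_on_hosts "$ZK_CTL stop; rm -rf ${ZK_DATA_DIR:?}/version-2" "$ZK_HOSTS"
  run_on_hosts "$STORM_STOP_CMD; rm -rf ${STORM_DATA_DIR:?}/*" "$STORM_HOSTS"
  run_on_hosts "$ZK_CTL start" "$ZK_HOSTS"
  run_on_hosts "$STORM_START_CMD" "$STORM_HOSTS"
}
```

Nothing runs until clear_cluster is called. The start command here only launches a supervisor; a node that hosts nimbus or the UI would need its own start command.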

Apache Storm supervisor is not listed in cluster

Symptoms: when starting a Storm supervisor node, it is not listed in the cluster. The following error appears in the log:

Error when processing event java.lang.RuntimeException: at backtype.storm.utils.Utils.deserialize( ~[storm-core-]
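
To check whether a node shows this symptom, the log can be searched for the failing call. A minimal sketch, where the function name has_deserialize_error is hypothetical and the log path in the usage note is an assumption:

```shell
#!/usr/bin/env bash
# Return 0 if the given log file contains the deserialization error.
# $1 is the path to a supervisor log file.
has_deserialize_error() {
  grep -qF 'backtype.storm.utils.Utils.deserialize' "$1"
}
```

For example: has_deserialize_error /opt/storm/logs/supervisor.log && echo "supervisor hit the deserialize error"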

Solution that worked for me: stop ZooKeeper and the supervisor, then clean their data dirs. This is probably not suitable for production because of the data loss.
