Category Archives: Big Data

Clear Apache Storm cluster remotely

My bash scripts for clearing a Storm and ZooKeeper cluster remotely over ssh.

Main idea:

Connect to every ZooKeeper server by ssh, stop ZooKeeper, and delete its data folder. Then connect to every Storm node by ssh, kill the Storm processes, and delete the data folder. Connect to the ZooKeeper servers again and start them. Finally, connect to the Storm nodes again and start them.
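The sequence above can be sketched as a small dry-run script. This is only an illustration, not the scripts from the post: the host names, install paths, data folders and the `start-node.sh` wrapper are all assumptions, and by default the script only prints the ssh commands it would run.

```shell
#!/usr/bin/env bash
# Sketch only. Every host name, path and command below is a placeholder.
ZK_HOSTS="zk1 zk2"                          # hypothetical ZooKeeper servers
STORM_HOSTS="storm1 storm2"                 # hypothetical Storm nodes
ZK_CTL="/opt/zookeeper/bin/zkServer.sh"     # assumed ZooKeeper install path
ZK_DATA="/var/zookeeper/version-2"          # assumed snapshot/log folder under dataDir
STORM_DATA="/var/storm"                     # assumed Storm local data folder
STORM_START="/opt/storm/bin/start-node.sh"  # hypothetical wrapper that starts the Storm daemons
DRY_RUN="${DRY_RUN:-1}"                     # 1 = only print the ssh commands

# Run one command on one host, or just print it in dry-run mode.
run_on() {
  local host="$1" cmd="$2"
  if [ "$DRY_RUN" = "1" ]; then
    echo "ssh $host $cmd"
  else
    ssh "$host" "$cmd"
  fi
}

clear_cluster() {
  local h
  # 1. Stop ZooKeeper everywhere and delete its data.
  for h in $ZK_HOSTS; do
    run_on "$h" "$ZK_CTL stop"
    run_on "$h" "rm -rf $ZK_DATA"
  done
  # 2. Kill Storm processes and delete Storm data.
  #    Crude match; the [s] keeps pkill from matching the remote shell's own command line.
  for h in $STORM_HOSTS; do
    run_on "$h" "pkill -f '[s]torm'"
    run_on "$h" "rm -rf $STORM_DATA"
  done
  # 3. Start ZooKeeper again, then Storm.
  for h in $ZK_HOSTS; do run_on "$h" "$ZK_CTL start"; done
  for h in $STORM_HOSTS; do run_on "$h" "$STORM_START"; done
}

clear_cluster
```

Review the printed commands first, then rerun with `DRY_RUN=0` to execute them. Key-based ssh login to every host is assumed.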

MongoDB query examples

So that I don't have to keep these Mongo queries in my head, I note them here.

With JavaScript evaluation in the condition

JavaScript evaluation is slow. Don’t forget “==” and “this” in the condition clause.
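As an illustration of such a condition, here is a minimal sketch using MongoDB's `$where` operator. The database `test`, the collection `orders` and the fields `shipped` and `ordered` are hypothetical names chosen for this example, not taken from the original post.

```shell
# Hypothetical collection "orders" with numeric fields "shipped" and "ordered".
# The $where string is JavaScript evaluated per document: fields are reached
# through "this", and the comparison uses "==" (a single "=" would assign).
QUERY='db.orders.find({ $where: "this.shipped == this.ordered" }).forEach(printjson)'
echo "$QUERY"

# To run it against a live server with the legacy mongo shell:
#   mongo test --quiet --eval "$QUERY"
```

The single quotes around the query keep bash from expanding `$where` as a shell variable.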


Install MongoDB on Ubuntu

I extracted some pieces from a good tutorial on the official site, http://docs.mongodb.org/manual/tutorial/, and adapted them for my tasks.

Install on Ubuntu 14.04 Trusty

Import MongoDB public key:


Install Hive on Ubuntu

Configuration

My configuration is Apache Hive 0.13.0 on a machine with Ubuntu 14.04 and Apache Hadoop 2.2.0.

(For Hadoop installation, see http://dmitrypukhov.pro/install-hadoop-on-ubuntu)

Download and unpack Hive

Download the latest Hive release from the Apache web site: http://www.apache.org/dyn/closer.cgi/hive/

Unpack it to the /opt/hive folder. Change the owner to the Hadoop user and group (hduser and hadoop in my case):

$ sudo chown -R hduser:hadoop /opt/hive
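The unpack step is not spelled out above, so here is one possible sketch. The helper name `unpack_hive` is made up for this example, and the tarball name in the usage comment is an assumption about what the download is called; adjust it to the file you actually fetched.

```shell
#!/usr/bin/env bash
# Sketch: unpack a downloaded Hive tarball so that the contents of its
# top-level directory land directly in the destination folder.
unpack_hive() {
  local tarball="$1" dest="$2"
  mkdir -p "$dest" || return 1
  # --strip-components=1 drops the archive's top-level directory name.
  tar -xzf "$tarball" -C "$dest" --strip-components=1
}

# Example (assumed file name; run as root or with sudo if /opt is not writable):
#   unpack_hive apache-hive-0.13.0-bin.tar.gz /opt/hive
```

Stripping the top-level directory means the destination path stays the same regardless of the version in the archive name.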


Install Hadoop on Ubuntu

The best article on Hadoop installation I found is http://codesfusion.blogspot.ru/2013/10/setup-hadoop-2x-220-on-ubuntu.html

I adapted it slightly for my configuration: Ubuntu 14.04, Hadoop 2.2.0 single node, JDK 6. These are not the latest versions of Java and Hadoop, but I had to reproduce an existing production system.

My steps are:

Install prerequisites

JDK 6 and ssh:

$ sudo apt-get install python-software-properties
$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java6-installer
$ sudo apt-get install ssh
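After the packages are installed, a quick check that both tools are on the PATH can save debugging later. This is a small sketch, not part of the referenced article; `check_cmd` is a helper name invented here.

```shell
#!/usr/bin/env bash
# Report whether a command is available on the PATH.
check_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: missing"
  fi
}

# The two prerequisites installed above.
for c in java ssh; do
  check_cmd "$c"
done
```

If `java` is reported as found, `java -version` shows which JDK is actually active.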
