Tag Archives: Big Data

Accessing AWS S3 from on-premises Hadoop

Add AWS libraries to the classpath

The hadoop-aws-*.jar library is not on the classpath by default, although it ships in the $HADOOP_HOME/share/hadoop/tools/lib folder. To fix this, edit $HADOOP_HOME/etc/hadoop/hadoop-env.sh and add a line that puts that folder on the classpath.
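A minimal sketch of such a line, assuming a stock Apache Hadoop layout with the AWS jars under $HADOOP_HOME/share/hadoop/tools/lib (check the actual folder in your distribution). It appends to any existing HADOOP_CLASSPATH instead of overwriting it:

```shell
# Line for $HADOOP_HOME/etc/hadoop/hadoop-env.sh (sketch).
# Assumes a stock Apache Hadoop layout. The trailing /* is a Java classpath
# wildcard; it is quoted so the shell leaves it for the JVM to expand.
# ${HADOOP_CLASSPATH:+...} adds a ':' separator only if the variable is already set.
export HADOOP_CLASSPATH="${HADOOP_CLASSPATH:+$HADOOP_CLASSPATH:}$HADOOP_HOME/share/hadoop/tools/lib/*"
```

Afterwards, running `hadoop classpath` should include that folder. On Hadoop 3.x, hadoop-env.sh also documents `export HADOOP_OPTIONAL_TOOLS="hadoop-aws"` as an alternative way to pull these jars in.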


Install ZooKeeper on Linux

ZooKeeper installation is quick and easy for a development environment.

  1. Download ZooKeeper from http://zookeeper.apache.org and extract it somewhere, for example to /opt/zookeeper/
  2. Create a simple zoo.cfg file, for example by copying the sample config /opt/zookeeper/conf/zoo_sample.cfg to /opt/zookeeper/conf/zoo.cfg
  3. Start ZooKeeper:
    /opt/zookeeper/bin/zkServer.sh start
  4. Stop ZooKeeper:
    /opt/zookeeper/bin/zkServer.sh stop
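The steps above can be wrapped in a few shell functions. This is a sketch that assumes the /opt/zookeeper location from step 1 (overridable through the first argument); the function names are hypothetical helpers, not part of ZooKeeper. `zkServer.sh status` is another subcommand of the same script, handy for checking that the server came up:

```shell
#!/usr/bin/env bash
# Sketch of the dev install steps; ZooKeeper home defaults to /opt/zookeeper.

# Step 2: create zoo.cfg from the bundled sample, without clobbering an existing one.
zk_make_config() {
  local zk_home="${1:-/opt/zookeeper}"
  local conf_dir="$zk_home/conf"
  if [ ! -f "$conf_dir/zoo.cfg" ]; then
    cp "$conf_dir/zoo_sample.cfg" "$conf_dir/zoo.cfg"
  fi
}

# Steps 3 and 4, plus a status check: thin wrappers around the bundled script.
zk_start()  { "${1:-/opt/zookeeper}/bin/zkServer.sh" start; }
zk_stop()   { "${1:-/opt/zookeeper}/bin/zkServer.sh" stop; }
zk_status() { "${1:-/opt/zookeeper}/bin/zkServer.sh" status; }
```

For example, `zk_make_config && zk_start && zk_status`. The sample config keeps data in /tmp/zookeeper and listens on client port 2181, which is fine for a dev box but not for data that must survive a reboot.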

HBase shell on AWS EMR cluster quickstart

How to start the HBase client on AWS EMR and query an external HBase database.

  1. Create an EMR cluster with the HBase application enabled, either manually in the console or with an AWS CLI command.

  2. Establish an SSH connection to the cluster's master node.

  3. To work with an external database, set the ZooKeeper quorum (the hbase.zookeeper.quorum property) in /etc/hbase/conf/hbase-site.xml.

  4. Start the HBase shell.

  5. In the shell, run your queries.
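The steps above can be sketched as shell functions built on the AWS CLI, ssh, and the HBase shell. Everything specific here is an assumption to adapt: the cluster name, the emr-6.15.0 release label, the three m5.xlarge instances, the key pair, the ZooKeeper host names, and the table name are hypothetical placeholders, and the function names are not part of any tool:

```shell
#!/usr/bin/env bash
# Sketch of the EMR + HBase quickstart. All names and sizes are placeholders.

# Step 1: create a small EMR cluster with the HBase application installed.
emr_create_hbase_cluster() {
  local key_name="$1"                  # existing EC2 key pair (hypothetical)
  local release="${2:-emr-6.15.0}"     # assumed release label; pick one available to you
  aws emr create-cluster \
    --name "hbase-quickstart" \
    --release-label "$release" \
    --applications Name=HBase \
    --instance-type m5.xlarge \
    --instance-count 3 \
    --use-default-roles \
    --ec2-attributes KeyName="$key_name"
}

# Step 2: SSH to the master node; EMR's login user is "hadoop".
emr_ssh_master() {
  local master_dns="$1" key_file="$2"
  ssh -i "$key_file" "hadoop@$master_dns"
}

# Step 3: print a hbase.zookeeper.quorum property block. Use it to replace the
# existing entry inside <configuration> in /etc/hbase/conf/hbase-site.xml.
hbase_quorum_property() {
  local quorum="$1"                    # comma-separated host list (hypothetical)
  cat <<EOF
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>$quorum</value>
</property>
EOF
}

# Steps 4 and 5: feed a few HBase shell commands through stdin.
hbase_sample_queries() {
  local table="$1"                     # hypothetical table name
  hbase shell <<EOF
list
describe '$table'
scan '$table', {LIMIT => 5}
count '$table'
EOF
}
```

`--use-default-roles` expects the default EMR roles to exist; `aws emr create-default-roles` creates them once per account. The master node's public DNS name is reported by `aws emr describe-cluster --cluster-id <id>` in the MasterPublicDnsName field. On the master node, a typical run is `hbase_quorum_property "zk1.example.com,zk2.example.com"`, then editing hbase-site.xml with the output, then `hbase_sample_queries my_table`.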