Category Archives: Big Data

HBase import-export utility

Export table to file system

For a single-machine deployment, the target directory can be on the local file system (use file: instead of hdfs:).
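The code block from the original post is not shown in this excerpt; below is a minimal sketch of the export call, using a hypothetical table name my_table and a hypothetical output directory:

    # Export the HBase table "my_table" to a directory of sequence files on HDFS.
    # For a single-machine setup, use a file: URI instead of hdfs: as noted above.
    hbase org.apache.hadoop.hbase.mapreduce.Export my_table hdfs:///backup/my_table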

 

Import into table from file system
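A matching sketch for the import direction, again with the hypothetical names from the export example (the target table must already exist in HBase):

    # Load the previously exported files back into the existing table "my_table".
    hbase org.apache.hadoop.hbase.mapreduce.Import my_table hdfs:///backup/my_table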

 

 

File could only be replicated to 0 nodes instead of minReplication (=1)

Working on a newly configured cluster. I can browse HDFS, but any write attempt produces an error:

I can also create and view tables in Hive, but any insert attempt fails with the same error. Continue reading

Hive2 metastore configuration

By default, Hive runs with an embedded Derby metastore, which allows only one connection. This article is about how to run Hive with the Derby network server. Assume Hive is installed in the /opt/hive folder.

  1. Download Derby from https://db.apache.org/derby/derby_downloads.html and unpack it to the /opt/derby folder
  2. Start the Derby server: nohup /opt/derby/bin/startNetworkServer &
  3. Edit /opt/hive/conf/hive-site.xml
  4. Start Derby
  5. Initialize the metastore
  6. Start hiveserver2 and beeline; both should work simultaneously, and you can check the web UI in a browser at http://localhost:10002 (see the sketch after this list)
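The commands and configuration behind these steps are not shown in this excerpt; below is a minimal sketch under a few assumptions: Derby listens on its default port 1527, the metastore database name is metastore_db (hypothetical), and the Derby client JDBC driver is available to Hive.

    # 2. Start the Derby network server
    nohup /opt/derby/bin/startNetworkServer &

    # 3. In /opt/hive/conf/hive-site.xml, point the metastore at the Derby network server, e.g.:
    #      javax.jdo.option.ConnectionURL        = jdbc:derby://localhost:1527/metastore_db;create=true
    #      javax.jdo.option.ConnectionDriverName = org.apache.derby.jdbc.ClientDriver
    #    (the Derby client driver, derbyclient.jar, must be on Hive's classpath, e.g. copied to /opt/hive/lib)

    # 5. Initialize the metastore schema
    /opt/hive/bin/schematool -dbType derby -initSchema

    # 6. Start HiveServer2 and connect with beeline
    /opt/hive/bin/hiveserver2 &
    /opt/hive/bin/beeline -u jdbc:hive2://localhost:10000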

     

Hive2 installation

I recently installed Hive 2.1.1 on Ubuntu 16.04. These may not be the optimal steps, but they worked for me.

Installation

1. Download Hive from https://hive.apache.org/downloads.html and unpack the archive to the /opt/hive folder
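In shell form this step might look like the following; the mirror URL and archive name are assumptions for Hive 2.1.1:

    # Download and unpack Hive 2.1.1 into /opt/hive (adjust the URL to your mirror)
    wget https://archive.apache.org/dist/hive/hive-2.1.1/apache-hive-2.1.1-bin.tar.gz
    tar -xzf apache-hive-2.1.1-bin.tar.gz
    sudo mv apache-hive-2.1.1-bin /opt/hive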

2. Copy /opt/hive/conf/hive-default.xml.template to hive-site.xml

Edit hive-site.xml and replace all occurrences of ${system:java.io.tmpdir}/${system:user.name} with /tmp/hive. Continue reading
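As shell commands, the copy and the replacement might look like this (a sketch; consider backing up hive-site.xml before the in-place edit):

    # Create hive-site.xml from the bundled template
    cp /opt/hive/conf/hive-default.xml.template /opt/hive/conf/hive-site.xml
    # Replace every ${system:java.io.tmpdir}/${system:user.name} occurrence with /tmp/hive
    sed -i 's#${system:java.io.tmpdir}/${system:user.name}#/tmp/hive#g' /opt/hive/conf/hive-site.xml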

Accessing AWS S3 from on-premises Hadoop

Add AWS libraries to the classpath

The hadoop-aws-*.jar library is not on the classpath by default, but it exists in the $HADOOP_HOME/tools/lib folder. To fix this, edit $HADOOP_HOME/etc/hadoop/hadoop-env.sh and add the following line:

Continue reading
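The excerpt cuts off before showing the actual line; a typical addition to hadoop-env.sh, assuming the $HADOOP_HOME/tools/lib location mentioned above, would look something like:

    # Put the AWS connector jars (hadoop-aws-*.jar and its dependencies) on Hadoop's classpath
    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HADOOP_HOME/tools/lib/*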

Working with HBase from the Spark shell

My software versions

Spark 1.6.1, HBase 1.2.1, running on EMR 4.7.1

Spark and HBase installation:
http://dmitrypukhov.pro/install-apache-spark-on-ubuntu/,
http://dmitrypukhov.pro/install-hbase-on-linux-dev/

Configure Spark

Edit spark-defaults.conf and ensure that spark.driver.extraClassPath and spark.executor.extraClassPath contain the path to the HBase libraries. For me it is /usr/lib/hbase/lib/*

My extra class paths: Continue reading
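The excerpt ends before listing the actual values; a minimal sketch of the spark-defaults.conf entries, assuming the /usr/lib/hbase/lib location mentioned above:

    # spark-defaults.conf: make the HBase client jars visible to the driver and executors
    spark.driver.extraClassPath      /usr/lib/hbase/lib/*
    spark.executor.extraClassPath    /usr/lib/hbase/lib/*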