How to start an HBase client in AWS EMR and query an external HBase DB
1. Create an EMR cluster with the HBase application enabled, either manually or with a command like this:

    aws emr create-cluster \
      --release-label emr-4.6.0 \
      --name "my-hbase-cluster" \
      --instance-type r3.2xlarge \
      --instance-count 2 \
      --enable-debugging \
      --ec2-attributes KeyName=MyKeyPair \
      --use-default-roles \
      --applications Name=Hadoop Name=HBase

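If you create clusters from a script, the same command can be assembled as an argument list. The sketch below only builds and prints the command; the function name `emr_create_cluster_args` and its defaults are illustrative assumptions that mirror the flags shown above.

```python
import shlex


def emr_create_cluster_args(
    name,
    release_label="emr-4.6.0",
    instance_type="r3.2xlarge",
    instance_count=2,
    key_name="MyKeyPair",
    applications=("Hadoop", "HBase"),
):
    """Build the argv for `aws emr create-cluster` with the flags from step 1."""
    args = [
        "aws", "emr", "create-cluster",
        "--release-label", release_label,
        "--name", name,
        "--instance-type", instance_type,
        "--instance-count", str(instance_count),
        "--enable-debugging",
        "--ec2-attributes", f"KeyName={key_name}",
        "--use-default-roles",
        "--applications",
    ]
    # Each application is passed as its own Name=... token.
    args += [f"Name={app}" for app in applications]
    return args


command = emr_create_cluster_args("my-hbase-cluster")
# Print the shell-quoted command for review; nothing is executed here.
print(shlex.join(command))
```

To actually launch the cluster you could pass `command` to `subprocess.run(command, check=True)`, assuming the AWS CLI is installed and configured on the machine running the script.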
2. Establish an SSH connection to the cluster:

    ssh -i ~/MyKeyPair.pem hadoop@<cluster-ip-address>.us-west-2.compute.amazonaws.com

3. To work with an external database, set the ZooKeeper quorum in /etc/hbase/conf/hbase-site.xml:

    <configuration>
      <property>
        <name>hbase.zookeeper.quorum</name>
        <value>my-hbase-zookeeper-address</value>
      </property>
      ....
    </configuration>

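If you would rather script this change than edit the file by hand, something like the following could work. It is a minimal sketch using Python's standard `xml.etree.ElementTree`; the helper name `with_zookeeper_quorum` is an assumption for illustration, and it operates on the XML text so you decide how to read and write the file.

```python
import xml.etree.ElementTree as ET

QUORUM_KEY = "hbase.zookeeper.quorum"


def with_zookeeper_quorum(site_xml_text, quorum):
    """Return hbase-site.xml content with hbase.zookeeper.quorum set to `quorum`.

    Updates the property if it already exists, otherwise appends it.
    """
    root = ET.fromstring(site_xml_text)  # the <configuration> element

    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == QUORUM_KEY:
            value = prop.find("value")
            if value is None:
                value = ET.SubElement(prop, "value")
            value.text = quorum
            break
    else:
        # No existing property was found: add a new one.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = QUORUM_KEY
        ET.SubElement(prop, "value").text = quorum

    return ET.tostring(root, encoding="unicode")
```

A typical use would be to read /etc/hbase/conf/hbase-site.xml, pass its content and `"my-hbase-zookeeper-address"` to the function, and write the result back. Note that ElementTree drops comments and processing instructions when parsing this way, so keep a backup of the original file.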
4. Start the HBase shell:

    hbase shell

5. In the shell, run queries like these:

    create 'person', {NAME=>'name'}, {NAME=>'addr'}
    put 'person', '1', 'name:firstName', 'John'
    put 'person', '1', 'name:lastName', 'Smith'
    put 'person', '1', 'addr:planet', 'Earth'
    put 'person', '1', 'addr:continent', 'Australia'
    list
    describe 'person'
    scan 'person'
    get 'person', '1', {COLUMNS => ['name']}
    get 'person', '1', {COLUMNS => ['addr:planet']}
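
To make the difference between the two `get` calls concrete, here is a toy in-memory model of the `person` table. It is not an HBase client and ignores timestamps and versions; the class `ToyTable` is purely hypothetical and only illustrates that a column is addressed as `family:qualifier`, and that a bare family name in `COLUMNS` selects every column in that family.

```python
from collections import defaultdict


class ToyTable:
    """Toy stand-in for an HBase table: row key -> {"family:qualifier": value}."""

    def __init__(self, families):
        self.families = set(families)
        self.rows = defaultdict(dict)

    def put(self, row, column, value):
        family = column.partition(":")[0]
        if family not in self.families:
            # Column families must be declared when the table is created.
            raise KeyError(f"unknown column family: {family}")
        self.rows[row][column] = value

    def get(self, row, columns=None):
        cells = self.rows.get(row, {})
        if columns is None:
            return dict(cells)

        def matches(column):
            family = column.partition(":")[0]
            # A spec is either a whole family ("name") or one column ("addr:planet").
            return any(spec == column or spec == family for spec in columns)

        return {c: v for c, v in cells.items() if matches(c)}


# Mirror the shell session above.
person = ToyTable(["name", "addr"])
person.put("1", "name:firstName", "John")
person.put("1", "name:lastName", "Smith")
person.put("1", "addr:planet", "Earth")
person.put("1", "addr:continent", "Australia")

whole_family = person.get("1", ["name"])        # both name:* columns
single_column = person.get("1", ["addr:planet"])  # only addr:planet
```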