Tag Archives: Apache Spark

Work with HBase from Spark shell

My software versions

Spark 1.6.1 and HBase 1.2.1, running on EMR 4.7.1

Spark and HBase installation:
http://dmitrypukhov.pro/install-apache-spark-on-ubuntu/,
http://dmitrypukhov.pro/install-hbase-on-linux-dev/

Configure Spark

Edit spark-defaults.conf and make sure that both spark.driver.extraClassPath and spark.executor.extraClassPath contain the path to the HBase libraries. In my case it is /usr/lib/hbase/lib/*
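As a rough sketch, the two entries in spark-defaults.conf could look like the fragment below. It assumes the HBase jars live in /usr/lib/hbase/lib/ as stated above. If either property already has a value (as it usually does on EMR), keep the existing entries and append the HBase path after a ':' separator rather than replacing them.

```properties
# spark-defaults.conf (sketch, not the author's actual file)
# If these properties already exist, append ":/usr/lib/hbase/lib/*" to the current value instead.
spark.driver.extraClassPath    /usr/lib/hbase/lib/*
spark.executor.extraClassPath  /usr/lib/hbase/lib/*
```

Restart the Spark shell after editing the file so that the new class path is picked up.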

My extra class paths: Continue reading

Apache Spark error on start: java.lang.NoClassDefFoundError: org/apache/spark/SparkConf

Starting a simple Spark project in IntelliJ IDEA fails with an exception:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/SparkConf
at …
Caused by: java.lang.ClassNotFoundException: org.apache.spark.SparkConf
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
… 2 more

Solution:

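In pom.xml, change the scope of the Spark dependencies from provided to compile. With the provided scope, the Spark jars are expected to come from the cluster at run time, so they are missing from the classpath when the application is launched directly from the IDE.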
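A minimal sketch of such a dependency entry is shown below. The artifactId (spark-core_2.10) and the version (1.6.1) are assumptions for illustration only. Keep whatever artifact and version the project already declares and change just the scope.

```xml
<!-- pom.xml fragment (sketch): artifactId and version are illustrative assumptions -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>1.6.1</version>
  <!-- was: <scope>provided</scope> -->
  <scope>compile</scope>
</dependency>
```

Since compile is Maven's default scope, removing the scope element has the same effect. When packaging for a cluster that already ships Spark, switching back to provided keeps the Spark jars out of the application jar.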

Spark SQL: load parquet in Spark Shell

My data is stored in Parquet format, and I want to load and query it using Spark SQL.

Start Spark shell

Load parquet folder to table

Now we can use this table for SQL queries:
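The steps above can be sketched roughly as follows. This is not the original code from the post. It assumes a Spark 1.x shell (1.4 or later) where sqlContext is predefined, and the folder path, table name, and query are hypothetical placeholders.

```scala
// Start the shell first from a terminal:  spark-shell
// Everything below is typed at the scala> prompt, where sqlContext already exists.

// Load every Parquet file in the folder into a DataFrame (hypothetical path).
val events = sqlContext.read.parquet("/path/to/parquet/folder")

// Register the DataFrame as a temporary table so SQL can refer to it by name.
events.registerTempTable("events")

// Query the table with plain SQL and print the result.
val result = sqlContext.sql("SELECT COUNT(*) AS cnt FROM events")
result.show()
```

The temporary table exists only for the lifetime of the shell session, so it has to be registered again after a restart.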

Apache Spark: Feature http://apache.org/xml/features/xinclude is not recognized

Problem: an Apache Spark 1.3.1 application produces the following error:

Fix: edit pom.xml to use an older version of xercesImpl.