s3-dist-cp is missing in EMR 4

I ran into an issue with the s3-dist-cp command on a Spark cluster running on AWS EMR 4.5.

The issue: the s3-dist-cp command step fails with the error: java.lang.RuntimeException: java.io.IOException: Cannot run program "s3-dist-cp" (in directory "."): error=2, No such file or directory

The cluster is created by this script:

 

Solution:

Change the --applications parameter so that the Hadoop application is installed on the cluster along with Spark: --applications Name=Hadoop Name=Spark. The proper creation script is:
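As a minimal sketch, and not the original script from this post, a creation command with both applications might look like the one below. The cluster name, instance type and instance count are hypothetical placeholders; only the release label and the --applications value come from the text above. The function just prints the command so it can be reviewed before running it.

```shell
# Hypothetical placeholder values; adjust to your own account and needs.
CLUSTER_NAME="spark-cluster"
RELEASE_LABEL="emr-4.5.0"      # EMR 4.5, as in the post
INSTANCE_TYPE="m3.xlarge"      # assumption, not from the post
INSTANCE_COUNT=3               # assumption, not from the post

# Print the create-cluster command (dry run) instead of executing it.
build_create_cluster_cmd() {
  echo aws emr create-cluster \
    --name "$CLUSTER_NAME" \
    --release-label "$RELEASE_LABEL" \
    --applications Name=Hadoop Name=Spark \
    --instance-type "$INSTANCE_TYPE" \
    --instance-count "$INSTANCE_COUNT" \
    --use-default-roles
}

build_create_cluster_cmd
```

The key part is that Name=Hadoop is listed next to Name=Spark; the other options are only there to make the command complete.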

With Hadoop installed, the s3-dist-cp command is recognized and the step no longer fails.
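To check the fix, an s3-dist-cp step can be submitted to the new cluster. The sketch below assumes the step is run through command-runner.jar, which is one common way to invoke s3-dist-cp on EMR 4.x; the post does not show how its step was defined. The cluster ID, bucket and paths are hypothetical placeholders. As above, the function only prints the command.

```shell
# Hypothetical placeholder values; replace with your own.
CLUSTER_ID="j-XXXXXXXXXXXXX"
SRC="s3://my-bucket/input/"
DEST="hdfs:///input/"

# Print an add-steps command (dry run) that runs s3-dist-cp via command-runner.jar.
build_add_step_cmd() {
  echo aws emr add-steps \
    --cluster-id "$CLUSTER_ID" \
    --steps "Type=CUSTOM_JAR,Name=S3DistCp,Jar=command-runner.jar,Args=[s3-dist-cp,--src=$SRC,--dest=$DEST]"
}

build_add_step_cmd
```

When running the printed command by hand, it is safer to put the --steps value in quotes so the shell does not interpret the square brackets.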
