I can’t make Spark work in RStudio

To use Spark2 with sparklyr, set the environment variable SPARK_HOME before you connect, like this:

> Sys.setenv(SPARK_HOME='/usr/hdp/current/spark2-client')
> library(sparklyr)
> sc <- spark_connect(master='yarn-client')
> spark_version(sc)
[1] ‘2.1.1.2.6.1.0’
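If you don't want to call Sys.setenv() in every session, you can store the setting in an .Renviron file, which R reads at startup. The shell helper below is a minimal sketch. The function name persist_spark_home is hypothetical. It assumes your .Renviron lives in your home directory, which you can override with the first argument. Restart the R session in RStudio afterwards so the new value is picked up.

```shell
# Hypothetical helper: persist SPARK_HOME for future R sessions by adding it to
# an .Renviron file (R reads this file when it starts).
# Arg 1: path to the .Renviron file (assumed default: $HOME/.Renviron)
# Arg 2: Spark installation directory (default: the Spark2 client path above)
persist_spark_home() {
  renviron="${1:-$HOME/.Renviron}"
  spark_home="${2:-/usr/hdp/current/spark2-client}"
  # Append only if no SPARK_HOME line exists yet, so running it twice is harmless
  if ! grep -q '^SPARK_HOME=' "$renviron" 2>/dev/null; then
    printf 'SPARK_HOME=%s\n' "$spark_home" >> "$renviron"
  fi
}
```

Usage: run persist_spark_home once in a shell, restart R, and then library(sparklyr) and spark_connect() work without the Sys.setenv() line.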

Note! If this doesn't work and spark_connect() hangs for more than 10 seconds, your Kerberos ticket has probably expired and must be renewed with the kinit command:

  • Interrupt the hanging command in RStudio by clicking the red STOP icon.
  • Open an SSH shell, or choose Tools > Shell in RStudio, and run: kinit $USER
  • You should now be able to connect to Spark from RStudio.
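If this happens often, you can script the check so that kinit only runs when it is needed. The sketch below assumes the MIT Kerberos client tools. With those tools, klist -s prints nothing and exits with status 0 only when a valid, unexpired ticket cache exists. The function name ensure_kerberos_ticket is hypothetical.

```shell
# Hypothetical helper: renew the Kerberos ticket only when there is no valid one.
# Assumes MIT Kerberos client tools (klist, kinit) are on the PATH.
# Arg 1: Kerberos principal (default: $USER, as in the manual kinit step)
ensure_kerberos_ticket() {
  principal="${1:-$USER}"
  # klist -s is silent; exit status 0 means a valid ticket cache was found
  if klist -s 2>/dev/null; then
    echo "Kerberos ticket is valid"
    return 0
  fi
  echo "No valid Kerberos ticket; running kinit for $principal"
  # kinit prompts for the password; its exit status becomes the function's
  kinit "$principal"
}
```

Usage: run ensure_kerberos_ticket in the shell, then call spark_connect() again in RStudio.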