Create a Maven project in Eclipse with the following dependency in pom.xml:

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.11</artifactId>
  <version>2.1.0</version>
</dependency>
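
The example below also uses SparkSession, which lives in the Spark SQL module rather than in spark-core, so the spark-sql artifact is needed as well:

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.11</artifactId>
  <version>2.1.0</version>
</dependency>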

Download the Hadoop winutils.exe (https://github.com/srccodes/hadoop-common-2.2.0-bin/tree/master/bin), put it into a bin subdirectory, and point the HADOOP_HOME environment variable at that directory's parent:

//location of winutils.exe
D:/progs/apache/hadoop/bin/winutils.exe

//environment variable
HADOOP_HOME=D:/progs/apache/hadoop
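
Alternatively, if setting a system-wide environment variable is inconvenient, the same location can be supplied from code through the hadoop.home.dir system property. A minimal sketch (the path reuses the example directory above; the call must run before the Spark session is created):

//tell Hadoop where winutils.exe lives; must be set before Spark starts
System.setProperty("hadoop.home.dir", "D:/progs/apache/hadoop");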

Java code of a Spark application:

package some.pkg;

//imports
import java.util.ArrayList;
import java.util.List;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SparkSession;

//main class
public class App {
  public static void main(String[] args) {

    //create spark session
    SparkSession session = SparkSession.builder()
        .appName("some-app-name")
        .master("local[*]") //use all hardware threads
        .getOrCreate();

    //create spark context
    JavaSparkContext context =
        new JavaSparkContext(session.sparkContext());

    //test using RDD
    List<String> list = new ArrayList<>();
    list.add("www");
    list.add("abc");
    list.add("xyz");

    JavaRDD<String> rdd = context.parallelize(list);
    JavaRDD<Integer> lengthRdd = rdd.map(str -> str.length());

    //collect is an action: it triggers the job and returns the results
    System.out.println(lengthRdd.collect()); //prints [3, 3, 3]

    session.stop();
  }
}
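
Transformations such as map are lazy; Spark computes nothing until an action runs. As a further check, an action like reduce can be applied to the mapped RDD (a small sketch reusing the variables from the code above):

//reduce is an action: it triggers the computation and returns a single value
int totalLength = lengthRdd.reduce((a, b) -> a + b);
System.out.println(totalLength); //prints 9: three strings of length 3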