We drop the unnecessary columns from the dataframe that was constructed from durham-parks.json.

df = df
    .drop("fields")
    .drop("geometry")
    .drop("record_timestamp")
    .drop("recordid")
    .drop("datasetid");

The dataframe now contains the following data.

We also drop some of the columns from philadelphia_recreations.csv.


The refined dataframe is shown below.

For clarity, we have changed some of the column names. The Java code that performs the required data mining is shown below.

Dataset<Row> df = spark.read().format("csv").option("multiline", true)
                .option("header", true)

        // Keep only rows whose USE_ value mentions "park" (case-insensitive).
        // Equivalent Column API form: df = df.filter(lower(df.col("USE_")).like("%park%"));
        df = df.filter("lower(USE_) like '%park%' ");

        df = df.withColumn("park_id", concat(lit("phil_"), df.col("OBJECTID")))
                .withColumnRenamed("ASSET_NAME", "park_name")
                .withColumn("city", lit("Philadelphia"))
                .withColumnRenamed("ADDRESS", "address")
                .withColumn("has_playground", lit("UNKNOWN"))
                .withColumnRenamed("ZIPCODE", "zipcode")
                .withColumnRenamed("ACREAGE", "land_in_acres")
                .withColumn("geoX", lit("UNKNOWN"))
                .withColumn("geoY", lit("UNKNOWN"));

        return df;
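
To make the effect of the filter and the derived columns easier to check without a Spark session, the same row-level logic can be sketched in plain Java. This is only an illustration under stated assumptions: the class name PhiladelphiaParkSketch, the helper methods, and the sample rows are hypothetical and are not part of the project code. Rows are modelled as string maps, and only the renamed and derived fields are kept, whereas the Spark code above also retains the remaining original columns.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class PhiladelphiaParkSketch {

    // Mirrors the SQL predicate lower(USE_) like '%park%':
    // a case-insensitive substring match; a null value never matches.
    public static boolean isPark(String use) {
        return use != null && use.toLowerCase(Locale.ROOT).contains("park");
    }

    // Mirrors concat(lit("phil_"), col("OBJECTID")). Spark's concat
    // returns null when any input is null, so this does the same.
    public static String parkId(String objectId) {
        return objectId == null ? null : "phil_" + objectId;
    }

    // Keeps park rows only and emits the renamed and derived fields.
    public static List<Map<String, String>> refine(List<Map<String, String>> rows) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> row : rows) {
            if (!isPark(row.get("USE_"))) {
                continue;
            }
            Map<String, String> refined = new LinkedHashMap<>();
            refined.put("park_id", parkId(row.get("OBJECTID")));
            refined.put("park_name", row.get("ASSET_NAME"));
            refined.put("city", "Philadelphia");
            refined.put("address", row.get("ADDRESS"));
            refined.put("has_playground", "UNKNOWN");
            refined.put("zipcode", row.get("ZIPCODE"));
            refined.put("land_in_acres", row.get("ACREAGE"));
            refined.put("geoX", "UNKNOWN");
            refined.put("geoY", "UNKNOWN");
            out.add(refined);
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical sample rows, invented for this sketch.
        Map<String, String> park = new LinkedHashMap<>();
        park.put("OBJECTID", "1");
        park.put("ASSET_NAME", "Example Park");
        park.put("USE_", "Neighborhood PARK");

        Map<String, String> pool = new LinkedHashMap<>();
        pool.put("OBJECTID", "2");
        pool.put("ASSET_NAME", "Example Pool");
        pool.put("USE_", "Pool");

        List<Map<String, String>> rows = new ArrayList<>();
        rows.add(park);
        rows.add(pool);

        // Only the first row passes the filter.
        System.out.println(refine(rows));
    }
}
```

Fields missing from an input row come through as null, which is roughly how an empty CSV cell would appear after the rename in Spark.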

