Combining the two dataframes
First we match the column names by using the unionByName method.
public static void combineDataFrames (Dataset<Row> df1, Dataset<Row> df2) {
Dataset<Row> df = df1.unionByName(df2);
df.show(500);
df.printSchema();
System.out.println("We have " + df.count() + " records.");
df = df.repartition(5);
Partition[] partitions = df.rdd().partitions();
System.out.println("Total number of Partitions: "+ partitions.length);
}
So the resultant dataframe with the combination of two dataframes df1 and df2 is shown below.
and here is the schema for our resultant dataframe.
root
|-- park_id: string (nullable = true)
|-- park_name: string (nullable = true)
|-- city: string (nullable = false)
|-- address: string (nullable = true)
|-- has_playground: string (nullable = true)
|-- zipcode: string (nullable = true)
|-- land_in_acres: string (nullable = true)
|-- geoX: string (nullable = true)
|-- geoY: string (nullable = true)