How to use DataFrame in Scala?
Scala DataFrame FAQ: How do I use a DataFrame in Scala?
The DataFrame API is part of Spark SQL.
JSON file (one self-contained JSON object per line, with no trailing commas, which is the layout Spark's JSON reader expects by default):
{"id" : "1201", "name" : "satish", "age" : "25"}
{"id" : "1202", "name" : "krishna", "age" : "28"}
{"id" : "1203", "name" : "amith", "age" : "39"}
{"id" : "1204", "name" : "javed", "age" : "23"}
{"id" : "1205", "name" : "prudvi", "age" : "23"}
In the commands below, sql is an instance of the class SQLContext. The Spark shell does not create a SQLContext instance by default, although it does provide a SparkSession (spark) and a SparkContext (sc). So first create a SQLContext instance from sc:
scala> val sql = new org.apache.spark.sql.SQLContext(sc)
Then read the JSON file by passing its path:
scala> val df = sql.read.json("input_data/employee.json")
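Since Spark 2.0 the SQLContext constructor is deprecated in favor of SparkSession, so the same read can also go through a session. The following is a minimal sketch of a standalone program rather than a shell transcript. It assumes Spark is on the classpath and that a local master is acceptable; the object name EmployeeExample, the helper readEmployees, and the app name are hypothetical and not part of Spark.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object EmployeeExample {
  // Hypothetical helper: reads a file with one JSON object per line into a DataFrame.
  def readEmployees(spark: SparkSession, path: String): DataFrame =
    spark.read.json(path)

  def main(args: Array[String]): Unit = {
    // Build (or reuse) a session; local[*] is an assumption for running on one machine.
    val spark = SparkSession.builder()
      .appName("EmployeeExample")
      .master("local[*]")
      .getOrCreate()

    // Same relative path as in the text; adjust it to where the file actually lives.
    val df = readEmployees(spark, "input_data/employee.json")

    df.printSchema() // prints the inferred column names and types
    df.show()        // prints the rows as a text table

    spark.stop()
  }
}
```

Inside the Spark shell the builder is unnecessary, because the shell's existing spark value can be used directly: spark.read.json("input_data/employee.json").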
The DataFrame df now exists, and DataFrame methods can be called on it:
scala> df.show
+---+----+-------+
|age|  id|   name|
+---+----+-------+
| 25|1201| satish|
| 28|1202|krishna|
| 39|1203|  amith|
| 23|1204|  javed|
| 23|1205| prudvi|
+---+----+-------+
scala> df.select("id").show
+----+
|  id|
+----+
|1201|
|1202|
|1203|
|1204|
|1205|
+----+
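Other DataFrame methods follow the same pattern as select. The sketch below shows filtering and grouping on the same five records. Because the ages are quoted in the JSON file, Spark reads them as strings, so the sketch casts age to an integer before comparing. To stay self-contained it builds the DataFrame in memory instead of reading the file, which is an assumption made for illustration; the object name EmployeeQueries and its helper methods are hypothetical as well.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object EmployeeQueries {
  // Hypothetical helper: replaces the string age column with an integer one.
  def withIntAge(df: DataFrame): DataFrame =
    df.withColumn("age", col("age").cast("int"))

  // Hypothetical helper: keeps rows whose age is strictly greater than minAge.
  def olderThan(df: DataFrame, minAge: Int): DataFrame =
    withIntAge(df).filter(col("age") > minAge)

  // Hypothetical helper: counts how many rows share each age.
  def countByAge(df: DataFrame): DataFrame =
    withIntAge(df).groupBy("age").count()

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("EmployeeQueries")
      .master("local[*]") // assumption: run on a single machine
      .getOrCreate()
    import spark.implicits._ // enables toDF on a Seq of tuples

    // Same records as the JSON file, kept as strings to mirror what the reader infers.
    val df = Seq(
      ("1201", "satish", "25"),
      ("1202", "krishna", "28"),
      ("1203", "amith", "39"),
      ("1204", "javed", "23"),
      ("1205", "prudvi", "23")
    ).toDF("id", "name", "age")

    olderThan(df, 23).select("id", "name", "age").show()
    countByAge(df).show()

    spark.stop()
  }
}
```

With this data, olderThan(df, 23) keeps the three employees aged 25, 28, and 39, and countByAge reports two rows for age 23 and one row for each of the other ages. The row order of the grouped result is not guaranteed.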