SparkSession and its Component Contexts

The folks at Hadoop in Real World explain the difference between SparkSession, SparkContext, SQLContext, and HiveContext:

SQLContext is your gateway to SparkSQL. Here is how you create a SQLContext using the SparkContext.
// sc is an existing SparkContext.
val sqlContext = new org.apache.spark.sql.SQLContext(sc)

Once you have the SQLContext, you can start working with DataFrame, Dataset, etc.

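Knowing the right entry point is important: in Spark 2.0 and later, SparkSession is the unified entry point, and it exposes the older SparkContext and SQLContext.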
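
A minimal sketch of how the entry points relate, assuming Spark 2.0 or later is on the classpath and a local master is acceptable for experimentation. The application name, object name, sample data, and view name are all hypothetical and not taken from the linked post.

```scala
import org.apache.spark.sql.SparkSession

object EntryPointsSketch {
  def main(args: Array[String]): Unit = {
    // Build (or reuse) a SparkSession. Calling .enableHiveSupport() on the
    // builder is what takes the place of a separate HiveContext.
    val spark = SparkSession.builder()
      .appName("entry-points-sketch") // hypothetical name
      .master("local[*]")             // assumption: local run, not a cluster
      .getOrCreate()

    // The older entry points remain reachable from the session.
    val sc = spark.sparkContext
    val sqlContext = spark.sqlContext

    // Enables .toDF on local Scala collections.
    import spark.implicits._

    // Hypothetical sample data registered as a temporary view.
    val df = Seq(("a", 1), ("b", 2)).toDF("key", "value")
    df.createOrReplaceTempView("pairs")

    // Run the query through the session rather than a hand-built SQLContext.
    spark.sql("SELECT key, value FROM pairs WHERE value > 1").show()

    spark.stop()
  }
}
```

Code that still expects a SparkContext or SQLContext can be handed `sc` or `sqlContext` from the session, so only one entry point needs to be constructed.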

