Using The Spark Connector To Speed Up Data Loads

Denzil Riberio explains how you can use the Spark connector for Azure SQL DB and SQL Server to speed up inserting data from Spark into SQL Server 15x over the native JDBC client:

Since the load was taking longer than expected, we examined the sys.dm_exec_requests DMV while load was running, and saw that there was a fair amount of latch contention on various pages, which wouldn’t not be expected if data was being loaded via a bulk API.

Examining the statements being executed, we saw that the JDBC driver uses sp_prepare followed by sp_execute for each inserted row; therefore, the operation is not a bulk insert. One can further example the Spark JDBC connector source code, it builds a batch consisting of singleton insert statements, and then executes the batch via the prep/exec model.

It’s the power of bulk insertion.

Related Posts

Last-Click Attribution With Databricks Delta

Caryl Yuhas and Denny Lee give us an example of building a last-click digital marketing attribution model with Databricks Delta: The first thing we will need to do is to establish the impression and conversion data streams.   The impression data stream provides us a real-time view of the attributes associated with those customers who were served the […]

Read More

Working With Kafka At Scale

Tony Mancill has some tips for working with large-scale Kafka clusters: Unless you have architectural needs that require you to do otherwise, use random partitioning when writing to topics. When you’re operating at scale, uneven data rates among partitions can be difficult to manage. There are three main reasons for this: First, consumers of the “hot” […]

Read More

Categories

May 2018
MTWTFSS
« Apr Jun »
 123456
78910111213
14151617181920
21222324252627
28293031