2022-12-22 – Curated SQL

Apache Spark Performance Tuning Tips

Published 2022-12-22 by Kevin Feasel

RDD does serialisation and de-serialisation of data whenever it distributes the data across clusters, such as during repartition and shuffle, and we all know that serialisation and de-serialisation are very expensive operations in Spark.
On the other hand, DataFrame stores the data as binary using off-heap storage, so there is no need for deserialization and serialization of data when it distributes to clusters. We see a big performance improvement in DataFrame over RDD.

Click through for several additional tips.

Creating an Azure Function for Cosmos DB

Published 2022-12-22 by Kevin Feasel

Hasan Savran needs a function:

Azure Cosmos DB’s Change Feed feature triggers an event for Inserts and Updates in a collection. The easiest way to handle these events is by executing an Azure Function. In this post, I will focus on creating an Azure Function for Azure Cosmos DB by using VS Code.

Read on for step-by-step instructions. The wizard for creating Function apps and then Azure Functions is pretty well-designed.

Batch Endpoints in Azure ML

Published 2022-12-22 by Kevin Feasel

Tomaz Kastrun is winding down an advent of Azure ML. Day 22 covers batch scoring:

Batch endpoints are a great and simple way to run inference over large volumes of data. They simplify the process of hosting your models for batch scoring.

Click through to see how it all works.

Comparing Table Records with T-SQL

Published 2022-12-22 by Kevin Feasel

Chad Callihan compares and contrasts:

We recently looked at comparing schemas using Azure Data Studio. What if we need to compare tables by using a query? For this post we’ll compare using EXCEPT, NOT IN, and NOT EXISTS to find differences between two tables.

Our two tables to compare will be Comic and Comic_Copy. Based on counts, we have 48 more records in Comic than we do in Comic_Copy. Let’s find the differences.

In Chad’s specific query, NOT EXISTS works great. Where I like EXCEPT is when you need to see if any of the non-key columns differ: for example, if you also need to compare titles for rows with the same ID and ensure those titles match.
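
To make that distinction concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module rather than SQL Server, so it illustrates the set logic and is not Chad's actual query. The id and title columns and the three sample rows are assumptions invented for the example; only the table names Comic and Comic_Copy come from the post.

```python
import sqlite3

# Hypothetical stand-ins for Comic and Comic_Copy. The columns (id, title)
# and the sample rows are assumptions made up for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Comic      (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE Comic_Copy (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    INSERT INTO Comic      VALUES (1, 'Alpha'), (2, 'Beta'), (3, 'Gamma');
    INSERT INTO Comic_Copy VALUES (1, 'Alpha'), (2, 'Beta (reprint)');
""")

# NOT EXISTS on the key: only rows whose id is absent from the copy.
missing_by_key = conn.execute("""
    SELECT c.id, c.title
    FROM Comic AS c
    WHERE NOT EXISTS (SELECT 1 FROM Comic_Copy AS cc WHERE cc.id = c.id)
    ORDER BY c.id
""").fetchall()

# EXCEPT over every column: rows that are absent OR whose title differs.
differing_rows = conn.execute("""
    SELECT id, title FROM Comic
    EXCEPT
    SELECT id, title FROM Comic_Copy
    ORDER BY id
""").fetchall()

print(missing_by_key)   # [(3, 'Gamma')]
print(differing_rows)   # [(2, 'Beta'), (3, 'Gamma')]
```

The NOT EXISTS query matches on id alone, so it reports only row 3. EXCEPT compares whole rows, so it also surfaces row 2, whose title changed in the copy.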

Adding Emoji to Power BI Apps

Published 2022-12-22 by Kevin Feasel

Ed Hansberry ran out of words:

Your report page names, and in turn, the Power BI app can be enhanced with the judicious use of emoji. I was surprised to find out that the characters came through in full color, and that can help your users find the important pages faster. This can be especially useful in a large Power BI app with dozens of reports and potentially hundreds of pages.

Adding emoji is relatively straightforward in Windows 10 and 11. Below are the steps for Windows 11.

Read on to learn how to do it with Windows 11, followed by the steps for Windows 10.

From SQL Server to Cassandra

Published 2022-12-22 by Kevin Feasel

Lewis DiFelice shares some lessons learned:

The first 6 months were rough. The cluster had been in operation for more than 6 months but was not doing too well. Performance was poor and, worse, it frequently crashed. It was not a fun time. But eventually, the problems got fixed.

There were several issues (including my inexperience) that caused these problems, but the core one was that the original developer had treated it like another relational database.

Read on for a few tips to make learning (and managing) Apache Cassandra a little easier.

Keys and Certificates with TDE

Published 2022-12-22 by Kevin Feasel

Matthew McGiffen has a big keychain to store all of those keys:

When you first look at the encryption hierarchy for TDE in SQL Server it can be a bit daunting. There seem to be a lot of objects involved and it might not be clear why each is required. It can be tempting to skip a full understanding of all the objects and just get on with setting things up – which is relatively straightforward.

I’d encourage you not to do that and I’ll explain why. There are a lot of scenarios that might crop up in the lifecycle of a TDE-protected database instance. Recovering a protected database from backup. Migrating a database from one server to another. Managing high availability. The list goes on.

Remember: the bigger the keychain, the more powerful the man.
