Press "Enter" to skip to content

Curated SQL Posts

SQL Server 2017 and Column-Level Encryption

Steve Jones notes a change between SQL Server 2016 and SQL Server 2017 around column-level encryption:

I discovered recently that there was a change made in SQL Server 2017 to the way that symmetric key passphrases are hashed. There’s a KB article that notes the fix, but basically the passphrases used to be hashed with SHA1. That’s cryptographically insecure, so the algorithm was updated to SHA2.

This is a problem, and can cause some issues. I’ll show the issue and then how to get around it.

There’s not much detail in the KB article about what happens afterward: whether you need to keep the trace flag on forever, or whether everything eventually migrates over to using SHA2 for hashes.
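
For reference, here’s a rough T-SQL sketch of the kind of round trip that breaks across versions. The key name, passphrases, and plaintext are all made up, and the trace flag number (4631) is my reading of the KB article, so verify it there:

    -- On the SQL Server 2016 instance: derive a key from a shared
    -- passphrase and encrypt a value
    CREATE SYMMETRIC KEY TestKey
        WITH ALGORITHM = AES_256,
             KEY_SOURCE = 'a shared passphrase',
             IDENTITY_VALUE = 'a shared identity phrase'
        ENCRYPTION BY PASSWORD = 'StrongPassword!1';

    OPEN SYMMETRIC KEY TestKey DECRYPTION BY PASSWORD = 'StrongPassword!1';
    SELECT ENCRYPTBYKEY(KEY_GUID('TestKey'), N'secret data') AS Ciphertext;  -- save this value
    CLOSE SYMMETRIC KEY TestKey;

    -- On SQL Server 2017, a key recreated from the same KEY_SOURCE no longer
    -- matches, because the passphrase is now hashed with SHA2 rather than
    -- SHA1, so DECRYPTBYKEY against the saved ciphertext returns NULL.
    -- Enabling the trace flag from the KB before recreating the key restores
    -- the old SHA1 derivation:
    DBCC TRACEON (4631, -1);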

Comments closed

Computed Columns and Temporal Tables

Randolph West shows us how to include computed columns in temporal tables:

As I say in my Back to the future with Temporal Tables session, there are several limitations we should be aware of with this feature. One I had not previously considered is that computed columns are not permitted in a history table. This is a problem if we are converting an existing auditing process to make use of temporal tables, and we have computed columns in our table.

Computed columns are columns that are created from an expression based on one or more existing columns in a table. They are useful if we need to index a portion of a wider column, for example.

Hat tip to Erik Darling for coming up with the solution.
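
A minimal T-SQL sketch of the workaround as I understand it (table and column names are hypothetical): pre-create the history table yourself, with the computed column materialized as a regular column of the matching type, then point system versioning at it.

    -- Current table with a computed column
    CREATE TABLE dbo.Orders
    (
        OrderID    int IDENTITY PRIMARY KEY,
        Quantity   int NOT NULL,
        UnitPrice  money NOT NULL,
        TotalPrice AS (Quantity * UnitPrice),   -- computed
        ValidFrom  datetime2 GENERATED ALWAYS AS ROW START NOT NULL,
        ValidTo    datetime2 GENERATED ALWAYS AS ROW END NOT NULL,
        PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
    );

    -- Hand-built history table: TotalPrice is a plain money column here,
    -- since computed columns are not permitted in a history table
    CREATE TABLE dbo.OrdersHistory
    (
        OrderID    int NOT NULL,
        Quantity   int NOT NULL,
        UnitPrice  money NOT NULL,
        TotalPrice money NULL,
        ValidFrom  datetime2 NOT NULL,
        ValidTo    datetime2 NOT NULL
    );

    ALTER TABLE dbo.Orders
        SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.OrdersHistory));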

Comments closed

Calculating the Pain of UDFs

Taiob Ali points out something added to SQL Server 2017 (and later backported to 2016 and 2014):

Microsoft SQL Server Management Studio (SSMS) version 17.5 added the new showplan attributes UdfCpuTime and UdfElapsedTime to QueryTimeStats. These two attributes measure the time and CPU spent on user-defined functions within a query execution, helping to discover the impact of UDF execution within the full query execution. This feature was first added in SQL Server 2017 CU3 and was backported to SQL Server 2016 SP2. Finding the execution time and CPU for UDFs was always a challenge for data professionals because the number of times a function executes will vary.

This was a blind spot for a very long time.
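
If you want to see the new attributes for yourself, here’s a hedged sketch: a deliberately slow scalar UDF (entirely made up) plus a query with the actual plan captured. UdfCpuTime and UdfElapsedTime show up in the QueryTimeStats element of the plan XML, or in the plan’s root-node properties in SSMS 17.5 or later.

    -- A deliberately slow scalar UDF
    CREATE OR ALTER FUNCTION dbo.SlowTax (@amount money)
    RETURNS money
    AS
    BEGIN
        DECLARE @i int = 0;
        WHILE @i < 10000 SET @i += 1;   -- burn a little CPU on every call
        RETURN @amount * 0.08;
    END;
    GO

    SET STATISTICS XML ON;
    SELECT TOP (1000) dbo.SlowTax(CAST(o.object_id AS money)) AS Tax
    FROM sys.all_objects AS o;
    SET STATISTICS XML OFF;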

Comments closed

Choosing a SQL Platform on Hadoop

Sagar Kewalramani walks us through the choices for SQL platforms on the Cloudera Data Platform:

CDW on CDP is a new service that enables you to create a self-service data warehouse for teams of Business Intelligence (BI) analysts. You can quickly provision a new data warehouse and share any data set with a specific team or department. Do you remember when you could provision a data warehouse on your own, without infrastructure and platform teams getting involved? This was never possible. CDW fulfills this mission.

However, CDW makes several SQL engines available, and with more choice comes more opportunities for confusion.   Let’s explore the SQL engines available in CDW on CDP and talk about which is the right SQL option for the right use case.

So many choices!  Impala? Hive LLAP?  Spark? What to use when?  Let’s explore.

Infrastructure and platform teams start to get involved approximately two days after the unexpectedly large bill arrives.

That aside, this is a really nice article covering several platform technologies, including Impala, Hive LLAP, and Spark SQL.

Comments closed

Parquet Versus Avro

Matthew Rathbone compares the Parquet and Avro file formats:

JSON improves upon CSV as each row provides some indication of schema, but without a special header-row, there’s no way to derive a schema for every record in the file, and it isn’t always clear what type a ‘null’ value should be interpreted as.

Avro and Parquet on the other hand understand the schema of the data they store. When you write a file in these formats, you need to specify your schema. When you read the file back, it tells you the schema of the data stored within. This is super useful for a framework like Spark, which can use this information to give you a fully formed data-frame with minimal effort.

I was kind of hoping to see ORC in the comparison as well, though even when the Hortonworks-Cloudera competition was at its peak, my recollection is that the differences between the two formats were pretty small: ORC was a little faster for non-nested data and Parquet a little faster for nested data.
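
To make the self-describing-schema point concrete, here’s a small Spark SQL sketch (table names and path are invented): write Parquet with an explicit schema, then point a new table at the same files with no schema at all and let the format supply it.

    -- Write Parquet with an explicit schema
    CREATE TABLE events (id INT, label STRING) USING PARQUET LOCATION '/tmp/events';
    INSERT INTO events VALUES (1, 'click'), (2, 'view');

    -- Read the files back with no schema: Parquet carries its own, so
    -- Spark recovers the column names and types from the file metadata
    CREATE TABLE events_back USING PARQUET LOCATION '/tmp/events';
    DESCRIBE TABLE events_back;   -- id INT, label STRING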

Comments closed

From SQL Server to Cassandra

Shel Burkow has started a new series:

A subset of related tables in a relational schema can satisfy any number of queries known and unknown at design time. Refactoring the schema into one Cassandra table to answer a specific query, though, will (re)introduce all the data redundancies the original design had sought to avoid.

In this series, I’ll do just that. Starting from a normalized SQL Server design and statement of the Cassandra query, I’ll develop four possible solutions in both logical and physical models. To get there, though, I’ll first lay the foundation.

This initial article focuses on the Cassandra primary key. There are significant differences from primary keys in relational systems, and I’ll cover them in some depth. Each solution (Part III) will have a different key.

Cassandra (as well as Riak, while that was still a thing people cared about) has the concept of tables and SQL-like statements to work with them, but it’s quite different from a relational database, different enough that new design patterns are necessary. Just about the worst thing you could do would be to drop your relational database schema into Cassandra unchanged and call it a day.
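
To give a flavor of the difference, here’s a hypothetical CQL table (all names invented). Unlike a relational primary key, which only enforces uniqueness, the Cassandra primary key dictates physical layout: the partition key decides which nodes hold the rows, and the clustering columns decide their order within the partition.

    CREATE TABLE orders_by_user (
        user_id    uuid,
        order_date timestamp,
        order_id   uuid,
        total      decimal,
        -- user_id is the partition key: all of a user's orders are stored
        -- together. order_date and order_id are clustering columns: they
        -- sort rows within the partition, so "latest orders for a user"
        -- is a cheap sequential read.
        PRIMARY KEY ((user_id), order_date, order_id)
    ) WITH CLUSTERING ORDER BY (order_date DESC, order_id DESC);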

Comments closed

Formatting SSIS Packages

Tim Mitchell shows us different options available when formatting SSIS packages:

Most folks I know go to one extreme or the other when it comes to the visual layout of SQL Server Integration Services packages: either they don’t care a tiny bit about the appearance, or they insist on an easy-to-read layout before a project is considered complete. I am definitely in the second group, and will almost always spend the time to make sure my packages are properly laid out.

Regardless of which of these groups you are in, it is very easy to apply some simple visual formatting to your packages. In this quick tip, I’ll show you how to use the visual formatting options in SSIS to automate the layout.

I consider it quite important: the layout gives a future viewer an idea of your vision as a developer. If I can’t read what you’re doing, that does not bode well.

Comments closed

Automating JSON to Tables

Dave Mason has a stored procedure for us:

The code for my stored procedure is below. It has a single input parameter for a string of JSON data. The name/value pairs of the first “row” of JSON data are parsed to obtain column names and types for the result set. A query string for the OPENJSON function is constructed and executed via EXEC. JSON’s support for data types is pretty sparse compared to SQL Server (notice the big CASE expression where I attempt to do some mapping). If you’re using SQL 2016, you’ll have to make an edit for the STRING_AGG function.

Click through for the code and demos.
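
If you haven’t used OPENJSON before, here’s a minimal T-SQL sketch (with made-up sample data) of the two modes a procedure like Dave’s builds on: the default key/value/type rowset, which supplies the raw material for inferring column names and types, and the explicit WITH schema that his code generates and executes.

    DECLARE @json nvarchar(max) = N'[
      {"Name": "Anne", "Age": 32},
      {"Name": "Rob",  "Age": 41}
    ]';

    -- Default mode: key/value/type triples for the first "row"
    SELECT [key], [value], [type]
    FROM OPENJSON(@json, '$[0]');

    -- Explicit mode: the kind of WITH clause the procedure constructs
    SELECT j.Name, j.Age
    FROM OPENJSON(@json)
         WITH (Name nvarchar(50) '$.Name',
               Age  int          '$.Age') AS j;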

Comments closed

Deferred Compilation and Compatibility Level 140 Query Hints

Milos Radivojevic shows that table variables do not care about your QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_140 settings:

The plan is created under CL 140, but the estimated number of rows for the table variable is not 1 but the actual one. Even if you were to specify the FORCE_LEGACY_CARDINALITY_ESTIMATION hint, the query would still be deferred compiled and the behavior of the table variable would be the same.
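
Here’s a minimal T-SQL sketch of the behavior Milos describes, assuming a database at compatibility level 150; the table and row counts are arbitrary. Even with the CL 140 hint in place, the estimate for the table variable reflects the actual row count rather than the old fixed guess of 1.

    DECLARE @t TABLE (id int PRIMARY KEY);

    INSERT INTO @t (id)
    SELECT TOP (100000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM sys.all_columns AS c1
    CROSS JOIN sys.all_columns AS c2;

    -- Deferred compilation kicks in anyway: the estimated number of rows
    -- for @t is 100,000, not 1, despite the compatibility level 140 hint
    SELECT t.id
    FROM @t AS t
    OPTION (USE HINT ('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_140'));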

Table variable deferred compilation respects settings at the database scope and ignores all hints except one.

Click through to learn what that one hint is.

Comments closed

Multi-Column Transformations with Power Query

Imke Feldmann shows how we can apply a function to multiple columns at once using Power Query:

You can apply simple transformations to multiple columns at once in Power Query using the UI only. In this article I show how you can apply advanced transformations on multiple columns at once instead. You can also use custom functions this way. And lastly, for the lazy (er, efficient) fans of custom M-functions: you will get a new “TranformAllMyColumnsAtOnceHowILikeIt” function as well.

Read on for a few examples, including one using custom code.

Comments closed