
Curated SQL Posts

Azure Synapse Analytics November Updates

James Serra keeps us up to date on Synapse:

Delta Lake support for serverless SQL is generally available: Azure Synapse has had preview-level support for serverless SQL pools querying the Delta Lake format. This enables BI and reporting tools to access data in Delta Lake format through standard T-SQL. With this latest update, the support is now Generally Available and can be used in production. See How to query Delta Lake files using serverless SQL pools

Click through for the full list of what James likes.
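To give a feel for what “standard T-SQL over Delta Lake” looks like from a client application, here is a minimal sketch (mine, not James’s) of a JDBC connection to a serverless SQL endpoint running an OPENROWSET query with FORMAT = 'DELTA'. The workspace name, credentials, and storage path are all placeholders, and it assumes the Microsoft JDBC driver is on the classpath:

```scala
// Minimal sketch: querying a Delta Lake folder through a Synapse serverless
// SQL endpoint from a JDBC client. Endpoint, credentials, and storage URL
// are placeholders, not working values.
import java.sql.DriverManager

object DeltaViaServerlessSql {
  def main(args: Array[String]): Unit = {
    val url = "jdbc:sqlserver://<workspace>-ondemand.sql.azuresynapse.net:1433;" +
      "database=<database>;authentication=ActiveDirectoryPassword;" +
      "user=<user>;password=<password>"

    val query =
      """SELECT TOP 10 *
        |FROM OPENROWSET(
        |    BULK 'https://<storageaccount>.dfs.core.windows.net/<container>/<delta-folder>/',
        |    FORMAT = 'DELTA'
        |) AS rows""".stripMargin

    val conn = DriverManager.getConnection(url)
    try {
      val rs = conn.createStatement().executeQuery(query)
      val columnCount = rs.getMetaData.getColumnCount
      while (rs.next()) {
        // Print each row as a simple comma-separated line.
        println((1 to columnCount).map(i => rs.getString(i)).mkString(", "))
      }
    } finally conn.close()
  }
}
```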


Lambda Expressions in Scala

Shubham Shrivastava explains how lambda expressions work in Scala:

The syntax for lambda expressions in Scala uses a symbol made up of an equals sign and a greater-than sign, which we refer to as a rocket ( => ). When we’re reading the code, the idea of a lambda expression is a short literal expression that defines a function, and typically these should not be overly long. So, for example, I could define a function for squaring values that looks something like this.

Lambda expressions are great in cases where you need to perform an operation exactly one time. If you create a separate function with its own name, there’s always a question in the back of a developer’s mind about whether this thing will get used again, and so it takes up a little bit of cognitive load. A lambda expression answers that conclusively: no, we won’t use this code again.
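The excerpt trails off before the example itself, so here is a small illustrative snippet (mine, not Shubham’s) of the squaring lambda he describes, runnable as-is in the Scala REPL:

```scala
// A squaring function written with the "rocket" syntax, bound to a name.
val square: Int => Int = x => x * x
println(square(4)) // 16

// More often the lambda lives right where it is used, exactly once,
// so it never needs a name at all.
val squares = List(1, 2, 3, 4).map(x => x * x)
println(squares) // List(1, 4, 9, 16)
```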


Deploying dbt on Databricks

Dave Eyler, et al., have a great announcement:

At Databricks, nothing makes us happier than making our users more productive, which is why we are delighted to announce a native adapter for dbt. It’s now easier than ever to develop robust data pipelines on Databricks using SQL.

dbt is a popular open source tool that lets a new breed of ‘analytics engineer’ build data pipelines using simple SQL. Everything is organized within directories, as plain text, making version control, deployment, and testability simple.

Click through for more information on how this works and how you can get the native adapter.


Automating Historical Partition Processing in PBI Per User

Gilbert Quevauvilliers runs into a timing issue:

I recently had a big challenge with one of my customers where due to the sheer volume of data and network connectivity speed I was hitting the 5-hour limit for processing of data into my premium per user dataset.

My solution was to change the partitions from monthly to daily, and then, once I had all the daily partitions, merge them back into monthly partitions.

The challenge I had was I now had to process daily partitions from 2019-01-01 to 2021-11-30. This was a LOT of partitions and I had to find a way to automate the processing of partitions.

Not only that, but I had to ensure that I did not overload the source system too!

Read on to see what Gilbert did to solve this problem.


Determining the Right Batch Size for Deletes

Jess Pomfret breaks out the lab coat and safety goggles:

I found myself needing to clear out a large amount of data from a table this week as part of a clean-up job. In order to avoid the transaction log catching fire from a long-running, massive delete, I wrote the following T-SQL to chunk through the rows that needed to be deleted in batches. The question is, though, what’s the optimal batch size?

I usually go with a rule of thumb: 1K for wide tables (in terms of columns and row size) or when there are foreign key constraints, 10K for medium-width tables, and about 25K for narrow tables. But if this is an operation you run frequently, it’s worth experimenting a bit.
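Jess’s version is pure T-SQL, so click through for that; purely as illustration, here is a rough sketch of the same chunking pattern driven from client code over JDBC, with a hypothetical table, predicate, and batch size:

```scala
// Rough sketch of the batched-delete pattern from a JDBC client. The table
// name, date predicate, and batch size are hypothetical placeholders.
import java.sql.DriverManager

object BatchedDelete {
  def main(args: Array[String]): Unit = {
    val batchSize = 10000 // the knob worth experimenting with
    val conn = DriverManager.getConnection(
      "jdbc:sqlserver://<server>;databaseName=<database>;integratedSecurity=true")
    try {
      val stmt = conn.prepareStatement(
        s"DELETE TOP ($batchSize) FROM dbo.AuditLog WHERE LoggedAt < DATEADD(YEAR, -2, GETDATE());")
      var deleted = batchSize
      var total = 0L
      // Keep deleting in chunks until a batch comes back short, meaning no
      // qualifying rows remain. Each executeUpdate commits on its own, so the
      // log gets a chance to clear between chunks (recovery model permitting)
      // instead of holding one enormous transaction.
      while (deleted == batchSize) {
        deleted = stmt.executeUpdate()
        total += deleted
        println(s"Deleted $deleted rows (running total: $total)")
      }
    } finally conn.close()
  }
}
```

Timing a few runs at different batch sizes is the experiment worth doing if this is a job you run regularly.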


Building a Pipeline for External Data Sharing

Hope Foley has data to share:

I worked with a customer recently who had a need to share CSVs for an auditing situation. They had a lot of external customers that they needed to collect CSVs from for the audit process. There were a lot of discussions happening on how best to do it: whether we’d pull data from their environment or have them push the files into theirs. Folks weren’t sure on that, so I tried to come up with something that would work for both.

Read on for Hope’s solution to the problem.


Tracking SQL Server Uptime

Garry Bargsley has a cmdlet for us:

This week’s blog post will help you check your SQL Servers’ uptime. There are numerous reasons I can think of that you would want to know how long your SQL Server has been online. Was the server recently patched, did it crash and come back online, or did someone restart it by mistake? These are all valid questions about a single SQL Server or your entire estate. I will show you how you can easily and quickly check one to many servers for uptime.

We will start by using every DBA’s favorite PowerShell module…  dbatools

Admittedly, I’d just check the start time for the tempdb database, but this cmdlet does give more info.
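For the tempdb trick, a minimal sketch might look like this (server names are hypothetical, and it assumes Windows authentication). Because tempdb is rebuilt every time the instance starts, its create_date is effectively the instance start time:

```scala
// Tiny sketch of the "when was tempdb created?" approach to uptime.
// Server names are placeholders; loop over however many instances you have.
import java.sql.DriverManager

object SqlUptimeCheck {
  def main(args: Array[String]): Unit = {
    val servers = Seq("<sql01>", "<sql02>", "<sql03>")
    val query = "SELECT create_date FROM sys.databases WHERE name = 'tempdb';"

    for (server <- servers) {
      val conn = DriverManager.getConnection(
        s"jdbc:sqlserver://$server;databaseName=master;integratedSecurity=true")
      try {
        val rs = conn.createStatement().executeQuery(query)
        if (rs.next()) println(s"$server has been up since ${rs.getTimestamp(1)}")
      } finally conn.close()
    }
  }
}
```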


Reasons for Partitioning in SQL Server

Erik Darling has opinions:

When I work with clients, nearly every single one has this burning question about partitioning.

“We’ve got this huge table, should we partition it?”

“Do you need to insert or delete data in big chunks?”

“No, it’s all transactional.”

“Do you have last page contention problems?”

“No, but won’t it help performance?”

“No, not unless you’re using clustered column store.”

“…”

Read on to unpack Erik’s argument. I do wish that there were more good cases for partitioning in SQL Server, but they’re almost all in the analytics space—which is part of why partitioning is a lot more useful in Azure Synapse Analytics dedicated SQL pools.


Testing IOPS, Latency, and Throughput: an Analogy

Brent Ozar has a timely analogy for us:

You’re trying to decide whether to use DHL, FedEx, UPS, or your local postal service.

You could measure them by sending me a gift – after all, it is the holidays, and I do a lot of stuff for you year round, and it’s probably the least you could do.

– You place one box outside

– You call the shipping company to come pick it up, and start the clock

– When I get it, I’ll call you to confirm receipt, and you stop the clock

Click through for the rest of the story.


Using Scala at Databricks

Li Haoyi gives us a peek behind the curtain:

With hundreds of developers and millions of lines of code, Databricks is one of the largest Scala shops around. This post will be a broad tour of Scala at Databricks, from its inception to usage, style, tooling and challenges. We will cover topics ranging from cloud infrastructure and bespoke language tooling to the human processes around managing our large Scala codebase. From this post, you’ll learn about everything big and small that goes into making Scala at Databricks work, a useful case study for anyone supporting the use of Scala in a growing organization.

It’s always interesting to see how the largest companies handle certain classes of problems. From this post, we can get an idea of the high-level requirements and usage, making it worth the read.
