Press "Enter" to skip to content

Curated SQL Posts

Building a Docker Container of a SQL Server Database

I have a post showing how to turn a database in SQL Server into a Docker container:

Today, we’re going to go through the process of turning a database you’ve built into a Docker container. Before we get started, here are the expectations:

1. I want a fully running copy of SQL Server with whatever database I’m using, as well as key components installed.
2. I want this not to be on a persistent volume. In other words, when I destroy the container and create a new one from my image, I want to reset back to the original state. I’m using this for technical demos, where I want to be at the same starting point each time.
3. I want this to be as easy as possible for users of my container. I consider the use of a container here not particularly noteworthy in and of itself, so the more time I make people spend trying to figure out my demo environment, the more likely it is that they will simply give up.

With that preamble aside, let’s get to work!

As a bonus, you can finally learn my real thoughts on medieval France. Fun story around that: a much longer time ago than I’m willing to admit, I played a Hundred Years War scenario in Civilization 2, and the one thing I remember from that scenario is killing the Dauphin. After that, the script spawned a new claimant to the throne, who immediately attacked my troops and died. And then the script spawned yet another new claimant, who met the same fate within a couple turns. And then a third. If I remember correctly, I ran France out of claimants to the throne by the end of it.

Comments closed

E-mailing Query Results via Logic Apps + Azure Data Factory

Rayis Imayev has to send an e-mail:

It is a very simple concept: you have a database query result, either direct database table output or a stored procedure’s returned result, that needs to be sent as an email notification. This email could serve as a status report for your daily data processing job or an alert notification with some metrics that require immediate attention, and you want to be as user-friendly as possible with this message, i.e. HTML formatting for your dataset is required.

I built this solution approach over a year ago and wanted to document and share it after recently using it in one of my projects. Again, I agree there are other more or less effective ways to achieve the very same goal; this blog post is just a recollection of my efforts to send HTML-formatted SQL query results using Azure Data Factory and Logic Apps.

It’s a bit more effort than sp_send_dbmail, though also considerably more flexible.
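If you strip away the ADF plumbing, the heart of the pattern is an HTTP POST of an HTML-formatted result set to a Logic App’s “When a HTTP request is received” trigger, with the Logic App handling the actual e-mail. Just to show the shape of that call, here is a rough C# sketch; the trigger URL and payload are invented for illustration and this is not Rayis’s exact design:

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

class EmailNotificationDemo
{
    static async Task Main()
    {
        // Hypothetical Logic App HTTP trigger URL (copied from the trigger's "HTTP POST URL" box).
        var logicAppUrl = "https://prod-00.eastus.logic.azure.com/workflows/.../triggers/manual/paths/invoke?...";

        // An HTML table built from a query result; in the real solution the ADF pipeline produces this.
        var payload = new
        {
            subject = "Daily load status",
            body = "<table><tr><th>Table</th><th>Rows</th></tr><tr><td>dbo.Sales</td><td>12345</td></tr></table>"
        };

        using var client = new HttpClient();
        var content = new StringContent(JsonSerializer.Serialize(payload), Encoding.UTF8, "application/json");
        var response = await client.PostAsync(logicAppUrl, content);

        Console.WriteLine($"Logic App responded with {(int)response.StatusCode}");
    }
}
```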

Comments closed

An Overview of the T-SQL Script DOM

Dan Guzman provides a public service:

Scripts are parsed by invoking the Parse method of the T-SQL script DOM library’s TSqlParser class. The parser understands the complex T-SQL abstract syntax tree and splits T-SQL source into atomic TSqlParserTokens of TSqlTokenTypes that represent keywords, identifiers, punctuation, literals, whitespace, etc. These low-level tokens are grouped into more meaningful TSqlFragment objects that represent language elements of the script DOM, such as batches, statements, clauses, etc. Fragments, rather than the low-level parser tokens, are most often used in practice, although the underlying tokens are available for specialized requirements.

The Parse method returns a TSqlFragment object of type TSqlScript containing all fragments within the script. This top-level fragment of the DOM hierarchy provides programmatic access to all language element fragments in the script. Nearly 1,000 different fragment types exist today due to the many granular T-SQL language elements.

Dan provides several examples of how to use the script DOM, making this a must-read if you’re interested in writing code around SQL Server.
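To give a flavor of what that looks like in practice, here is a minimal C# sketch (assuming the Microsoft.SqlServer.TransactSql.ScriptDom package; it is not Dan’s exact code) that parses a script and lists its statement fragments:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.SqlServer.TransactSql.ScriptDom;

class ScriptDomDemo
{
    static void Main()
    {
        var sql = "SELECT 1 AS x; UPDATE dbo.t SET c = 2 WHERE id = 5;";

        // TSql150Parser targets SQL Server 2019 syntax; 'true' means QUOTED_IDENTIFIER is on.
        var parser = new TSql150Parser(true);
        TSqlFragment fragment = parser.Parse(new StringReader(sql), out IList<ParseError> errors);

        if (errors.Count > 0)
        {
            foreach (var e in errors)
                Console.WriteLine($"Parse error at line {e.Line}: {e.Message}");
            return;
        }

        // The top-level fragment is a TSqlScript: batches contain statements, statements contain clauses, etc.
        var script = (TSqlScript)fragment;
        foreach (var batch in script.Batches)
            foreach (var statement in batch.Statements)
                Console.WriteLine(statement.GetType().Name);   // SelectStatement, UpdateStatement, ...
    }
}
```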

Comments closed

Reducing Costs in Azure

Matt Robertshaw has a few tips for saving money in Azure:

4. Enterprise Dev/Test subscriptions

Enterprise Dev/Test subscriptions are provided as part of Microsoft Enterprise Agreements (EA). They’re designed for teams of Visual Studio subscribers to run development and test workloads in Azure, but at discounted rates, specifically on Windows virtual machines and exclusive gallery images. This is significant because I see many clients using regular Azure subscriptions for development and test, which means they’re potentially paying more than they need to.

One additional point I’d make, though it kind of fits in with points 7 and 9, is to try to make your own services “cloud-first.” By contrast, a typical cloud migration takes services built for on-prem data centers, converts the servers into VMs, ships those VMs up into Azure/AWS/Google/whatever, and leaves you paying more than you did on-prem.

Instead, read up a bit on cloud architecture and see how you might be able to change a service to fit that model. Instead of having a server running all the time, is it possible to store messages in a queue and have functions process these messages on a given schedule? Can you use expensive tools like Azure Synapse Analytics to perform nightly data processing and move the results to a much less expensive Azure SQL Database?
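To make that queue-plus-functions idea concrete, here is a small sketch of an in-process Azure Function with a queue trigger; the queue name and the work are invented, and a timer trigger would work just as well if you want a fixed schedule rather than per-message processing:

```csharp
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class ProcessWorkItem
{
    // Runs only when a message lands on the queue, so you pay per execution
    // instead of paying for an always-on VM.
    [FunctionName("ProcessWorkItem")]
    public static void Run(
        [QueueTrigger("work-items", Connection = "AzureWebJobsStorage")] string message,
        ILogger log)
    {
        log.LogInformation($"Processing work item: {message}");
        // ...do the actual work here: call a stored procedure, move a file, etc.
    }
}
```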

Comments closed

Optimizing Power BI Merge Performance with Table.Join

Chris Webb shows us another way to optimize Power BI merge performance:

The SortMerge algorithm, last in the list above, is the focus of this blog post. I mentioned in my earlier posts that the reason that merge operations on non-foldable data sources are often slow is that both of the tables used in the merge need to be held in memory. There is an exception though: if you know that the data in the columns used to join the two tables is sorted in ascending order, you can use the Table.Join function and the SortMerge algorithm and the data from both sources can be streamed rather than held in memory, which in turn results in the merge being much faster.

That’s the same in the relational world: merge joins are the fastest, assuming that your data is pre-sorted in the proper manner.

Comments closed

Building a Stream Processing App with ksql

The Hadoop in Real World team walks us through event streaming with ksql:

ksqlDB is an event streaming database that enables creating powerful stream processing applications on top of Apache Kafka by using the familiar SQL syntax, which is referred to as KSQL. This is a powerful concept that abstracts away much of the complexity of stream processing from the user. Business users or analysts with a SQL background can query the complex data structures passing through Kafka and get real-time insights. In this article, we are going to see how to set up ksqlDB and also look at important concepts in KSQL and its usage.

Event streaming has become a lot easier over the past couple of years, as Kafka, Spark, and Flink have all matured.
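KSQL statements are just SQL text submitted to the ksqlDB server, whether through the CLI or its REST API. As a hedged sketch of the latter (the server address, topic, and columns are invented for illustration), creating a stream over a Kafka topic from C# might look like this:

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

class KsqlDemo
{
    static async Task Main()
    {
        // KSQL DDL: expose an existing Kafka topic as a queryable stream.
        var ksql = @"CREATE STREAM pageviews (user_id VARCHAR, page VARCHAR)
                     WITH (KAFKA_TOPIC='pageviews', VALUE_FORMAT='JSON');";

        // The /ksql endpoint accepts DDL/DML statements as JSON.
        var request = new { ksql, streamsProperties = new { } };

        using var client = new HttpClient { BaseAddress = new Uri("http://localhost:8088") };
        var content = new StringContent(JsonSerializer.Serialize(request), Encoding.UTF8, "application/vnd.ksql.v1+json");
        var response = await client.PostAsync("/ksql", content);

        Console.WriteLine(await response.Content.ReadAsStringAsync());
    }
}
```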

Comments closed

Bulk Loading SQL Server from .NET

Adrian Hills walks us through the SqlBulkCopy class:

Ever been in a situation where rumblings of “process X is too slow” suddenly build into a super-high priority ball of urgency when that next step up in data volume hits? Yeah, that can be fun. No, really, it can be fun because we have strategies to sort this stuff out, right?

In this blog post, I’m going to talk about one particular piece of functionality—SqlBulkCopy—that can help you with bulk data loading. If I had to single out my favorite .NET class, SqlBulkCopy would be at the top of the list. My goal is to introduce you to this class so that maybe it can become a part of your tool belt, too.

Click through to see how it works. If you’re familiar with SSIS, you’re already familiar with the concept if not the specifics.
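If you want a taste of the API before clicking through, here is a minimal sketch; the table, columns, and connection string are invented, and Adrian’s post covers the options (batch sizes, column mappings, transactions, streaming from an IDataReader) in far more depth:

```csharp
using System.Data;
using Microsoft.Data.SqlClient;

class BulkLoadDemo
{
    static void Main()
    {
        // Build (or otherwise obtain) the rows to load; a DataTable or any IDataReader works.
        var table = new DataTable();
        table.Columns.Add("CustomerId", typeof(int));
        table.Columns.Add("CustomerName", typeof(string));
        table.Rows.Add(1, "Annie");
        table.Rows.Add(2, "Gilbert");

        var connectionString = "Server=.;Database=DemoDb;Integrated Security=true;TrustServerCertificate=true";

        using var bulkCopy = new SqlBulkCopy(connectionString)
        {
            DestinationTableName = "dbo.Customer",
            BatchSize = 10_000          // commit in batches rather than one giant transaction
        };

        // Map source columns to destination columns by name, then stream the rows in.
        bulkCopy.ColumnMappings.Add("CustomerId", "CustomerId");
        bulkCopy.ColumnMappings.Add("CustomerName", "CustomerName");
        bulkCopy.WriteToServer(table);
    }
}
```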

Comments closed

Unit Testing Azure Data Factory Pipelines

Richard Swinbank walks us through what it takes to run a unit test against an Azure Data Factory pipeline:

In part three of this series I looked at functional tests for ADF pipelines: verifying, in isolation, that pipelines are “doing things right”. In this post I’ll be testing isolated pipelines to check that they’re “doing the right things” – this is one description of a unit test. In a general-purpose programming language, unit tests might be used to verify that an individual line of code is executed, or that it has a particular effect. In Azure Data Factory, the smallest unit of development – a “line of code” – is a pipeline activity. I will be writing tests to verify that specific activities are executed (or not) and to inspect their results.

There’s a fair bit involved in this sort of test.
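Richard builds up his own test helpers over the series; purely to give a sense of the kind of assertion involved, here is a rough NUnit-style sketch against the Azure Data Factory management SDK. The resource names and the GetCredentials helper are hypothetical, and this is not Richard’s code:

```csharp
using System;
using System.Linq;
using System.Threading;
using Microsoft.Azure.Management.DataFactory;
using Microsoft.Azure.Management.DataFactory.Models;
using Microsoft.Rest;
using NUnit.Framework;

[TestFixture]
public class CopyPipelineTests
{
    [Test]
    public void StageActivity_Succeeds_WhenPipelineRuns()
    {
        ServiceClientCredentials credentials = GetCredentials();
        var client = new DataFactoryManagementClient(credentials) { SubscriptionId = "<subscription-id>" };

        // Trigger the pipeline and wait for it to finish.
        var runId = client.Pipelines.CreateRun("my-rg", "my-adf", "PL_Stage_Customers").RunId;
        PipelineRun run;
        do
        {
            Thread.Sleep(10_000);
            run = client.PipelineRuns.Get("my-rg", "my-adf", runId);
        } while (run.Status == "InProgress" || run.Status == "Queued");

        // The "unit" under test: did a specific activity execute, and did it succeed?
        var activities = client.ActivityRuns.QueryByPipelineRun(
            "my-rg", "my-adf", runId,
            new RunFilterParameters(DateTime.UtcNow.AddHours(-1), DateTime.UtcNow)).Value;

        Assert.IsTrue(
            activities.Any(a => a.ActivityName == "Copy to staging" && a.Status == "Succeeded"),
            "Expected the 'Copy to staging' activity to run and succeed.");
    }

    // Hypothetical helper: acquire a token for a service principal with access to the factory.
    private static ServiceClientCredentials GetCredentials() =>
        throw new NotImplementedException("Obtain service principal credentials here.");
}
```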

Comments closed

Power Query Performance and When You Remove Columns

Chris Webb continues a series on optimizing Power Query merge performance:

In my last post I demonstrated how the size of a table affects the performance of Power Query merge operations on non-foldable data sources in Power BI. Specifically, I showed that removing columns from the tables involved in a merge before the merge took place improved performance. But does it matter when you remove the columns? Is it enough to only select the columns you need when you expand the nested table returned by a merge, for example, or just to remove columns after the merge step? So, today’s question is:

Does it make a difference to Power Query merge performance if you remove unwanted columns from your source tables in the step before the merge or in the step afterwards?

Read on for the result, as well as a pleasant surprise around Power BI’s capabilities.

Comments closed