Press "Enter" to skip to content

Curated SQL Posts

ETL via Powershell

Greg Moore builds a simple ETL process using Powershell:

Recently a customer asked me to work on a pretty typical project to build a process to import several CSV files into new tables in SQL Server. Setting up a PowerShell script to import the tables is a fairly simple process. However, it can be tedious, especially if the files have different formats. In this article, I will show you how building an ETL with PowerShell can save some time.

It’s a simple process, but that’s a good reminder that simple processes can be good processes.

Leave a Comment


Sean Gallardy lays out what HADR_SYNC_COMMIT really tells you:

Initially I thought to myself, “this is the most misunderstood wait type that exists in the HA space for SQL Server”, then I realized maybe this isn’t the case… So, I pondered over this question, “is it truly misunderstood?” and came to the (possibly incorrect) realization that it is quite accurate in the general SQL Server’s users’ space of understanding. I also concluded that, really, it’s the way the wait is used in SQL Server coupled with how waits work in SQL Server, which leads to how it is viewed. Let me explain….

You’ll definitely want to read Sean’s explanation.

Leave a Comment

Automating Notebook Execution with Powershell

Julie Koesmarno shows off an automation process for notebooks:

When I first think about automation, I generally think in the following way: in order to automate a script, we want to ensure that the script itself can be run via a command line interface (CLI) and with almost no user interaction (except for input and output parameters). Now, how do we apply this to Jupyter Notebooks so that we can automate SQL notebooks or PowerShell Notebooks?

The good news is that these SQL notebooks and PowerShell notebooks that we’ve created using Azure Data Studio, can be run on PowerShell CLI. If these notebooks can be run on PowerShell CLI, that means any automation systems or serverless architecture (Azure Automation combined with Azure Logic Apps as an example) should be able to run these notebooks also.

In this blog post, I’ll cover examples on using Invoke-SqlNotebook, using Invoke-ExecuteNotebook and putting it together with Azure Automation.

Click through to see the whole thing.

Leave a Comment

Thinking Twice about Single-Column Indexes

Erik Darling wants you to perform a sanity check:

There are times when a single key column index can be useful, like for a unique constraint.

But for the most part, outside of the occasional super-critical query that needs to be tuned, single key column indexes either get used in super-confusing ways, or don’t get used at all and just sit around hurting your buffer pool and transaction log, and increasing the likelihood of lock escalation.

Read on for Erik’s full point. Sometimes that single-column non-clustered index really does do the trick—as in a unique key constraint, or a single column used in a really commonly-used EXISTS clause—but it’s worth thinking about whether that one column is really all there is.

Leave a Comment

Up or Out: Promotion Paths

Bob Pusateri describes a common promotion path:

But one piece of career growth that I’ve never gotten from any employer is a promotion.

Allow me to define “promotion”. In my mind, a promotion is where an employee earns a change in job title with commensurate increase in responsibility. I won’t even say it has to include an increase in pay, though I imagine it often would. Similarly, a promotion need not involve moving into a management position. Simply put, to me a promotion is doing a great job as a junior widget maker and then one day being told your hard work and contributions have been noticed and you are no longer junior.

This is pretty common. As an example of the other side of the coin, I’ve been promoted (using Bob’s definition) at two separate companies. But in a lot of cases, hierarchies in engineering teams have nearly disappeared, such that there’s no longer a Software Developer 1 who can be promoted into a Software Developer 2, and then a 3 and a 4, and then an Enterprise Architect 1, etc. Instead, out is the new up.

The funny part of the pattern is that I know people who have left a company in order to move up. Then, after some time with the new company, they return to the old company in a higher role.

Leave a Comment

Pure Storage FlashArray Snapshot Torture Test

Argenis Fernandez puts SQL Server snapshots on a Pure Storage FlashArray to the test:

Look, I’m not here to fight your religious war about how snapshots should not be called backups. I’m just gonna call them fast-as-fast restores(*) and be done with it. Because let’s be honest, with Pure Storage there’s absolutely nothing faster than a storage snapshot to recover a volume. Or volume(s). You get the idea. It’s about how fast you recover, every time.

Yes, I do understand that there are a million of considerations for something to be called a “backup”. We’ll get to those little by little – don’t expect a thorough post on that debate right now. Today I want to focus on one question: Are Pure Storage FlashArray snapshots stable, trustworthy enough that I can take them without pausing I/O against my database? Can I trust that the database will come online every time from a snapshot?

Read on for the Answer. For additional fun, read the whole article with your mental voice sounding like Argenis.

Leave a Comment

CI/CD with Databricks Notebooks and Azure DevOps

Michael Shtelma and Piotr Majer get us started on an MLOps journey:

This is the first part of a two-part series of blog posts that show how to configure and build end-to-end MLOps solutions on Databricks with notebooks and Repos API. This post presents a CI/CD framework on Databricks, which is based on Notebooks. The pipeline integrates with the Microsoft Azure DevOps ecosystem for the Continuous Integration (CI) part and Repos API for the Continuous Delivery (CD).In the second post, we’ll show how to leverage the Repos API functionality to implement a full CI/CD lifecycle on Databricks and extend it to the fully-blown MLOps solution.

Click through for the article and a link to code. You can also see the pipeline YAML (and Python code it calls) in the repo.

Leave a Comment

Ensemble Classification in Azure Machine Learning

Dinesh Asanka reminds me not to use the designer for tough Azure ML problems:

Let us see how we can extend the standard classification to Ensemble Classifiers in Azure Machine Learning. Before we discuss the details of this configuration, you can view or download the experiment from Ensemble Classification

The following figure shows the complex layout of the Ensemble Classifiers in Azure Machine Learning.

Dinesh is not kidding about that complexity. This is definitely a use case for the Azure ML SDK.

Leave a Comment

Spark Performance Improvements in Azure Synapse

Balaji Sankaran shows improvements Microsoft has made over open-source Apache Spark 3 in Azure Synapse Analytics:

Azure Synapse Analytics is continually focused on delivering a highly performant and scalable platform for supporting Spark Workload. We are focused on improving the query performance for the typical workload patterns that we see with our customers. By combining the latest open-source updates in Apache Spark with our team’s focus on performance updates we have made significant performance gains in standard TPC-DS benchmarking tests.

I expect it will never be as fast as what Databricks can do, but getting a 2x performance improvement over the open source version of Spark is nothing to sneeze at.

Leave a Comment