Press "Enter" to skip to content

Curated SQL Posts

ETL Anti-Patterns: a Festivus Miracle

Tim Mitchell is ready to air some grievances:

We’re rounding the corner to the second half of December, which means it’s time for my favorite holiday: Festivus! Like many of you, I enjoy gathering around the Festivus pole and sharing the time-honored traditions such as the Feats Of Strength and the Airing Of Grievances.

But my favorite Festivus tradition takes place right here on this blog: the Eleven Days of Festivus. Each year, I write a blog post on each of the eleven days leading up to Festivus, usually around a central theme.

Tim has three posts up so far. The first is about jumping straight into the code-writing phase:

Most data architects and developers are intensely curious folks. When we see a set of data, we want to immediately step into a data whisperer role. Where others may see a jumbled mess, we see an opportunity to discover patterns and answers. The best data architects crave those data discovery finds the same way a baseball player craves a bottom-of-the-9th game-winning home run.

That kind of intellectual curiosity is a necessary trait for data architects, but it can lead to a rush straight into writing ETL code. I’ve seen this a lot, and have done it myself (and admittedly still do it on occasion): skipping past the business-value analysis and diving straight into the haystack looking for needles. Getting raw data into a format that can easily be analyzed and validated is a critical part of the ETL development life cycle, but rarely is it the first step.

The second is about processing too much data:

A common design flaw in enterprise ETL processes is that they are processing too much data. Having access to a great breadth and depth of data opens up lots of options for historical reporting and data analytics, but very often it is mistakenly assumed that all of the available data must be processed through ETL.

Although it may sound counterintuitive, there are many cases where purposefully leaving some data out of the ETL process leads to a better outcome. 
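
As a hypothetical illustration of that point (my example, not Tim’s): if the reporting layer only ever looks at the last three years, the extract can be scoped to match, rather than dragging the full history through the pipeline on every run:

    -- Hypothetical example: extract only the window the business actually uses.
    SELECT OrderID, CustomerID, OrderDate, TotalDue
    FROM Sales.Orders
    WHERE OrderDate >= DATEADD(YEAR, -3, CAST(SYSDATETIME() AS date));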

The third is about performing full loads when incremental loads are possible:

Earlier this year, I wrote about the concepts of incremental loads and discussed the benefits of loading data incrementally. To recap: an incremental load moves only the new and changed data from each source – rather than the entire bulk of the source data – through the ETL pipeline.

Using incremental loads can improve both the speed and accuracy of data movement and transformation. The time required to process data increases with the volume of said data, and extracting only the new and changed data from the source can ensure an accurate ‘point-in-time’ representation of the data. For these reasons, loading data incrementally is, for most data load needs, the better way to go.
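
As a rough T-SQL sketch of that watermark pattern (the table, column, and watermark bookkeeping here are hypothetical, not from Tim’s post), each run pulls only the rows changed since the last successful load, then advances the watermark:

    -- Hypothetical watermark-based incremental load; assumes staging.Orders
    -- is emptied before each run and Sales.Orders has a reliable ModifiedDate.
    DECLARE @LastLoadTime datetime2 =
        (SELECT WatermarkValue
         FROM etl.LoadWatermark
         WHERE SourceTable = 'Sales.Orders');

    INSERT INTO staging.Orders (OrderID, CustomerID, OrderDate, ModifiedDate)
    SELECT OrderID, CustomerID, OrderDate, ModifiedDate
    FROM Sales.Orders
    WHERE ModifiedDate > @LastLoadTime;  -- only new and changed rows

    UPDATE etl.LoadWatermark
    SET WatermarkValue = (SELECT MAX(ModifiedDate) FROM staging.Orders)
    WHERE SourceTable = 'Sales.Orders'
      AND EXISTS (SELECT 1 FROM staging.Orders);  -- don't advance on an empty run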

This is a good series to track.

Working with SQL Server Certificates in PowerShell

Peter Schott walks us through the process of certificate maintenance:

I recently had a need to add certificates to SQL Servers throughout an organization. There were quite a few servers to update and the certificates would need to be generated using a given format. This would include some descriptors for the names, IPv4 address, and ensuring that SQL Server would see the certificate when finished.

I realized this would need some sort of script, so I reached for PowerShell and the dbatools module. There’s a function in dbatools that supports setting the SQL Server certificate, and I knew that would be useful. But first, I had to generate the certificate itself. I read up on this in PowerShell and there’s no “easy” button for creating a certificate at this time, especially not when you need to add extra properties. Posts such as this one helped me get started. It works by creating an INF file, then shelling out to “certreq.exe” to generate the CSR file needed to obtain a certificate from a certificate authority. We needed to use the DNS name, the FQDN, and the IPv4 address as part of our certificate request, so I had to adjust my code to handle that.
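
The certificate work itself happens in PowerShell, but as a small SQL-side sanity check (my addition, not part of Peter’s post), once the certificate is installed and the instance restarted, you can confirm that connections are actually being encrypted:

    -- Check the encryption status of current connections.
    SELECT session_id, encrypt_option, auth_scheme, client_net_address
    FROM sys.dm_exec_connections;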

Click through to see how.

Using the Synapse Studio Monitor Hub

Saveen Reddy takes us through monitoring processes in Azure Synapse Analytics:

In order to test out SQL Script Monitoring in Azure Synapse, we need some SQL scripts. We can get some good ones from the Azure Synapse Knowledge Center. Inside the Synapse workspace, choose the Develop option from the left menu to open the Develop Hub. Select the "+" (Add New Resource) command and then Browse Gallery to navigate to the gallery.
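
If you just need something to monitor and don’t want to browse the gallery, any trivial script will do; something like this (run against a dedicated SQL pool) is enough to show up afterwards in the Monitor Hub:

    -- A throwaway query whose execution will appear in the Monitor Hub.
    SELECT TOP (100) name, type_desc, create_date
    FROM sys.objects
    ORDER BY create_date DESC;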

Read on to see it in action.

Web Scraping in SQL Server Machine Learning Services

Rajendra Gupta shows us how we can use SQL Server Machine Learning Services and the R programming language to perform website scraping:

You can manually copy data from a website; however, if you regularly use it for your analysis, it requires automation. For this automation, usually, we depend on the developers to read the data from the website and insert it into SQL tables.

SQL Server Machine Learning Services helps you with web scraping using a small piece of code. In the previous articles on SQL Server R scripts, we explored useful open-source libraries for adding new functionality in R.
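
For a sense of what that looks like, here is a minimal sketch using sp_execute_external_script with the rvest package (the URL and CSS selector are placeholders, and rvest must already be installed for the instance’s R runtime):

    -- Hypothetical example: scrape a page from inside SQL Server with R.
    EXEC sp_execute_external_script
        @language = N'R',
        @script = N'
            library(rvest)
            page  <- read_html("https://example.com/prices")   # placeholder URL
            items <- html_text(html_nodes(page, "td.price"))   # placeholder selector
            OutputDataSet <- data.frame(price = items, stringsAsFactors = FALSE)
        '
    WITH RESULT SETS ((price varchar(100)));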

Read on for a demo.

Coalesce in SQL and R

John MacKintosh gives us a primer on the COALESCE function in both SQL and R:

What does coalesce mean? In the English language, it is generally used to convey a coming together, or creating one whole body, mass or system. How does that help us when working with data? We spend a lot of time cleaning our data, surely the last thing we want to do is lump it all together?
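
In SQL, COALESCE() returns the first non-NULL value from its argument list, which makes it a tidy way to pick the best available value per row; a quick illustration (the table and columns are made up):

    -- Pick the first contact number that is not NULL, with a fallback default.
    SELECT PatientID,
           COALESCE(MobilePhone, HomePhone, WorkPhone, 'no number on file') AS BestPhone
    FROM dbo.Patients;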

Click through for detail on the nuances of COALESCE(). H/T R-Bloggers.

Using the Develop Hub in Azure Synapse Analytics

Charles Feddersen shows off one of the Azure Synapse Analytics hubs:

The Develop Hub in Azure Synapse Analytics enables you to write code and define business logic using a combination of notebooks, SQL scripts, and data flows. This gives us a development experience with the capability to query, analyze, and model data in multiple languages, along with IntelliSense support for those languages. It provides a rich interface for authoring code, and in this post, we will see how we can use the Knowledge Center to jump-start our development experience.

Click through to see two demos, one of notebooks and one of SQL scripts.

The Merge Interval Operator

Hugo Kornelis looks at another execution plan operator:

The Merge Interval operator reads dynamic seek range specifications, checks to see if their specified ranges overlap, and if so combines the overlapping ranges into one new range.

One typical use case is a query that uses multiple BETWEEN specifications connected with OR. When these ranges overlap, they must be combined into a single range. This helps performance, but more importantly, it prevents rows that satisfy both range specifications from being returned multiple times. When the boundaries of the BETWEEN are given as constants, the optimizer checks for overlaps and combines ranges if needed when compiling the query. But when the boundaries of the BETWEEN specifications are only known at run time (variables, column references), the Merge Interval operator is used for this task.
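
The kind of query that can produce this operator looks something like the sketch below (the table is hypothetical, and whether Merge Interval actually appears depends on the indexes and the plan the optimizer picks); the key detail is that the range boundaries are variables, so they are unknown at compile time:

    -- Overlapping BETWEEN ranges with run-time boundaries.
    DECLARE @Start1 date = '2020-01-01', @End1 date = '2020-03-31',
            @Start2 date = '2020-03-01', @End2 date = '2020-06-30';

    SELECT OrderID, OrderDate
    FROM Sales.Orders          -- assumes an index on OrderDate
    WHERE OrderDate BETWEEN @Start1 AND @End1
       OR OrderDate BETWEEN @Start2 AND @End2;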

Click through to see how it works.

Measure Your DBA Skills

Lee Markum has just wrapped up an interesting series:

Over the last nine weeks, I took you on a journey of skills and career topics related to being a SQL Server DBA. We looked at the Production DBA, covering skills and career topics from the beginning, to mid-career, to Senior DBA. Then we looked at the Development DBA and their skills and career development needs. Finally, there was a wrap-up post.

To make it easier for everyone to get to these posts, I decided to bring them all together on a single page.

Click through to get a feeling for where you’re at on the DBA and database developer sides of the house.

New Azure Announcements

Eitan Blumin has a roundup of Azure-related announcements:

During the week of December 7th (and especially on December 9th), Microsoft sent us a whole bag of goodies, announcing the general availability of features that until now were only in preview, as well as even newer features that have just entered public preview.

There’s quite a lot to cover here, so let’s try to break it down by categories and provide links for more details. 

Click through for the list.
