Press "Enter" to skip to content

Author: Kevin Feasel

Deploying Azure Data Services via Terraform

Chris Adkin has started a series on deploying Azure Arc enabled Data Services. Part 1 serves as an introduction:

One of the most significant things to change the landscape for Azure data professionals will be the general release of Azure Arc enabled Data Services. To provide an expedient means of experiencing all that Azure Arc has to offer, Microsoft has come up with Jumpstart – a collection of GitHub repos for deploying Arc in different scenarios. Last Christmas I had a few vacation days and took the opportunity to try out Jumpstart for Azure Arc enabled data services on AWS. AWS was my choice because it made a certain amount of sense to try out Azure Managed SQL Server instances and Postgres Hyperscale on a cloud on which they are not natively available. After all, the whole point of Azure Arc enabled Data Services is to bring Azure to you on your terms if for any reason you cannot use the Azure cloud.

Part 2 gives us an introduction to Terraform:

Before diving into what the various Terraform modules that make up the Arc-PX-VMware-Faststart repo do, I’m going to provide an introduction to Terraform in this blog post. Terraform comes from HashiCorp; it is a tool that works on the principle of infrastructure-as-code. Resources are specified in what are called configuration files using the HashiCorp Configuration Language in a declarative manner, i.e. you state what you want and, to the best of its ability, Terraform attempts to create those resources for you. ‘Providers’ are used to create resources for particular types of entity; for example, you might use the local file, Helm (the Kubernetes package manager), Azure, or VMware providers. Using providers requires plugins, most of which are provided by HashiCorp, but third parties can also write their own plugins.

Check out the first two posts in what promises to be an interesting series.


Adding Images to Excel using PowerShell

Mikey Bronowski continues a series on working with PowerShell:

This is part of the How to Excel with PowerShell series. Links to all the tips can be found in this post.
If you would like to learn more about the module with an interactive notebook, check this post out.

Spreadsheets’ main purpose is data: storing, manipulating, and analyzing it. We can add some colours or charts to make the data more friendly, but sometimes we may want to add something else – like a logo or picture – and all of that can be achieved with PowerShell.

Read on to see how you can lay out an image or add shapes to a spreadsheet.


Performance Gains with APPLY

Erik Darling gives us a scenario where OUTER APPLY is quite useful:

In all, this query will run for about 18 seconds. The majority of it is spent in a bad neighborhood.

Why does this suck? Boy oh boy. Where do we start?

– Sorting the Votes table to support a Merge Join?
– Choosing Parallel Merge Joins ever?
– Choosing a Many To Many Merge Join ever?
– All of the above?

I voted “all of the above.” Click through to see how Erik turns a bad query plan into a much less bad query plan.
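
For a rough sense of the shape of the rewrite – this is an illustration against the StackOverflow-style Users and Votes tables, not Erik’s actual query – a correlated OUTER APPLY looks something like this:

-- Get each high-reputation user's most recent vote via OUTER APPLY;
-- users with no votes still come back, just with a NULL date.
SELECT u.Id,
       u.DisplayName,
       lv.CreationDate AS LatestVoteDate
FROM dbo.Users AS u
OUTER APPLY
(
    SELECT TOP (1) v.CreationDate
    FROM dbo.Votes AS v
    WHERE v.UserId = u.Id
    ORDER BY v.CreationDate DESC
) AS lv
WHERE u.Reputation >= 100000;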


Bad Request when Debugging an Azure Data Factory Pipeline

Ed Elliott ran into a problem:

Now, whenever I am troubleshooting something in Azure and I come to the activity logs I am always hopeful but also always disappointed that they don’t show more details. The bit that really annoys me is that I know Microsoft see more detailed error information, as I have been screen sharing with a support tech who used Log Explorer to see more detailed error messages than I see – grrrr, just show us the data! Anyway, I digress – so in the activity log, does it give a clue as to what is wrong?

In a word: no, it doesn’t.

Read on for the conclusion, which rates as “Should have been an easy fix but the error message was completely unhelpful.”


The Importance of LSNs to SQL Server

Jack Vamvas explains a concept:

I was talking to an Auditor recently – who specialises in large Corporate Audits – and they asked me how I would prove that a certain database which is backed up was actually restored to another server. One of the methods I described was using Log Sequence Numbers (LSNs).

Read on for an explanation of how they work and how you can use LSNs to solve that auditing issue.
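
If you want to try the comparison yourself, a minimal sketch might look like the following, relying on the standard msdb history tables (the database name is hypothetical):

-- On the source instance: LSNs recorded for each backup of the database.
SELECT bs.database_name,
       bs.type,                 -- D = full, I = differential, L = log
       bs.backup_finish_date,
       bs.first_lsn,
       bs.last_lsn,
       bs.database_backup_lsn
FROM msdb.dbo.backupset AS bs
WHERE bs.database_name = N'AuditedDB'
ORDER BY bs.backup_finish_date DESC;

-- On the target instance: which backup sets were restored, with their LSNs.
SELECT rh.destination_database_name,
       rh.restore_date,
       bs.first_lsn,
       bs.last_lsn
FROM msdb.dbo.restorehistory AS rh
    JOIN msdb.dbo.backupset AS bs
        ON bs.backup_set_id = rh.backup_set_id
ORDER BY rh.restore_date DESC;

Matching first_lsn and last_lsn values between the two result sets tie a restore on the target back to a specific backup taken on the source.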


Keeping msdb Clean

Eitan Blumin takes the data janitor role seriously:

As part of its regular, ongoing, day-to-day activities, your SQL Server instance would naturally collect historical data about its automated operations. If left unchecked, this historical data could pile up, leading to wasted storage space, performance hits, and even worse issues.

MSDB would obviously be collecting data about SQL Agent job executions. But there are also a few other types of historical data that need to be cleaned up once in a while. In this blog post, I hope to cover all bases and leave no historical data un-cleaned.

Read on for several data sources which you’ll want to keep tidy.
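
As a taste of what is involved, here is a minimal sketch using the built-in msdb cleanup procedures (the 90-day retention window is just an assumption for illustration):

DECLARE @cutoff datetime = DATEADD(DAY, -90, GETDATE());

-- SQL Agent job history older than the cutoff.
EXEC msdb.dbo.sp_purge_jobhistory @oldest_date = @cutoff;

-- Backup and restore history older than the cutoff.
EXEC msdb.dbo.sp_delete_backuphistory @oldest_date = @cutoff;

-- Database Mail items sent before the cutoff.
EXEC msdb.dbo.sysmail_delete_mailitems_sp @sent_before = @cutoff;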


More Number Series Generator Solutions

Itzik Ben-Gan continues a series on generating rows quickly:

This is the third part in a series about solutions to the number series generator challenge. In Part 1 I covered solutions that generate the rows on the fly. In Part 2 I covered solutions that query a physical base table that you prepopulate with rows. This month I’m going to focus on a fascinating technique that can be used to handle our challenge, but that also has interesting applications well beyond it. I’m not aware of an official name for the technique, but it is somewhat similar in concept to horizontal partition elimination, so I’ll refer to it informally as the horizontal unit elimination technique. The technique can have interesting positive performance benefits, but there are also caveats that you need to be aware of, where under certain conditions it can incur a performance penalty.

This is a great post on a rather complex topic.
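
If you are coming into the series fresh, the baseline challenge is an on-the-fly number generator of the kind Itzik covered in Part 1. Here is a minimal sketch of that style of generator (my illustration, not Itzik’s exact function, and not the horizontal unit elimination technique Part 3 describes):

CREATE OR ALTER FUNCTION dbo.GetNums(@low AS bigint, @high AS bigint)
RETURNS TABLE
AS
RETURN
    WITH
        L0 AS (SELECT c FROM (VALUES (1), (1), (1), (1), (1), (1), (1), (1),
                                     (1), (1), (1), (1), (1), (1), (1), (1)) AS D(c)),  -- 16 rows
        L1 AS (SELECT 1 AS c FROM L0 AS a CROSS JOIN L0 AS b),   -- 256 rows
        L2 AS (SELECT 1 AS c FROM L1 AS a CROSS JOIN L1 AS b),   -- 65,536 rows
        L3 AS (SELECT 1 AS c FROM L2 AS a CROSS JOIN L2 AS b),   -- over 4 billion rows
        Nums AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS rownum FROM L3)
    SELECT TOP (@high - @low + 1) @low + rownum - 1 AS n
    FROM Nums
    ORDER BY rownum;

-- Usage: SELECT n FROM dbo.GetNums(1, 100);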


Data Paging using Common Table Expressions

Steve Stedman takes us through one method of generating pages of data:

I can remember the first time that I worked on data paging code. I had to page through web site search results 20 at a time on a given web page. My task was to understand how it had been written and to do some bug fixing. After reviewing about 2,000 lines of code, and reviewing the seven different variables that were being used to know the current page, the next page, the previous page, the page size, the row at the top of the page, the row at the bottom of the page, and the number of pages, I finally understood what the code was intended to do. What a mess that was, but beyond the mess, the performance was horrible. The way the page worked was that based on the page you were on, all of the rows would be queried, then a loop would read through all of the rows before the current page, then loop through the rows on the current page displaying them on the page, and finally it would ignore the results after the current page. So page 1 was slow, page 2 was slower than page 1, page 3 was slower than page 2 and on and on.

Since that point I have implemented several different data paging algorithms myself, all better than the original implementation but none as elegant as the CTE way of doing data paging. I used to look at data paging as a painful task, but thanks to the SQL Server implementation of CTEs there is no more pain.

Steve also covers OFFSET and FETCH. This technique won’t be great with enormous data sets, but for moderate-sized data sets which query quickly, it works. This is one area which is quite painful, and the best (and wackiest) solution I’ve come up with in SQL Server when the initial query is quite expensive is to create tables with random names to store results and row numbers: populate a table the first time a query runs, query that table on subsequent runs, and use the RETURN value in a stored procedure to pass along the name of the table to access. Granted, that solution works best with static data, and you’d want a method to clean up those tables after they’re no longer in use (like storing a list of those tables and their last access dates and times). So it’s a mess.
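
To make the two approaches concrete, here is a minimal sketch under assumed table and column names (dbo.SearchResults with a ResultID key):

DECLARE @PageNumber int = 3, @PageSize int = 20;

-- CTE plus ROW_NUMBER(), the approach Steve walks through.
WITH NumberedResults AS
(
    SELECT r.ResultID,
           r.Title,
           ROW_NUMBER() OVER (ORDER BY r.ResultID) AS RowNum
    FROM dbo.SearchResults AS r
)
SELECT ResultID, Title
FROM NumberedResults
WHERE RowNum BETWEEN (@PageNumber - 1) * @PageSize + 1
                 AND @PageNumber * @PageSize
ORDER BY RowNum;

-- The OFFSET / FETCH equivalent (SQL Server 2012 and later).
SELECT r.ResultID, r.Title
FROM dbo.SearchResults AS r
ORDER BY r.ResultID
OFFSET (@PageNumber - 1) * @PageSize ROWS
FETCH NEXT @PageSize ROWS ONLY;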


Disliking User-Defined Data Types

Andy Levy has a bone to pick:

Here’s the thing – these types are really just aliases for native types in SQL Server, but more constrained. Constrain yourself to UDTs and you’ll have trouble right-sizing your fields. Let’s say you’ve got three data types for text data:

– myShortString (varchar(10))
– myString (varchar(256))
– myBigString (varchar(8000))

These lengths are not helping anyone. You can’t store email addresses or names in myShortString. But myString is probably way too much for that data. You’re going to waste memory because of how SQL Server estimates memory grants and your indexes will be bloated. Maybe you just need to create more UDTs to cover these situations. But that just compounds the other problems, doesn’t it?

Pushes glasses up the bridge of his nose. Teeeeechnically, an e-mail address may be up to 256 characters long, including a username of up to 64 characters (and maybe two angle brackets, depending on the host). So myString would actually be perfect. Steve Jones has a comment about 300, but that was probably the original standard of 320. Regardless, I realize how far beside the point that is, and Andy’s point is a good one, as well as the several others he makes in the post.

One quick note on defined types: they really do make a lot of sense in a domain-driven design, especially when working with functional programming languages. Defining a CustomerID as an int is fine, but if I know my customer IDs are natural numbers (1, 2, 3, …), 9 digits long, and do not contain the sequence 2345 (because my company considers this an unlucky number sequence), creating a CustomerID type which provides this sort of type checking is great because you keep the rules as close to the data as possible and ensure consistency. It’s also more restrictive than int, so you can cast back down to an int when you’re ready to interact with some remote system. So short answer, do this all day in F#, but not in the database.
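
For reference, the alias types Andy describes are just CREATE TYPE statements over native types. A minimal sketch using his example names (the table is hypothetical):

CREATE TYPE dbo.myShortString FROM varchar(10) NOT NULL;
CREATE TYPE dbo.myString      FROM varchar(256) NOT NULL;
CREATE TYPE dbo.myBigString   FROM varchar(8000) NOT NULL;

-- Once a column uses the alias, its length is whatever the type says,
-- whether or not that fits the data it will actually hold.
CREATE TABLE dbo.Customers
(
    CustomerID   int IDENTITY(1, 1) PRIMARY KEY,
    EmailAddress dbo.myString,       -- 256 characters, which happens to fit
    Nickname     dbo.myShortString   -- 10 characters, which probably does not
);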


Fun with Disabling Joins

David Alcock gives us a “wouldn’t it be neat to see this?” scenario:

If you’ve ever seen my session on Guillotines and the Query Optimiser, I demonstrated how we can use query hints to switch off certain transformation rules in a query. Transformation rules use a pattern substitution which essentially swaps one expression for another. In SQL Server terms, the best example of this is a join: although we use logical expressions such as left join or inner join, the physical join type used in the execution plan operator is substituted to be something like a Hash Match or Nested Loop Join.

This means we can force the behaviour of an execution plan by disabling certain rules. Now, I can’t think of any practical reason for this – and remember, we can also use join hints in our queries – but this is different, as it affects the entire optimisation process for a query and not just one join, so please follow along in a sandbox environment.

In case you haven’t seen David’s session, it’s available on YouTube.
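
If you only want to steer a single query rather than change how the whole plan gets optimized, the documented join hints David mentions look something like this (table names are hypothetical):

-- Force the physical join type for one specific join:
SELECT o.OrderID, c.CustomerName
FROM dbo.Orders AS o
    INNER LOOP JOIN dbo.Customers AS c
        ON c.CustomerID = o.CustomerID;

-- Or restrict the whole query to a set of allowed join types:
SELECT o.OrderID, c.CustomerName
FROM dbo.Orders AS o
    JOIN dbo.Customers AS c
        ON c.CustomerID = o.CustomerID
OPTION (HASH JOIN, MERGE JOIN);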
