Curated SQL – Page 456 – A Fine Slice Of SQL Server

Model Diagnostics in Python

Published 2023-07-18 by Kevin Feasel

Christian Lorentzen has released a new package:

Version 1.0.0 of the new Python package for model-diagnostics was just released on PyPI. If you use (machine learning or statistical or other) models to predict a mean, median, quantile or expectile, this library offers tools to assess the calibration of your models and to compare and decompose predictive model performance scores.

This looks like a really useful package, so check it out.
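
To make "assessing calibration" a bit more concrete, here is a minimal sketch in plain Python. It does not use the model-diagnostics package or its API; the function names `mean_bias` and `binned_bias` and the sample numbers are made up for illustration. The idea is that a model predicting a mean can look unbiased overall while still being off within ranges of its own predictions.

```python
from statistics import mean

def mean_bias(y_obs, y_pred):
    """Average of (prediction - observation) over all rows."""
    return mean(p - o for o, p in zip(y_obs, y_pred))

def binned_bias(y_obs, y_pred, n_bins=2):
    """Average bias within equal-count bins of the sorted predictions."""
    pairs = sorted(zip(y_pred, y_obs))
    n = len(pairs)
    bins = [pairs[i * n // n_bins:(i + 1) * n // n_bins] for i in range(n_bins)]
    return [mean(p - o for p, o in b) for b in bins if b]

# Hypothetical observations and predictions, for illustration only.
y_obs = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.5, 2.5, 3.5]

overall = mean_bias(y_obs, y_pred)    # 0.0: no bias on average
by_bin = binned_bias(y_obs, y_pred)   # [0.5, -0.5]: low predictions too high, high ones too low
```

The overall bias here is zero, yet the two bins err in opposite directions, which is the kind of pattern a calibration check is meant to surface.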

Automating Database Copy in Azure SQL Managed Instance

Published 2023-07-18 by Kevin Feasel

Sasa Popovic creates some clones:

Database copy and database move operations for Azure SQL Managed Instance are very convenient in various situations when you want to copy or move a database from one managed instance to another in an online way. What does online mean in this context? It means that the database on the destination managed instance will be identical to the source database at the moment when the operation is explicitly completed by user action. Copying a database is a size-of-data operation, and you can expect the copy to take some time. What is important and convenient, though, is that unlike point-in-time restore, where the database is in its state from some point in the past, database copy gives you the database in the state it was in when the operation was completed.

Read on to see how you can set this up for an Azure SQL Managed Instance.

Window Functions and Serialization in KQL

Published 2023-07-18 by Kevin Feasel

Robert Cain tries out some window functions:

The Kusto Query Language includes a set of functions collectively known as Window Functions. These special functions allow you to take a row and put it in context of the entire dataset. For example, creating row numbers, getting a value from the previous row, or maybe the next row.

In order for Window Functions to work, the dataset must be serialized. In this post we’ll cover what serialization is and how to create serialized datasets. This is a foundational post, as we’ll be referring back to it in future posts that will cover some of the KQL Windowing Functions.

Read on to see how to serialize data, what the risks of serialization are, and then how to generate a row number in KQL.
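
As a rough analogy for why ordering matters, here is a small Python sketch rather than KQL. The helper names `serialize` and `add_window_columns` and the sample rows are hypothetical; the point is only that "row number", "previous value", and "next value" have no stable meaning until the rows are in a fixed order.

```python
def serialize(rows, key):
    """Put rows into a fixed, deterministic order (here: sorted by one column)."""
    return sorted(rows, key=lambda r: r[key])

def add_window_columns(rows, value_key):
    """Add row_number, prev, and next columns to already-ordered rows."""
    out = []
    for i, row in enumerate(rows):
        out.append({
            **row,
            "row_number": i + 1,
            # First row has no previous value; last row has no next value.
            "prev": rows[i - 1][value_key] if i > 0 else None,
            "next": rows[i + 1][value_key] if i + 1 < len(rows) else None,
        })
    return out

# Hypothetical unordered input.
rows = [{"ts": 3, "v": "c"}, {"ts": 1, "v": "a"}, {"ts": 2, "v": "b"}]
result = add_window_columns(serialize(rows, "ts"), "v")
```

Skipping the `serialize` step would still produce numbers, but they would reflect whatever order the rows happened to arrive in.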

Taking Over a Power BI Dataset with a Service Principal

Published 2023-07-18 by Kevin Feasel

Angela Henry takes it out of the user’s hands:

A little background for those new to using Power BI and Data Gateways. If the data source for your Power BI dataset lives on-prem or behind a private endpoint, you will need a Data Gateway to access the data. If you want to keep your data fresh (either using Direct Query or Import mode), but don’t want to rely on a specific user’s credentials (because we all want to go on vacation at some point), you will need to use a service principal for authentication.

Read on for the step-by-step instructions on how to do this.

Managing Database Test Data

Published 2023-07-18 by Kevin Feasel

Phil Factor maintains some tests:

When learning about relational databases, we all tend to use ‘toy’ databases such as Pubs, AdventureWorks, NorthWind, or ClassicModels. This is fine, but it is too easy to assume that one can then do real-world database development in the same way. You have your database full of data and just cut code that you then test. From a distance, it all seems so easy.

In fact, rapid and effective database development usually requires a much more active approach to data. You need to work out how to test your work as you go, and to test continuously. For that, you need appropriate data with the right characteristics, in the suitable quantity. You also need to plan how to ensure that, when you make changes to the database, or even minor changes to its settings, all business processes continue to work correctly. In Agile terms you need a test-first methodology, fast feedback loop, and iterative development. You should never cut some SQL Code and only then think to yourself “I wonder how I’ll be able to test this?”

This is something I’ve historically been pretty lazy about, to my detriment. Phil does an outstanding job of making the case for why generating and working with your own test data (versus live data) is important, as well as categorizing the purposes of this test data and the types of tests you’ll want to have.

Filtering Calculation Items in a Slicer

Published 2023-07-18 by Kevin Feasel

Marco Russo and Alberto Ferrari do some slicing and filtering:

Slicers with too many values might be inconvenient for users, as they must search for the desired selection among too many lines. In such cases, a common solution is to build a hierarchy and use slicers with multiple columns inside, or multiple slicers, each with one column. However, this solution works only in structures with a natural hierarchy, like continents and countries. Indeed, each country belongs to only one continent so the hierarchy can be easily created with a new column.

If the hierarchy is non-natural, the relationship between the parent and the children is many-to-many, requiring a specific type of relationship.

Click through to see what that relationship looks like and how you can build it.

Index Maintenance in Azure SQL DB

Published 2023-07-18 by Kevin Feasel

Kendra Little gives an answer:

Have you ever received advice that was technically correct, but which was delivered in such a way that it was too hard to understand?

I think of this as “accidental bad advice,” because it leads to confusion. There’s a LOT of accidental bad advice out there on index maintenance for SQL Server and cloud versions like Azure SQL, even in the official documentation.

In this post I’m answering a common index maintenance question, and we’re going to keep it simple.

The answer is essentially the same as it would be on-premises: yes, but perform index maintenance when it is appropriate. Read on to learn what that means in this case.

Modularizing an Existing Shiny App

Published 2023-07-17 by Kevin Feasel

Peter Baranovskiy breaks it down:

There are multiple tutorials available online on writing modular Shiny apps. So why one more? Well, when I just started with building modular apps myself, these didn’t do much for me. So I really only learned how to write modules when I had an opportunity to team up with an experienced R Shiny developer. The reason I guess is that Shiny modules is an advanced topic, and you typically get to writing modules only when you finally need to scale your apps – and keep opportunities for further scaling open. This typically means when your app goes into production. By then you probably have already developed multiple apps, and switching over to a way of thinking required to write modules may be challenging. If you don’t know what modules are, I recommend starting here and then coming back to this post. Otherwise, read on.

So, I decided to try a different approach and instead of building a simple modular app from scratch, to go in the opposite direction by breaking down a complex real-life app into modules. Here’s the app’s original non-modular code. Note a single app.R file that contains the entire app. static_assets.R includes some object definitions which I moved to a separate file for convenience. calgary_crime_data_prep.R is not part of the app; it is a data retrieval and cleaning script executed once a month with cron. Running the script each time the app launches would have made it extremely slow and would use way too much bandwidth, as the script downloads and processes 150+ Mb of data on each run.

Read on for the reasoning behind using modules, as well as Peter’s notes on the process.

Bug in fn_xe_file_target_read_file

Published 2023-07-17 by Kevin Feasel

Erik Darling notes a bug:

SQL Server has had the fn_xe_file_target_read_file function for a while, but starting with SQL Server 2017, a column called timestamp_utc was added to the output.

Somewhat generally, it would be easier to filter event data out using this column… if it worked correctly. The alternative is to interrogate the underlying extended event XML timestamp data.

That’s… not fun.

Erik shows us the problem and also provides a workaround, as well as the Microsoft Feedback issue you can vote on to get this done sooner.
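
To give a feel for what "interrogating the underlying XML timestamp" involves, here is a minimal client-side sketch in Python. It is not Erik's workaround. It assumes each event's XML has a root `event` element with a UTC `timestamp` attribute ending in `Z`; the sample strings and the helper names `event_timestamp` and `events_since` are hypothetical.

```python
from datetime import datetime, timezone
import xml.etree.ElementTree as ET

# Hypothetical event payloads; the attribute layout is an assumption.
SAMPLE_EVENTS = [
    '<event name="sql_batch_completed" package="sqlserver" '
    'timestamp="2023-07-17T12:34:56.789Z"/>',
    '<event name="sql_batch_completed" package="sqlserver" '
    'timestamp="2023-07-17T13:05:00.000Z"/>',
]

def event_timestamp(event_xml):
    """Parse the root element's timestamp attribute into an aware datetime."""
    raw = ET.fromstring(event_xml).attrib["timestamp"]
    # Replace the trailing 'Z' so fromisoformat accepts it on older Pythons.
    return datetime.fromisoformat(raw.replace("Z", "+00:00"))

def events_since(events, cutoff):
    """Keep only events whose XML timestamp is at or after the cutoff."""
    return [e for e in events if event_timestamp(e) >= cutoff]

cutoff = datetime(2023, 7, 17, 13, 0, tzinfo=timezone.utc)
recent = events_since(SAMPLE_EVENTS, cutoff)
```

Every event has to be parsed before it can be filtered, which is why a working filterable column would be the easier route.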

An Overview of Semantic Modeling in Microsoft Fabric

Published 2023-07-17 by Kevin Feasel

Teo Lachev talks semantic modeling:

In retrospect, I’d say I owe 50% of my BI career to Analysis Services and its flavors: Multidimensional, Tabular, and later Power BI. This is why I closely follow how this technology evolves. Fast forwarding to Fabric, there are no dramatic changes. Unlike the other two Fabric Engines (Lakehouse and Warehouse), Power BI datasets haven’t embraced the delta lake file format to store its data yet. The most significant change is the introduction of a new Direct Lake data access mode alongside the existing Import and DirectQuery.

Read on for Teo’s thoughts. I think there’s a good chance that the Bad/Ugly points will be eliminated by the time Fabric goes GA, though we’ll have to wait and see if that’s the case.
