Curated SQL – Page 846 – A Fine Slice Of SQL Server

Archival on Delete in SQL Server

Published 2021-01-12 by Kevin Feasel

Well, friends, I have good news for you. This is an easy one to implement.
Let’s say that in Stack Overflow land, when a user deletes their account we also delete all their votes. That’s not how it works, but it’s how I’m going to show you how to condense what can normally be a difficult process to isolate into a single operation.

The one gripe I have with this post is that my annoyingly loud keyboard is buckling spring, not Cherry MX Blue, thank-you-very-much.

Comments closed

Azure Data Factory and Source Control

Published 2021-01-12 by Kevin Feasel

Ahmad Yaseen shows how you can save Azure Data Factory pipelines in source control:

To overcome these limitations, Azure Data Factory provides us with the ability to integrate with a GIT repository, such as Azure DevOps or GitHub repository, that helps in tracking and versioning the pipelines changes, and incrementally save the pipeline changes during the development stage, without the need to validate the incomplete pipeline, preventing these changes from being lost in case of any crash or failure. In this case, you will be able to test the pipeline, revert any change that is detected as a bug, and publish the pipeline to the Data Factory when everything is developed and validated successfully.

Click through for the setup instructions.

Comments closed

Query Store and Cross-Database Queries

Published 2021-01-12 by Kevin Feasel

Matthew McGiffen does some research:

When I was writing the script shared in my last post Identify the (Top 20) most expensive queries across your SQL Server using Query Store a question crossed my mind:
Query Store is a configuration that is enabled per database, and the plans and stats for queries executed in that database are stored in the database itself. So what does query store do when a query spans more than one database?

Read on for the answer.

Comments closed

DataFrame Cleaning in Spark

Published 2021-01-11 by Kevin Feasel

Craig Covey has an update to the Spark Starter Guide:

Real-world datasets are hardly ever clean and pristine. They commonly include blanks, nulls, duplicates, errors, malformed text, mismatched data types, and a host of other problems that degrade data quality. No matter how much data one might have, a small amount of high quality data is more beneficial than a large amount of garbage data. All decisions derived from data will be better with higher quality data.
In this section we will introduce some of the methods and techniques that Spark offers for dealing with “dirty data”. The term dirty data means data that needs to be improved so the decisions made from the data will be more accurate. The topic of dirty data and how to deal with it is a very broad topic with a lot of things to consider. This chapter intends to introduce the problem, show Spark techniques, and educate the user on the effects of “fixing” dirty data.

It’s interesting to see what’s available in Spark and how you can take advantage of it.

Comments closed

Smoothing and its Inherent Risks

Published 2021-01-11 by Kevin Feasel

John Mount would like you to take care when using smoothers:

Here is a quick data-scientist / data-analyst question: what is the overall trend or shape in the following noisy data? For our specific example: How do we relate value as a noisy function (or relation) of m? This example arose in producing our tutorial “The Nature of Overfitting”.
One would think this would be safe and easy to asses in R using ggplot2::geom_smooth(), but now we are not so sure.

Here’s a quick summary of my general philosophy: the data are more interesting than a smoothed line. I’m okay putting in a smoothed line to help a reader make sense of a trend, but I wouldn’t want to have a plot with just the smoothed line. Read the whole thing from John to get well beyond my rule of thumb.

Comments closed

Handling “Duplicate” Query String Values with Power Query

Published 2021-01-11 by Kevin Feasel

Chris Webb troubleshoots an issue:

Some time ago I wrote a pair of popular posts about using the Query and RelativePath options of the Web.Contents function in Power Query and why they are important for dataset refresh. I have recently learned something extra about this subject which merits a new post, though: how to handle multiple URL query parameters with the same name.

It’s interesting to see how Power Query handles this, as there’s no defined standard behavior. Some renderers give you just the first item, some just the last, and some (like IIS + .NET) give you back a list of all items when you have a query string like ?param1=x&param1=y&param1=z.

1 Comment

Finding Power BI Premium Per User Users

Published 2021-01-11 by Kevin Feasel

Benni de Jagere does some digging:

The other day, I was chatting with one of my clients about Premium Per User, and I gave them the practical guidance to not build any production level dependencies based on PPU features or workspaces, until some of the unknowns have been cleared up. If there’s end users relying on this for their actual daily job, then I’m calling it a production level dependency. Right now, these are preview features, and this client is not actively monitoring changes in the Power BI Landscape.
Shortly after, I got a message that some of their business users did build actual production reports and dataflows in PPU workspaces. And, they were not sure who in the company actually has access to PPU. And that’s where chase down the rabbit hole began

I imagine that this will get easier over time but right now, it doesn’t seem that simple.

Comments closed

Generating Alerts from Power Automate

Published 2021-01-11 by Kevin Feasel

Ed Hansberry shows how to create a Power Automate alert off of SQL Server data:

I’m going to show you how to do this in Power Automate with just a few steps. Let’s get started. In my example, I am going to return a table when a customer has placed an order where the order quantity will not divide evenly into the case pack. So if they order 100 units and the cases contain 24 each, I want to alert the order entry person to tell them the customer has effectively ordered 4.1667 cases, which isn’t allowed. They will need to order either 96 units or 120 units to get 4 or 5 cases.

Read on to see how.

Comments closed

Deduplicating Missing Index Requests

Published 2021-01-11 by Kevin Feasel

Erik Darling trains us in a skill:

Going back to our queries and our index requests, all the queries have two things in common:
– They filter on OwnerUserId
– They order by Score
There are of course other elements in the where clause to attend to, but our job is to come up with one index that helps all of our queries.

This gets a lot easier if you know the queries. If all you have is the set of index details, you can make a sane guess but the chance of being wrong is much greater.

Comments closed

Collecting Wait Stats Over Time

Published 2021-01-11 by Kevin Feasel

Michael J. Swart has a repo for us:

I find wait stats so useful that I’ve got Paul Randal’s SQL Server Wait Statistics (or please tell me where it hurts…) bookmarked and I visit it frequently.
But that gives the total amount of waits for each wait type accumulated since the server was started. And that isn’t ideal when I’m troubleshooting trouble that started recently. No worries, Paul also has another fantastic post Capturing wait statistics for a period of time.

You can also get this from various monitoring tools, as Michael mentions, but if you don’t have such a tool in place, here’s how you can roll your own.

Comments closed

M	T	W	T	F	S	S
						1
2	3	4	5	6	7	8
9	10	11	12	13	14	15
16	17	18	19	20	21	22
23	24	25	26	27	28	29
30	31

Curated SQL Posts