Author: Kevin Feasel

A New PowerShell Module for SQL Server Security

Stuart Moore introduces dbaSecurityScan:

How easy is it to audit them? If someone asks you, the DBA, exactly who has access to object A, can you tell them? How do people get access to that object: is it via a role, a schema or an explicit permission?

Is that information in an easy to read or manipulate manner?

How do you ensure that permissions persist between upgrades? I’ve certainly seen 3rd party upgrades that have reset database level permissions. Do you have a mechanism to check every permission and put them back as they were?

We’re all doing the devops these days. Our database schema is source controlled, and we’re deploying it incrementally in pipelines and testing it. But are we doing that with our database security?

So in the classic open source way, I decided to scratch my own itch by writing something. That something is dbaSecurityScan, a PowerShell module that aims to offer a solution for all of the above.

Click through to see what dbaSecurityScan covers today, how to call it, and what you can do to get more info.

Comments closed

Using Jupyter Notebooks in SQL Agent Jobs

Rob Sewell shows us how to run an Azure Data Studio notebook as a SQL Agent job:

Azure Data Studio is a great tool for connecting with your data platform whether it is in Azure or on your hardware. Jupyter Notebooks are fantastic, you can have words, pictures, code and code results all saved in one document.

I have created a repository on my GitHub, https://beard.media/Notebooks, where I have stored a number of Jupyter notebooks both for Azure Data Studio and the new .NET interactive notebooks.

Another thing that you can do with notebooks is run them as Agent Jobs and save the results of the run.

Read on to learn how.

Per-Query Wait Stats with Extended Events

Grant Fritchey shows us how to see the waits associated with a specific query:

And that my friends is only the waits associated with the one query. TA-DA indeed!

Now, we could get into filtering this stuff too. Toss the ones that have little to no duration, ensure that I only capture for a specific query or procedure, all would be helpful. However, this is how you can easily identify just the waits associated with a single query, and only that query.

Click through to see how. The one thing I’d caution here is that the query which received waits isn’t necessarily the query in the wrong—it might be the fourth or fifth session in a blocking chain. But this is a great technique for getting additional per-query info when you can control the experiment.

Replaying Workloads to a Different Database with WorkloadTools

Gianluca Sartori takes us through workload replay ability in WorkloadTools:

One of the features I was asked to implement for WorkloadTools is the ability to replay commands to a database name different from the one recorded in the source workload.

This is something that I had been planning to implement for a while and it totally makes sense. Usually, you have two identical environments for the workload capture and replay, both with the same databases. Sometimes it makes sense to have two different databases as the source and target for the workload, for some particular reasons: resource constraints, ease of testing and so on.

WorkloadTools now supports replaying commands to a different database, using the DatabaseMap property of the ReplayConsumer.

Setting this up is pretty simple, though Gianluca does lay out a caveat.

Parameterization and Enumerable.Contains() with EF Core 3

Erik Ejlskov Jensen explains how we can prevent Entity Framework Core 3 from polluting the plan cache if we use Enumerable.Contains():

One of the many advantages of using a tool like Entity Framework Core is, that you are sure that the framework will generate properly parameterized SQL for you. This helps avoid SQL injection issues and avoids plan cache pollution. Unfortunately, EF Core currently falls short on that promise, when translating queries, where you supply a list of values to be matched against a column – Enumerable.Contains method – this is translated to a SQL Server IN operator

Click through for a setup of the problem as well as the solution.
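
To make the plan cache concern concrete, here is a minimal Python sketch that counts distinct SQL texts as a rough stand-in for plan cache entries. It is not the fix from Erik's post, and it does not reproduce EF Core's actual output; the table name, column name, and parameter names are made up for illustration.

```python
# Illustrative sketch only: counts distinct SQL texts as a stand-in for
# plan cache entries. dbo.Orders, OrderId, and @p0.. are hypothetical names.

def inlined_in_query(ids):
    """Build the query with the values inlined as literals."""
    literals = ", ".join(str(int(i)) for i in ids)
    return f"SELECT * FROM dbo.Orders WHERE OrderId IN ({literals})"


def parameterized_in_query(ids):
    """Build the query with one parameter marker per value."""
    markers = ", ".join(f"@p{n}" for n in range(len(ids)))
    return f"SELECT * FROM dbo.Orders WHERE OrderId IN ({markers})"


# Five made-up requests: three lists of length 3 and two of length 2.
requests = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [1, 2], [10, 11]]

inlined_texts = {inlined_in_query(ids) for ids in requests}
parameterized_texts = {parameterized_in_query(ids) for ids in requests}

print(len(inlined_texts))        # one text per distinct value list
print(len(parameterized_texts))  # one text per distinct list length
```

Inlined literals give a new statement text for every distinct list of values, while one marker per value still varies with the length of the list. That is why this pattern tends to need a different approach rather than simple parameterization.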

Installing Apache Airflow

Achilleus walks us through a process to install Apache Airflow on a machine:

Airflow is an amazing tool by Airbnb and is a kinda de facto standard of ETL deployments in the Data Engineering domain nowadays. But at the same time, you can also use Airflow to schedule an ML pipeline and automate the whole ML pipeline (almost).

This is my attempt to install and set up a fairly robust Apache Airflow deployment for my needs. I am pretty sure there might be some better ways of doing it or add any enhancements to it. Any comments or suggestions are highly appreciated!

This is an easy-to-follow set of steps, so check it out.

Tuning Random Forest HyperParameters with R

Julia Silge gives us an idea of how to tune random forest hyperparameters in R:

Our modeling goal here is to predict the legal status of the trees in San Francisco in the #TidyTuesday dataset. This isn’t this week’s dataset, but it’s one I have been wanting to return to. Because it seems almost wrong not to, we’ll be using a random forest model! 🌳

Let’s build a model to predict which trees are maintained by the San Francisco Department of Public Works and which are not. We can use parse_number() to get a rough estimate of the size of the plot from the plot_size column. Instead of trying any imputation, we will just keep observations with no NA values.

Click through to some data exploration, the initial model, and a process for using Grid Search with the caret package.
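
Julia's code is in R, but the cleaning step she describes is easy to sketch in any language. Below is a rough Python analogue, under the assumption that parse_number() pulls the first number out of a string. The helper, the `legal_status` column name, and the sample rows are all invented for illustration and are not taken from the real dataset.

```python
import re


def parse_number(text):
    """Return the first number found in text, or None if there is none.

    A rough analogue of the R helper mentioned above, not a port of it.
    """
    match = re.search(r"-?\d+(?:\.\d+)?", text or "")
    return float(match.group()) if match else None


# Made-up rows; only the plot_size column name comes from the quoted text.
rows = [
    {"plot_size": "3x3", "legal_status": "DPW Maintained"},
    {"plot_size": "Width 4ft", "legal_status": "Other"},
    {"plot_size": None, "legal_status": "Other"},
    {"plot_size": "unknown", "legal_status": "DPW Maintained"},
]

# Parse each plot size, then keep only rows where a number was found,
# rather than trying to impute the missing ones.
parsed = [{**row, "plot_size": parse_number(row["plot_size"])} for row in rows]
cleaned = [row for row in parsed if row["plot_size"] is not None]

print(cleaned)
```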

Using the Tune Package in R for Hyperparameter Optimization

Abderrahim Lyoubi-Idrissi takes us through a Bayesian approach to tune hyperparameters:

In contrast to the model parameters, which are discovered by the learning algorithm of the ML model, the so-called hyperparameters (HP) are not learned during the modeling process, but specified prior to training.

Hyperparameter tuning is the task of finding optimal hyperparameter(s) for a learning algorithm for a specific data set and at the end of the day to improve the model performance.

Abderrahim contrasts two different methods here: Grid Search and Bayesian Optimization. Definitely an interesting read if you develop data science models.
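
For readers who have not seen it written out, Grid Search is the simpler of the two methods: try every combination of candidate values and keep the best one. Here is a minimal Python sketch of the idea, not of the tune package Abderrahim uses. The objective function is a toy stand-in for "train a model and score it on held-out data", and the hyperparameter names and candidate values are arbitrary.

```python
from itertools import product


def validation_error(learning_rate, depth):
    """Toy stand-in for training a model and scoring it on held-out data."""
    return (learning_rate - 0.1) ** 2 + (depth - 4) ** 2


# Candidate values for each hyperparameter (arbitrary, for illustration).
grid = {
    "learning_rate": [0.01, 0.1, 1.0],
    "depth": [2, 4, 6],
}


def grid_search(objective, grid):
    """Evaluate every combination in the grid and return the lowest-error one."""
    names = list(grid)
    best_params, best_score = None, float("inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = objective(**params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


best_params, best_score = grid_search(validation_error, grid)
print(best_params, best_score)
```

The cost is the product of the candidate counts (nine evaluations here), which grows quickly as hyperparameters are added. That cost is the usual motivation for smarter search strategies such as Bayesian Optimization.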

Removing Old Backups from Azure Blob Storage

Niko Neugebauer has some advice on how to clean up backups which live in Azure Blob Storage:

Continuing the topic of the Backups to Azure Blob Storage that I have kind of kicked off with the post Striping Backups to Azure Blob Storage, I want to touch on the important aspect of “keeping it clean” – thus deleting the old backups.

On a regular Windows Server this is a rather easy task if you are using a standard maintenance solution, such as Ola Hallengren’s Maintenance Solution or any other one. You can also use the regular SSMS maintenance (*cough* for whatever reason that is unknown to me, that you might wish to *cough*), or you can easily set up a regular Windows Scheduler with Command Line Batch or PowerShell or whatever tool/script/language you like.

The situation is quite different with the Backup To URL functionality, which has been available for more than 6 years (and the good old SQL Server 2012 even got support for it in a certain Cumulative Update – SQL Server 2012 Service Pack 1 CU 2, to be more precise).

Niko goes through five different methods you can use, so check it out.

Avoid Default String Lengths

Kenneth Fisher warns us against creating strings without specifying a length:

Every now and again I see someone get lazy and declare a string (CHAR, NCHAR, VARCHAR and NVARCHAR) without specifically declaring what the length is going to be. This can lead to some interesting problems. First of all it’s usually going to be a length of one.

I’d take this one step further and say avoid creating strings without specifying a length in products like ADO.NET as well—there, you won’t get the single-character length Kenneth mentions here, but you do get one plan per character length passed in, which does a great job of bloating the plan cache.
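
Here is a small Python simulation of that plan cache effect. It models a cache keyed on the statement text plus the parameter declaration, which is a simplification, and it does not call ADO.NET or SQL Server at all. The table, column, and parameter names are hypothetical, as is the explicit length of 50.

```python
# Simulation only: models a plan cache keyed on statement text plus the
# parameter declaration. dbo.Person, LastName, and @name are hypothetical.

SQL = "SELECT * FROM dbo.Person WHERE LastName = @name"


def inferred_declaration(value):
    """Length taken from the value, as happens when no size is specified."""
    return f"@name nvarchar({len(value)})"


def explicit_declaration(value, size=50):
    """Length fixed up front, independent of the value passed in."""
    return f"@name nvarchar({size})"


# Sample inputs with lengths 2, 5, 5, 6, 6, and 10.
names = ["Ng", "Smith", "Jones", "Fisher", "Feasel", "Hallengren"]

inferred_cache = {(SQL, inferred_declaration(name)) for name in names}
explicit_cache = {(SQL, explicit_declaration(name)) for name in names}

print(len(inferred_cache))  # one entry per distinct string length
print(len(explicit_cache))  # a single entry regardless of input
```

Six lookups with four distinct lengths produce four cache entries when the length is inferred, versus one when the length is declared explicitly.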
