
Curated SQL Posts

Executing Python Code With SQL Server

Chris Hyde has started a series on executing external scripts with SQL Server ML Services:

We’ll start off with a simple step-by-step introduction to the sp_execute_external_script stored procedure.  This is the glue that enables us to integrate SQL Server with the Python engine by sending the output of a T-SQL query over to Python and getting a result set back.  For example, we could develop a stored procedure to be used as a data set in an SSRS report that returns statistical data produced by a Python library such as SciPy.  In this post we’ll introduce the procedure and how to pass a simple data set into and back out of Python, and we’ll get into manipulating that data in a future post.

If you’d like to follow along with me, you’ll need to make sure that you’re running SQL Server 2017 or later, and that Machine Learning Services has been installed with the Python option checked.  Don’t worry if you installed both R and Python, as they play quite nicely together.  The SQL Server Launchpad service should be running, and you should make sure that the ability to execute the sp_execute_external_script procedure is enabled by running the following code:

Chris’s talks on ML Services (either R or Python) were great and I expect this series to be as well.
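
The excerpt stops just short of the code, so here is a minimal sketch of what the setup and a first round trip typically look like; the enabling step and the call below are illustrative, not necessarily Chris’s exact snippet.

-- Enable external scripts (the Launchpad service must be running; a service restart
-- may be needed before the setting takes effect).
EXEC sp_configure 'external scripts enabled', 1;
RECONFIGURE WITH OVERRIDE;
GO

-- Send a T-SQL result set to Python as a pandas DataFrame and return it unchanged.
-- The input query and output column names here are illustrative.
EXEC sp_execute_external_script
    @language = N'Python',
    @script = N'OutputDataSet = InputDataSet',
    @input_data_1 = N'SELECT database_id, name FROM sys.databases'
WITH RESULT SETS ((DatabaseID int, DatabaseName sysname));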

Cross-Server Database Restoration With Minion Backup

Jen McCown walks us through how to restore a database on a different server with Minion Backup:

Today we’ll look at configuring a common, repeatable scenario: take the latest backup of MyDB from ProdServer1 and restore it to DevServer1. There are four basic steps to the setup and execution:

  1. Configure Minion Backup and let it run on ProdServer1. Restoring with MB requires at least one full backup taken by MB. (Note that you don’t need Minion Backup on DevServer1 for this scenario.)

  2. Configure restore settings paths. You know how sometimes a restore requires one or more “WITH MOVE” clauses?  Configure this once for your source-target pair, and MB takes care of it from then on.

  3. Configure the restore tuning settings (optional). Oh yes, we tune our backups AND our restores!

  4. Generate and run the restore statements.

It’s a good walkthrough if you’re a Minion Backup user.  If you’re not, and you’re not particularly happy with your backup solution, I recommend giving it a try.

Finding The Max Number Of Sequences You Can Have In SQL Server

Jon Shaulis looks into how many sequences you can have on a SQL Server instance:

I thought this was an interesting question, but it makes sense to have some concern about it. If you are using Sequences over identity inserts, this typically means you need more control over how those numbers are handled from creation to archive. So what if you need a lot of different Sequences stored on your server to handle your needs? Will you run out of Sequence objects that you can create?

This information is not intuitively simple to find; this is one of those times where you need to read various articles and connect the dots. Let’s start simple.

This is one of those cases where Swart’s 10% Rule comes into play.
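
If you just want to poke at the objects involved, here is a small illustrative sketch (the names are made up). Sequences are ordinary schema-scoped objects, so the ceiling is tied to the per-database object limit rather than anything sequence-specific, which is where the 10% arithmetic comes in.

-- Illustrative only: create a sequence and see how many sequence objects the database holds.
CREATE SEQUENCE dbo.InvoiceNumber
    AS bigint
    START WITH 1
    INCREMENT BY 1
    CACHE 50;

-- Sequences show up in sys.sequences (and in sys.objects, like any other schema-scoped object).
SELECT COUNT(*) AS sequence_count
FROM sys.sequences;

-- Pull the next value, e.g. to stamp an invoice.
SELECT NEXT VALUE FOR dbo.InvoiceNumber AS next_invoice_number;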

What’s Special About stats_id = 1

Wayne Sheffield explains what makes stats_id = 1 special, as well as a relationship between stats_id and index_id in SQL Server:

If you were to look at sys.indexes, you would see that these two indexes use index_id values of 1 and 3. The value 2 is skipped. It’s not because there used to be an index that was deleted after the index_id 3 index was created. It’s simply because of the relationship that index_id = stats_id, and there is already a statistic with stats_id = 2. When creating the index for the primary key, index_id 2 had to be skipped.

Check it out for additional insights.
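
If you want to see the relationship on one of your own tables, a quick illustrative query (the table name is a placeholder) looks like this.

-- Index-backed statistics share their index's id, so a stats_id already taken by an
-- auto-created statistic forces the next index to skip that index_id.
SELECT  s.stats_id,
        s.name AS stats_name,
        s.auto_created,
        i.index_id,
        i.name AS index_name,
        i.type_desc
FROM sys.stats AS s
LEFT JOIN sys.indexes AS i
       ON i.object_id = s.object_id
      AND i.index_id  = s.stats_id
WHERE s.object_id = OBJECT_ID(N'dbo.SomeTable');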

When To Use SQL, DAX, Or M In Power BI Models

Paul Turley offers up some guidance on when to use which language when building Power BI models:

As a general rule of thumb, in formal SSAS projects built on a relational data mart or data warehouse that is managed by the same project team as the BI data model, I typically recommend that every table in the model import data from a corresponding view or UDF stored and managed in the relational database. Keep in mind that this is the way we’ve been designing Microsoft BI projects for several years. Performing simple tasks like renaming columns in the SSAS data model designer was slow and cumbersome. Performing this part of the data prep in T-SQL was much easier than in SSDT. With the recent advent of Power Query in SQL Server Data Tools, there is a good argument to be made for managing those transformations, but the tool is still new and frankly I’m still testing the water. Again, keep changes in one place for future maintenance.

Do your absolute best to avoid writing complex SQL query logic that cannot be traced back to the sources. Complicated queries can become a black box – and a Pandora’s box if they aren’t documented, annotated and easy to decipher.

But do read Paul’s closing grafs on the importance of not being hidebound.
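
As a concrete, purely illustrative example of the “import from a view” advice, keeping renames and light shaping in the relational layer looks something like this (the object and column names are made up).

-- The Power BI / SSAS model imports from this view instead of the base table, so renames
-- and simple shaping live in one documented place in the database.
CREATE VIEW dbo.vw_DimCustomer
AS
SELECT  c.CustomerKey,
        c.FirstName + N' ' + c.LastName AS [Customer Name],
        c.EmailAddress                  AS [Email],
        c.CountryRegionName             AS [Country]
FROM dbo.DimCustomer AS c;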

Using LIME To Explain Keras Models

Shirin Glander shows us how to use the LIME package to explain image recognition models built from Keras:

The segmentation of an image into superpixels is an important step in generating explanations for image models. It is important both that the segmentation is correct and follows meaningful patterns in the picture, and that the size/number of superpixels is appropriate. If the important features in the image are chopped into too many segments, the permutations will probably damage the picture beyond recognition in almost all cases, leading to a poor or failing explanation model. As the size of the object of interest varies, it is impossible to set up hard rules for the number of superpixels to segment into – the larger the object is relative to the size of the image, the fewer superpixels should be generated. Using plot_superpixels it is possible to evaluate the superpixel parameters before starting the time-consuming explanation function.

Fun stuff.  I’m glad that there’s a lot of work going into explaining neural networks rather than hand-waving them off as magic.

Learning R Or Python?

David Smith tackles the age-old question:

If your interests lean more towards traditional statistical analysis and inference as used within industries like manufacturing, finance, and the life sciences, I’d lean towards R. If you’re more interested in machine learning and artificial intelligence applications, I’d lean towards Python. But even that’s not a hard-and-fast rule: R has excellent support for machine learning and deep learning frameworks, and Python is often used for traditional data science applications.

One thing I am quite sure of though: neither Python nor R is inherently better than the other, and arguments on that front are ultimately futile. (Trust me, I’ve been there.) Which is better for any given person depends on a wide variety of factors, and for some, it may even be worthwhile to learn both. Brian Ray recently posted a good overview of the factors that may lead you towards R or Python for data science: their history, the community, performance, third-party support, use cases, and even how to use them together. It’s great food for thought if you’re trying to decide which community to invest in.

Embrace the power of “and.”  The whole R versus Python bit is fun for purposes of arguing with people, but they’re both powerful languages and we’re seeing more and more overlap—for example, the Keras package David mentions runs Python’s TensorFlow under the covers.

Backing Up Azure Data Lake Store Data

Hugo Almeida has some hints for backing up Azure Data Lake Store data using Azure Data Factory:

Our Hadoop HDP IaaS cluster on Azure uses Azure Data Lake Store (ADLS) for data repository and accesses it through an applicational user created on Azure Active Directory (AAD). Check this tutorial if you want to connect your own Hadoop to ADLS.

Our ADLS is getting bigger and we’re working on a backup strategy for it. ADLS provides locally-redundant storage (LRS); however, this does not prevent our application from corrupting data or accidentally deleting it. Since Microsoft hasn’t published a new version of ADLS with a clone feature, we had to find a way to back up all the data stored in our data lake.

We’re going to show you how to do a full ADLS backup with Azure Data Factory (ADF). ADF does not preserve permissions. However, our Hadoop client can only access the AzureDataLakeStoreFilesystem (adl) through Hive with a “hive” user, and we can generate these permissions before the backup.

Read the whole thing if you’re thinking of using Azure Data Lake Store.

Retrieving Statistic Use From Query Plan XML

Lonny Niederstadt shows us how to retrieve stats usage details from SQL Server query plans if trace flag 8666 is enabled:

Years ago someone said “Hey – why not drop auto-created stats, since the stats you need will just get created again and you’ll end up getting rid of those you no longer need.”   That *may* be a reasonable step on some systems.  If the risk of bad plans on the first execution of a query that needs stats which have been dropped is too high, it’s a bad deal.  If the potential concurrent cost of auto-creating dropped stats is too high, that’s a bad deal.  What about analyzing query plans over some period of time to see which stats are actually used in those plans?  Then auto-stats which aren’t used in that set of plans could be dropped.

That type of stats analysis could have other uses, too.  Prioritizing manual stats updates in regular maintenance comes to mind.  Or, determining what stats to create/update on an Always On Availability Group primary based on secondary activity.  And troubleshooting problem queries or identifying suspicious “watchlist” stats based on the highly variable queries/plans they are involved with.

So I created this blog post almost 4 years ago.  And now I’ll plead with you to not use the query there… it’s awful.  If you want to query trace flag 8666 style stats from plan XML, please start from the query in this post instead – it’s much better behaved 🙂

Read on for the script.
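
For orientation only, here is a rough sketch of the general approach rather than Lonny’s tuned query. It assumes trace flag 8666 was enabled when the plans were compiled, and the Field / FieldName / wszStatName shape of the extra plan XML is an assumption you should verify against your own plans.

-- Rough sketch: list statistics named in cached plan XML (TF 8666 adds these fields).
SELECT  st.text AS query_text,
        f.node.value('@FieldValue', 'nvarchar(300)') AS stats_name,
        qp.query_plan
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
CROSS APPLY qp.query_plan.nodes('//*[local-name()="Field"][@FieldName="wszStatName"]') AS f(node);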

Alerting On Azure Data Lake Store Data Usage

Jose Lara shows off an interesting feature in Azure Data Lake Store:

The massive scale and capabilities of Azure Data Lake Store are regularly used by companies for big data storage. As the number of files, file types, and folders grow, things get harder to manage and staying compliant becomes a greater challenge for companies. Regulations such as GDPR (General Data Protection Regulation) have heightened requirements for control and supervision of files that contain sensitive data.

In this blog post, I’ll show you how to set up alerts in your Azure Data Lake Store to make managing your data easier. We will create a log analytics query and an alert that monitors a specific path and file type and sends a notification whenever the path or file is created, accessed, modified, or deleted.

Auditing access has historically been tricky, so it’s nice that they were able to get that in.
