Press "Enter" to skip to content

Curated SQL Posts

Parallelization With Rcpp

Blazej Moska demonstrates how to use Rcpp to parallelize R code:

One of the frustrating moments while working with data is when you need results urgently, but your dataset is large enough to make it impossible. This happens often when we need to use algorithm with high computational complexity. I will demonstrate it on the example I’ve been working with.

Suppose we have large dataset consisting of association rules. For some reasons we want to slim it down. Whenever two rules consequents are the same and one rule’s antecedent is a subset of second rule’s antecedent, we want to choose the smaller one (probability of obtaining smaller set is bigger than probability of obtaining bigger set).

Read the whole thing.

Comments closed

Database-Scoped Optimize For Ad Hoc Workloads

Joe Sack introduces a new database-scoped configuration option:

SQL Server provides the “optimize for ad hoc workloads” server-scoped option that is used to reduce the memory footprint of single use ad hoc batches and associated plans.  When enabled at the SQL Server instance scope, the “optimize for ad hoc workloads” option stores a reduced-memory compiled plan stub on the first execution of an ad hoc batch for any database on the instance. This server option has been available in SQL Server for several years now, but until recently there hasn’t been a way to enable this option in Azure SQL Database for individual databases.

We are now introducing a new database scoped configuration called OPTIMIZE_FOR_AD_HOC_WORKLOADS which enables this behavior at the database scope in Azure SQL Database.

I’m not sure if this will make it to the on-prem product, and if it does, I’m not sure how useful it would be in practice.  But it is good that we can use it in Azure SQL Database.

Comments closed

VSS Snapshot: Freeze & Thaw

Erik Darling points out that VSS backups aren’t instantaneous and can block queries:

Ah, backups. Why are they so tough to get right?

You start taking them, you find out you’re not taking enough of them, or that they’re not the right kind, or that you’re not using checksums or compression, or that you’re not storing them in the right place, or that the storage isn’t redundant.

It’s just like, why won’t someone make this easy?

Then you read about VSS Snaps, and they look so dead simple. You don’t need your DBA Ph.D to use them.

And look how fast they are! Oh how they blaze.

Read the whole thing.

Comments closed

Azure SQL Analytics

Arun Sirpal gives an introduction to Azure SQL Analytics:

Please see the prerequisites section within this document – YOU MUST do this else you will not be able to use this feature. https://docs.microsoft.com/en-us/azure/log-analytics/log-analytics-azure-sql#prerequisites

Once setup it should take approximately 15 minutes to start capturing and rendering back some data. Don’t be surprised if it does take a little longer as was the case for myself.

My biggest complaint is about the visuals; otherwise, this looks like the beginning of a solid monitoring solution within Azure SQL Database.

Comments closed

Fixing Orphaned Users In SQL Server

Eitan Blumin shares a couple of methods to fix orphaned users in SQL Server:

The most correct solution for this problem, is to have consistent SIDs to your Logins across all your SQL Servers.
So that even when a database is moved to a different server, it could still use the same SID that it was originally created for.
And also, when you recreate a previously deleted Login, you’d need to create it with the same SID that it originally had.

This is, obviously, not a trivial matter, and not always possible.

But if this is a direction that interests you, then you will find the following very useful:

Read on for the best solution, as well as the second-best solution using sp_change_users_login.

Comments closed

Fun With Meltdown And Spectre

Brent Ozar looks at some of the consequences of Meltdown and Spectre for SQL Server DBAs:

That’s because some test results have found big slowdowns when the operating system is patched for Meltdown and/or Spectre. These are big vulnerabilities in the processors themselves, and OS vendors are having to make big changes that aren’t tuned for performance yet. Early benchmarks yesterday were showing 30% drops in PostgreSQL performance, but thankfully newer benchmarks have been showing smaller drops. Red Hat’s benchmarks show 3-7% slower analytics workloads, and 8-12% slower OLTP.

Joey D’Antoni has more:

Will This Impact My Performance?

Probably–especially If you are running on virtual hardware. For workloads on bare metal, the security risk is much lower, so Microsoft is offering a registry option to not include the microcode fixes. Longer term especially if you are audited, or allow application code to run on your database servers, you will need to enable the microcode options.

This will likely get better over time as software patches are released, that are better optimized to make fewer calls. Ultimately, this will need to fixed on the hardware side, and we will need a new generation of hardware to completely solve the security issue with a minimum impact.

Allan Hirt has even more:

There are two bugs which are known as Meltdown and Spectre. The Register has a great summarized writeup here – no need for me to regurgitate. This is a hardware issue – nothing short of new chips will eradicate it. That said, pretty much everyone who has written an OS, hypervisor, or software has (or will have) patches to hopefully eliminate this flaw. This blog post covers physical, virtualized, and cloud-based deployments of Windows, Linux, and SQL Server.

The fact every vendor is dealing with this swiftly is a good thing. The problem? Performance will most likely be impacted. No one knows the extent, especially with SQL Server workloads. You’re going to have to test and reset any expectations/performance SLAs. You’ll need new baselines and benchmarks. There is some irony here that it seems virtualized workloads will most likely take the biggest hit versus ones on physical deployments. Time will tell – no one knows yet.

This will have long-term ramifications.  We’ll deal with them like we’ve dealt with other issues in the past, but it does seem that, at least for now, there will be some performance hit from this.

Comments closed

Clustering The Power BI Gateway

Craig Porteous show how to cluster the Power BI Data Gateway to allow for disaster recovery:

I love PowerShell and I even wrote a module with functions to query Power BI metadata but there should always be another way to get this vital information.

The documentation I mentioned earlier points you to a PowerShell module file included in the November update. You can load this file & use the commands they provide to get information about your Gateway cluster and its members or make changes to clusters.

If it’s important enough to use, it’s important enough to include in a disaster recovery plan.

Comments closed

Finding Maxima And Minima

Jobil Louis shares various techniques for finding a global maximum or minimum:

Let’s say we want to find the minimum point in y and value of x which gives that minimum y. There are many ways to find this. I will explain three of those.

1) Search based methods: Here the idea is to search for the minimum value of y by feeding in different values of x. There are two different ways to do this.

a) Grid search: In grid search, you give a list of values for x(as in a grid) and calculate y and see the minimum of those.

b) Random search: In this method, you randomly generate values of x and compute y and find the minimum among those.

The drawback of search based methods is that there is no guarantee that we will find a local or global minimum. Global minimum means the overall minimum of a curve. Local minimum means a point which is minimum relatively to its neighboring values.

My favorite class of algorithm here is evolutionary algorithms, particularly genetic algorithms and genetic programming.  They’re a last-ditch effort when nothing else works, but the funny thing about them is that when nothing else works, they tend to step up.

Comments closed

Plotting Data With Python In ML Services

Robert Sheldon continues his Python in Machine Learning Services series:

One of the most useful modules is the matplotlib library, which provides an extensive codebase for plotting data and creating rich, customized visualizations. You can use matplotlib components to generate a wide range of graphics, including bar charts, pie charts, scatter plots, histograms, and many others. For example, you can generate a series of line charts that aggregate inventory or sales data in your SQL Server database and then save those charts to .png or .pdf files.

This article includes several examples that demonstrate how to create matplotlib visualizations and save them to .pdf files, using data from the AdventureWorks2017 sample database. The article assumes that you know how to use the sp_execute_external_script stored procedure to run Python scripts in SQL Server. If you’re not familiar with the stored procedure, you should review the first two articles in this series before continuing with this one.

If you’re already familiar with matplotlib, using it within SQL Server is pretty easy, as Robert shows.  If you’re not familiar, this is a useful introduction to the library.

Comments closed

Plotting Graph Data In R

Sifiso Ndlovu shows how to take graph data from SQL Server and plot it in R using Machine Learning Services:

However, with recent focus on big data for many of my clients, we have experienced an increase in different business requests that requires for many-to-many data modelling. Consequently, as a Microsoft shop we’ve had to turn to other non-Microsoft products to ensure that we optimally respond to such business requests. Not surprisingly, ever since word got around that graph database will be part of SQL Server 2017, we’ve been looking forward to this latest release of SQL Server. Having played around with the graph database feature in SQL Server 2017, we have noticed that unlike other graph database vendors, plotting and visualising the data out of the graph database is not readily available in SQL Server 2017. Luckily, thanks to SQL Server R, you can easily plot and visualise SQL Server 2017 graph database data without turning to 3rd party plugins. In this article, I demonstrate how SQL Server Machine Learning Services (previously known as SQL Server 2016 R Services) can be used to plot a diagram according to the data defined in a SQL Server 2017 graph database.

The igraph library is a good one; there’s a lot of power in it that this post just introduces.

Comments closed