Press "Enter" to skip to content

Author: Kevin Feasel

Azure Backup for SQL Server VM Pains

John Sterrett has run into a few issues with Azure Backup for SQL Server VMs:

Having the ability to back up new databases automatically is taken for granted. So much so that I noticed that Azure Backup for SQL Server VMs will not automatically back up new databases for you. That’s right. Make sure you remember to go in and detect and select your new database every time you add one, or you will not be able to recover.

Azure Backup for SQL Server VMs has an interesting feature called Autoprotect. This should automatically back up all your databases for you. Unfortunately, this does not work. Yes, I double-checked by enabling Autoprotect for a VM and adding a new database. The database didn’t get backed up, so I had to add it manually.

Some of these seem easy enough to fix, so hopefully the product gets better over time.
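
Until Autoprotect works reliably, a scheduled check against msdb can act as a safety net. Here’s a minimal T-SQL sketch (the 24-hour window and the choice to skip system databases are arbitrary) which flags databases with no recent full backup:

    -- Flag databases with no full backup in the last 24 hours.
    SELECT d.name AS database_name,
           MAX(b.backup_finish_date) AS last_full_backup
    FROM sys.databases AS d
        LEFT JOIN msdb.dbo.backupset AS b
            ON b.database_name = d.name
            AND b.type = 'D'  -- D = full database backup
    WHERE d.database_id > 4  -- skip the system databases
    GROUP BY d.name
    HAVING MAX(b.backup_finish_date) IS NULL
        OR MAX(b.backup_finish_date) < DATEADD(HOUR, -24, GETDATE());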

3D Effects in Power BI

David Eldersveld shows how you can use orthographic projection in Power BI:

The projection from three coordinates to a 2D plane is achieved by adding the following two measures. Be sure to adjust the column references and what-if parameter names at the top to correspond to your own data.

Here’s my “Ortho x” measure. The initial six values (shown in bold in the original post) are what you’d need to adjust to your own data and parameter names.

David lays out a face, which is pretty neat.
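
If you’re curious about the math rather than the DAX, here’s a hedged T-SQL sketch of the same idea: rotate each 3D point by a yaw and a pitch angle, then drop the z coordinate (that drop is the orthographic projection). The sample points and angles are made up; David’s measures drive this with what-if parameters instead.

    DECLARE @yaw float = 30 * PI() / 180,    -- rotation around the y axis
            @pitch float = 15 * PI() / 180;  -- rotation around the x axis

    WITH Points3D AS (
        SELECT * FROM (VALUES (1.0, 2.0, 0.5), (0.0, 1.0, 1.0), (2.0, 0.0, 1.5))
            AS v(x, y, z)
    ),
    Yawed AS (
        SELECT x * COS(@yaw) + z * SIN(@yaw)   AS x1,
               y                               AS y1,
               -x * SIN(@yaw) + z * COS(@yaw)  AS z1
        FROM Points3D
    )
    SELECT x1                                  AS OrthoX,  -- keep x as-is
           y1 * COS(@pitch) - z1 * SIN(@pitch) AS OrthoY   -- pitch y, discard z
    FROM Yawed;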

Creating Graph Tables in SQL Server

Mala Mahadevan continues a series on graph tables in SQL Server:

I have highlighted in red (in the original post) what SQL Server adds to the table: the two system columns – graph_id, which is a bigint, and node_id, which is an nvarchar storing JSON – plus the unique index to help with queries.

We can also see from constraint type that this table is similar to other relational tables – it can be enabled for replication and can have related delete or update actions defined on it if need be.

This post gives a bit more insight into how graph tables work in SQL Server under the covers.
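
As a quick illustration of those system columns, here’s a minimal node table you can create and inspect yourself (the table and column names are mine, not Mala’s):

    CREATE TABLE dbo.Person
    (
        PersonId int PRIMARY KEY,
        PersonName nvarchar(100)
    ) AS NODE;

    INSERT INTO dbo.Person (PersonId, PersonName) VALUES (1, N'Alice');

    -- $node_id resolves to the generated nvarchar column holding JSON;
    -- the bigint graph_id column is internal and hidden from SELECT *.
    SELECT $node_id AS NodeId, PersonId, PersonName
    FROM dbo.Person;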

Batch Mode Normalization

Paul White digs into batch mode normalization and its consequences for performance:

I mentioned in the introduction that not all eight-byte data types can fit in 64 bits. This fact is important because many columnstore and batch mode performance optimizations only work with data 64 bits in size. Aggregate pushdown is one of those things. There are many more performance features (not all documented) that work best (or at all) only when the data fits in 64 bits.

In our specific example, aggregate pushdown is disabled for a columnstore segment when it contains even one data value that does not fit in 64 bits. SQL Server can determine this from the minimum and maximum value metadata associated with each segment without checking all the data. Each segment is evaluated separately.

Paul goes deep into the concept, making this well worth your while.
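
You can peek at the per-segment metadata SQL Server consults via the catalog views. A sketch along these lines (the table name is a placeholder, and note that min_data_id/max_data_id hold encoded values rather than raw ones) surfaces each segment’s minimum and maximum without touching the data:

    SELECT c.name        AS column_name,
           s.segment_id,
           s.row_count,
           s.min_data_id,  -- encoded minimum value for this segment
           s.max_data_id   -- encoded maximum value for this segment
    FROM sys.column_store_segments AS s
        JOIN sys.partitions AS p
            ON p.partition_id = s.partition_id
        JOIN sys.columns AS c
            ON c.object_id = p.object_id
            AND c.column_id = s.column_id
    WHERE p.object_id = OBJECT_ID(N'dbo.YourColumnstoreTable')
    ORDER BY c.name, s.segment_id;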

Migrating to a New Hashing Algorithm

Randolph West has some ideas about how to remove some of the pain of changing your hashing algorithm for passwords:

Let’s assume that you have inherited a database which stores passwords in cleartext in a column called Password. This is not a good scenario. For one thing, it is a temptation to access information we wouldn’t normally have access to, by using that person’s credentials. For another, due to a human trait called “using a computer,” people reuse passwords all the time. I do it myself and I’m writing a post about security (out of 1,089 items in my 1Password vault, 82 items contain reused passwords).

Randolph has a few ideas, but there’s one which I put into place a long time ago: automatic migration. One system used MD5 (it was a while ago), and I changed it to a better password-hashing algorithm. If the existing user record had a binary the length of an MD5 hash, I’d perform the lookup (to ensure that this was the correct user) and then re-hash using my new algorithm, which produced a longer binary. Over time, people migrated to the new system without any issue. Eventually, when the set of remaining users was small enough, we took more active measures. It wasn’t quite as good as the ideal of forcing resets on everyone, but that was deemed overkill for our user base.
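
Here’s a hedged T-SQL sketch of that lazy-migration step, with hypothetical table and column names. (In a real system you’d want a slow, salted key-derivation function like bcrypt or PBKDF2 rather than a bare hash; HASHBYTES just keeps the example self-contained.)

    -- On a successful login attempt, upgrade any user still on the old hash.
    DECLARE @UserId int = 42,
            @Password nvarchar(128) = N'correct horse battery staple';

    UPDATE dbo.Users
    SET PasswordHash = HASHBYTES('SHA2_512', @Password)   -- new 64-byte hash
    WHERE UserId = @UserId
        AND DATALENGTH(PasswordHash) = 16                 -- still MD5-sized
        AND PasswordHash = HASHBYTES('MD5', @Password);   -- old hash matches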

When to Use Different ML Algorithms

Stefan Franczuk explains the different categories of machine learning algorithms available in Talend:

Clustering is the task of grouping together a set of objects in such a way that objects in the same group are more similar to each other than to those in other groups. Clustering is really useful for identifying separate groups and is therefore used to solve use cases such as “who are my premium customers?”

Understanding when to use which algorithm is important. You don’t want to build out the world’s best regression if your benefactors are asking for a classifier.
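
To make the clustering definition concrete, here’s a toy T-SQL version of a single k-means-style assignment step: each customer lands with whichever of two fixed centroids is nearer. All of the data and names are made up for illustration; Talend’s components do the real work of finding the centroids.

    WITH Customers AS (
        SELECT * FROM (VALUES (1, 120.0, 2.0), (2, 900.0, 14.0), (3, 150.0, 3.0))
            AS v(CustomerId, AvgSpend, VisitsPerMonth)
    ),
    Centroids AS (
        SELECT * FROM (VALUES (N'Standard', 140.0, 2.5), (N'Premium', 850.0, 12.0))
            AS v(ClusterName, AvgSpend, VisitsPerMonth)
    )
    SELECT cu.CustomerId,
           ( SELECT TOP (1) ce.ClusterName      -- nearest centroid wins
             FROM Centroids AS ce
             ORDER BY SQUARE(cu.AvgSpend - ce.AvgSpend)
                 + SQUARE(cu.VisitsPerMonth - ce.VisitsPerMonth) ) AS AssignedCluster
    FROM Customers AS cu;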

Databricks versus Mapping Data Flows

Helge Rege Gardsvoll contrasts Azure Databricks, Azure Data Factory Mapping Data Flows, and SQL Server Integration Services:

Mapping Data Flows
One of the many data flows from Microsoft these days, this provides, for the first time, data transformation capabilities within Data Factory. This is not a U-SQL script or a Databricks notebook orchestrated from Data Factory, but an integrated tool. This means you can reuse (many of) the datasets you have defined in Data Factory, which you cannot do in Databricks.

Mapping Data Flows runs on top of Databricks, but the cluster is handled for you and you don’t have to write any of that Scala code yourself.

Read on for the full comparison.

Develop BDC PySpark Jobs in Visual Studio Code

Jenny Jiang announces a new capability in Visual Studio Code:

With the Visual Studio Code extension, you can enjoy native Python programming experiences such as linting, debugging support, language service, and so on. You can run the current line, run selected lines of code, or run all for your PY file. You can import and export a .ipynb notebook and perform a notebook-like query including Run Cell, Run Above, or Run Below. You can also enjoy a notebook-like interactive experience that includes your source code and markdown comments along with the running results and output. You can remove the unneeded sections, enter comments, or type additional code in the interactive results window. Moreover, you can visualize your results in a graphic format through matplotlib, like in a Jupyter notebook. The integration with SQL Server 2019 Big Data Clusters empowers you to quickly submit a PySpark batch job to the big data cluster and monitor job progress.

This is rather useful for developers, though I greatly prefer the Azure Data Studio notebook interface.

Determining Instant File Initialization Status

Dave Mason gives us a couple of methods for determining whether we turned Instant File Initialization on:

Here’s a little tidbit I wanted to share regarding the Perform Volume Maintenance Tasks security setting. In the SQL Server world, this is often referred to as IFI. On more recent versions of SQL (SQL 2012 SP4 or later, I believe), you can verify if IFI is enabled or not for the database engine logon account by checking the error log.

That’s one, but click through for the technique you can easily script out.
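
The error-log method is easy enough to try, and on builds new enough to have the DMV column (around SQL Server 2012 SP4 / 2016 SP1 onward), there’s a query you can script out as well, which may well be the technique Dave shows:

    -- Search the current SQL Server error log for the startup message
    -- "Database Instant File Initialization: enabled" (or "disabled").
    EXEC sys.xp_readerrorlog
        0,                                -- 0 = current log
        1,                                -- 1 = SQL Server error log
        N'instant file initialization';   -- search string (must be Unicode)

    -- The DMV route, where the column exists:
    SELECT servicename, instant_file_initialization_enabled
    FROM sys.dm_server_services;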
