Press "Enter" to skip to content

Month: January 2021

Gradient Descent in R

Holger von Jouanne-Diedrich lays out the basics of gradient descent:

Gradient Descent is a mathematical algorithm to optimize functions, i.e. finding their minima or maxima. In Machine Learning it is used to minimize the cost function of many learning algorithms, e.g. artificial neural networks a.k.a. deep learning. The cost function simply is the function that measures how good a set of predictions is compared to the actual values (e.g. in regression problems).

The gradient (technically the negative gradient) is the direction of steepest descent. Just imagine a skier standing on top of a hill: the direction which points into the direction of steepest descent is the gradient!

Click through for an example in R.

Comments closed

The Benefits of Table Partitioning

Brenda Bentz lays out some of the benefits of table partitioning in SQL Server:

Table partitioning in MS SQL Server is an advanced option that breaks a table into logically smaller chunk (partition) and then stores each chuck to a multiple filegroups.  Each chuck can be accessed and maintained separately. Partitioning is not visible to users; it behaves like one logical table to the end user. Partitioning was introduced in SQL 2005 as an Enterprise edition feature but starting with SQL 2016 SP1 it is available on all the editions.

Table partitioning can work to improve performance in specific circumstances, though you’ll definitely want to do a lot of testing before rolling it out. H/T Amanda White.

Comments closed

Writing Calculations on Power BI Real-Time Streaming Datasets

Reza Rad shows how we can write DAX measures against a Power BI streaming dataset:

In Power BI, you can have a dataset with Imported dataDirectQueryLive Connection, or Composite mode. You can build all of those types of Power BI datasets in the Power BI Desktop. However, there is a single type of dataset, which you can only build through the service, called the Streaming dataset.

A streaming dataset is for building reports with real-time response time. For example, if you want to build a Power BI dashboard that shows the room temperature as soon as captured by a temperature sensor. For this type of dataset, you send the data rows using Power BI REST API, which can be called using a custom C# application, or PowerShell scripts, or even from a Power Automate flow process.

Read on to see how.

Comments closed

When Statistics Get Updated

Matthew McGiffen never promised us there would be no math:

Before using values from the statistics, the Optimizer will check to see if the statistics are “stale”, i.e. the modification counter exceeds a given threshold. If it does, SQL will trigger a resampling of the statistics before going on to form an execution plan. This means that the plan will be formed against up to date statistics for the table.

Read on for some experimentation on when stats updates kick in.

1 Comment

Windows Drive Corruption Bug

Ax Sharma reports on a nasty bug in Windows:

In August 2020, October 2020, and finally this week, infosec researcher Jonas L drew attention to an NTFS vulnerability impacting Windows 10 that has not been fixed.

When exploited, this vulnerability can be triggered by a single-line command to instantly corrupt an NTFS-formatted hard drive, with Windows prompting the user to restart their computer to repair the corrupted disk records.

The researcher told BleepingComputer that the flaw became exploitable starting around Windows 10 build 1803, the Windows 10 April 2018 Update, and continues to work in the latest version.

Read the whole thing. It looks like this also affects Windows XP, which makes me figure it might affect Windows 7, 8, and 8.1 too. H/T Mark Wilkinson.

Comments closed

Normalization and Reduced Blocking

Erik Darling points out one of the many benefits of normalizing tables in a database:

Looking at the design, there are two big problems:

1. There are “order” columns that are going to get a lot of inserts and updates
2. You’re going to be storing the same customer information over and over again

The more related, but not independent, data you store in the same table, the harder it becomes to effectively index that table.

My take on this is that the old adage of “Normalize until it hurts; denormalize until it works” hasn’t been operative for the past 15 years, when the SSD era began.

Comments closed

DTUs and vCores in Azure SQL Database

Arun Sirpal gives us one take on whether to use DTUs or vCores for Azure SQL Database:

If you are building database solutions in Azure , using Azure SQL Database then you will know that you have a purchasing option decision to make. That being should you use a vCore model or DTU approach?

Arun prefers the vCore model for solid reasons. I agree with the sentiment when we’re talking about production databases. For small-scale, personal stuff, however, the bottom end of the DTU model is much less expensive.

Comments closed

Grouping Data with Spark

Ed Elliott has two quick examples of grouping data in Spark:

I have been playing around with the new Azure Synapse Analytics, and I realised that this is an excellent opportunity for people to move to Apache Spark. Synapse Analytics ships with .NET for Apache Spark C# support many people will surely try to convert T-SQL code or SSIS code into Apache Spark code. I thought it would be awesome if there were a set of examples of how to do something in T-SQL, then translated into how to do that same thing in Spark SQL and the Spark DataFrame API in C#.

Click through for the first example, GROUP BY.

Comments closed

T-SQL Tuesday 134 Roundup

James McGillivray summarizes the results from T-SQL Tuesday #134:

When I volunteered to host a T-SQL Tuesday, I had a very different topic in mind. However, the incredible events of the last year, and in particular, the immense pressure that my wife faced at work, made me realise how important it is to have ways to take breaks, both mental and physical. And while we were away in December, and we both recharged, I thought it would make a good topic for this event. It was wonderful to see the response from the #sqlfamily to my invitation, and by my count 29 different people contributed to the blog party.

I’ve tried to group posts with similar themes in this summary, and since some posts fall into multiple categories, I may mention a single post more than once. Links on names point to Twitter handles, links on descriptions point to the respective blog posts.

Click through for a rather large roundup.

Comments closed

The Concatenation Operator

Hugo Kornelis explains what the Concatenation operator does:

The Concatenation operator reads and returns all rows from all its inputs, in order, and without modification.

This operator is most commonly used to execute queries that use UNION or UNION ALL. In the former case, other operators are required to remove the duplicates, because Concatenation doesn’t provide that functionality. You may also find the Concatenation operator in queries on partitioned views.

Read on to see the algorithm and lots of details about the operator.

Comments closed