Press "Enter" to skip to content

Author: Kevin Feasel

The Performance Impact of Dissimilarly-Sized tempdb Files

Chris Taylor puts on the lab coat and safety goggles:

tldr: Over the years I’ve read a lot of blog posts and watched a lot of videos where they mention that you should have your tempdb files all the same size. What I haven’t seen much of (if any) is what performance impact you actually see if they are not configured optimally. This blog post aims to address that

It is not too long, so do read.

Comments closed

Executing sp_configure from Powershell

Jeff Hill shows off another feature of dbatools:

If you’ve been responsible for an instance of SQL Server for any length of time you have probably dealt with sp_configure to change configuration settings at the server level. I have been using SQL Server since v6.5 and it was a thing then too. This is not a post about what the settings are or what they should be set to. There are plenty of resources out there for both. This is about how to see and set these options.

Click through for the process and what you can do with it.

Comments closed

Monitoring Power BI Dataset Refreshes with Log Analytics

Chris Webb continues a series on DicrectQuery over Log Analytics:


In the first post in this series I showed how it was possible to create a DirectQuery dataset connected to Log Analytics so you could analyse Power BI query and refresh activity in near real-time. In this post I’ll take a closer look into how you can use this data to monitor refreshes.

The focus of this series is using DirectQuery on Log Analytics to get up-to-date information (remember there’s already an Import-mode report you can use for long-term analysis of this data), and this influences the design of the dataset and report

Click through for some KQL and explanatory instructions.

Comments closed

Reasons Azure SQL Databases Cannot Move to Serverless

Ahmed Mahmoud troubleshoots an Azure SQL Database migration issue:

We sometimes see customers cannot move their SQL database from provisioned compute tier to serverless while the scaling operation fails with error signature like:

Failed to scale from General Purpose: Gen5, 2 vCores, 32 GB storage, zone redundant disabled to General Purpose: Serverless, Gen5, 2 vCores, 32 GB storage, zone redundant disabled for database: .
Error code: .
Error message: An unexpected error occured while processing the request. Tracking ID: ‘xxxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxx’

Click through for several possible reasons.

Comments closed

Anomaly Detection in Two Ways

Muhammad Asad Iqbal Khan shows how you can use isolation forests and kernel density estimation for outlier detection:

Just like the random forests, isolation forests are built using decision trees. They are implemented in an unsupervised fashion as there are no pre-defined labels. Isolation forests were designed with the idea that anomalies are “few and distinct” data points in a dataset.

Recall that decision trees are built using information criteria such as Gini index or entropy. The obviously different groups are separated at the root of the tree and deeper into the branches, the subtler distinctions are identified. Based on randomly picked characteristics, an isolation forest processes the randomly subsampled data in a tree structure. Samples that reach further into the tree and require more cuts to separate them have a very little probability that they are anomalies. Likewise, samples that are found on the shorter branches of the tree are more likely to be anomalies, since the tree found it simpler to distinguish them from the other data.

Click through for descriptions and the code.

Comments closed

Apache Flink ML 2.0.0

Dong Lin and Yun Gao make an announcement:

The Apache Flink community is excited to announce the release of Flink ML 2.0.0! Flink ML is a library that provides APIs and infrastructure for building stream-batch unified machine learning algorithms, that can be easy-to-use and performant with (near-) real-time latency.

This release involves a major refactor of the earlier Flink ML library and introduces major features that extend the Flink ML API and the iteration runtime, such as supporting stages with multi-input multi-output, graph-based stage composition, and a new stream-batch unified iteration library. Moreover, we added five algorithm implementations in this release, which is the start of a long-term initiative to provide a large number of off-the-shelf algorithms in Flink ML with state-of-the-art performance.

Congratulations to everybody who contributed to the project; it’s a big milestone.

Comments closed

Combining Azure DevOps and Databricks

Anna Wykes continues a series on DevOps for Databricks:

An Environment Variable is a variable stored outside of the Python script; in our instance it will be stored on the DevOps Agent running the DevOps Pipelines. Consequently, it is accessible to other scripts/programs running on the DevOps Agent. We will not cover DevOps Agents in this blog specifically, the simplest description is that they are the compute that runs your pipeline, normally a VM (Virtual Machine) or Docker Container

Read the whole thing.

Comments closed

Mastermind in R

Tomaz Kastrun continues a series of useful games:

Playing a simple guessing game with R. It’s called Mastermind game! This game was originally created for two people, but R version will be for single-player mode, when an R developer or R data scientists need a break.

The gameplay is simple and so are the rules. The board contains 10 rows (or more) with possibilities of four colours and code pegs (white or black). R engine stores a secret colour combination and user selects a random combination.

Click through to see it in action.

Comments closed