Curated SQL Posts

sample_ms in sys.dm_io_virtual_file_stats and Data Types

Paul Randal points out an interesting bug:

A prior student emailed me yesterday about some strange behavior of the sample_ms column in sys.dm_io_virtual_file_stats. It’s supposed to be the number of milliseconds since the SQL Server instance has been started. He has a SQL Server 2016 instance that’s been running since August 2019, and it shows the following:

ms_ticks from sys.dm_os_sys_info: 51915112684 (which works out to be August 28, 2019)

sample_ms from sys.dm_io_virtual_file_stats: 375504432 (which works out to be about 4.5 days)

Read on for Paul's determination and an alternative in case you're relying on that column.
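
As a quick sanity check on the two quoted numbers, the sketch below converts each to days and tests one hypothesis suggested by the "Data Types" in the title: that the counter behind sample_ms keeps only the low 32 bits of the millisecond count. That hypothesis is an assumption made here for illustration, not the linked post's stated conclusion, and the two values were presumably read at slightly different moments.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000

# Values quoted in the excerpt above.
ms_ticks = 51_915_112_684   # ms_ticks from sys.dm_os_sys_info
sample_ms = 375_504_432     # sample_ms from sys.dm_io_virtual_file_stats


def days(ms):
    """Convert a millisecond count to days."""
    return ms / MS_PER_DAY


# Assumption (not confirmed by the excerpt): the value is truncated to
# an unsigned 32-bit counter, so it wraps every 2**32 milliseconds.
wrap_period_days = days(2**32)
wraps = ms_ticks // 2**32
wrapped = ms_ticks % 2**32
gap_ms = wrapped - sample_ms

print(f"ms_ticks  is {days(ms_ticks):.2f} days")
print(f"sample_ms is {days(sample_ms):.2f} days")
print(f"a 32-bit millisecond counter wraps every {wrap_period_days:.1f} days")
print(f"ms_ticks wrapped {wraps} times leaves {wrapped} ms")
print(f"difference from sample_ms: {gap_ms} ms")
```

Under that assumption, the wrapped value lands within a second of the reported sample_ms, which is at least consistent with a 32-bit rollover.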

Pareto Charts in Power BI

Imran Burki builds a chart:

Last week we built a Manufacturing Yield dashboard that showed first and final pass yield numbers. In this post, we’re going to introduce the concept of manufacturing defects and build a Pareto chart in Power BI. Unlike Excel, the Pareto chart isn’t out-of-the-box in Power BI. Instead, we must create DAX to build the Pareto. Before we dig into the DAX, let’s talk about why we would create a Pareto in the context of manufacturing and why defects are important to track.

Read on to learn what a Pareto chart is and how to build the DAX that produces it.
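
For readers who want the underlying calculation before looking at the DAX, here is a minimal Python sketch of what a Pareto chart plots: categories sorted by count, largest first, alongside a running cumulative percentage. The defect names and counts are hypothetical, invented purely for illustration, and this is not the approach from the linked post.

```python
from itertools import accumulate

# Hypothetical defect counts, invented for illustration only.
defects = {
    "Scratch": 40,
    "Dent": 25,
    "Misalignment": 20,
    "Discoloration": 10,
    "Other": 5,
}


def pareto_table(counts):
    """Return (category, count, cumulative_percent) rows, largest count first."""
    ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    total = sum(counts.values())
    running = accumulate(count for _, count in ordered)
    return [
        (name, count, 100 * cumulative / total)
        for (name, count), cumulative in zip(ordered, running)
    ]


for name, count, cum_pct in pareto_table(defects):
    print(f"{name:<14} {count:>3} {cum_pct:6.1f}%")
```

The bars of the chart come from the counts and the line comes from the cumulative percentage, which is the part that takes extra work in a tool without a built-in Pareto visual.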

EMR Studio Now Generally Available

Shuang Li announces that Amazon EMR Studio is now in GA:

EMR Studio provides fully managed Jupyter notebooks, and tools like Spark UI and YARN Timeline Service to simplify debugging. EMR Studio uses AWS Single Sign-On and allows you to log in directly with your corporate credentials without signing in to the AWS Management Console. You can install custom kernels and libraries, collaborate with peers using code repositories such as GitHub and Bitbucket, and run parameterized notebooks as part of scheduled workflows using orchestration services like Apache Airflow and Amazon Managed Workflows for Apache Airflow (Amazon MWAA).

With EMR Studio, you can run notebook code on Amazon EMR on Amazon Elastic Compute Cloud (Amazon EC2) or Amazon EMR on Amazon Elastic Kubernetes Service (Amazon EKS), and take advantage of the performance-optimized EMR runtime for Apache Spark. You can set up EMR Studio to run applications on existing EMR clusters or create new clusters using AWS CloudFormation templates for Amazon EMR.

Click through for more information.

reduceByKey and aggregateByKey in Spark

The Hadoop in Real World team compares two RDD functions in Spark:

Let’s examine the below aggregateByKey. The first parameter – 0 is the initial value and also indicates the type of the output.

First _+_ function indicates the function on the map side combine and second _+_ function indicates the reduce side combine. Both functions are the same in this case.

This is a demo-driven post, so check it out.
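
To make the roles of the three arguments concrete, the sketch below emulates the semantics in plain Python. It is not the Spark API: `aggregate_by_key` and `reduce_by_key` are hypothetical helper names, and the "partitions" are ordinary lists standing in for an RDD's partitions.

```python
from operator import add


def aggregate_by_key(partitions, zero, seq_op, comb_op):
    """Emulate aggregateByKey over a list of partitions of (key, value) pairs."""
    # Map side: fold each partition's values into per-key accumulators,
    # starting every key from the zero value.
    partials = []
    for part in partitions:
        acc = {}
        for key, value in part:
            acc[key] = seq_op(acc.get(key, zero), value)
        partials.append(acc)
    # Reduce side: merge the per-partition accumulators for each key.
    merged = {}
    for acc in partials:
        for key, partial in acc.items():
            merged[key] = comb_op(merged[key], partial) if key in merged else partial
    return merged


def reduce_by_key(partitions, func):
    """Emulate reduceByKey: one function, no zero value, output type equals value type."""
    merged = {}
    for part in partitions:
        for key, value in part:
            merged[key] = func(merged[key], value) if key in merged else value
    return merged


# Hypothetical data split across two partitions.
partitions = [[("a", 1), ("b", 2), ("a", 3)], [("a", 4), ("b", 5)]]

# With zero = 0 and + on both sides, the two functions agree.
sums = aggregate_by_key(partitions, 0, add, add)
same = reduce_by_key(partitions, add)

# A zero value of a different type changes the output type: (sum, count) per key.
sum_count = aggregate_by_key(
    partitions,
    (0, 0),
    lambda acc, v: (acc[0] + v, acc[1] + 1),
    lambda x, y: (x[0] + y[0], x[1] + y[1]),
)

print(sums, same, sum_count)
```

The second call shows why the zero value "indicates the type of the output": the values are integers, but the result per key is a tuple, which reduceByKey cannot produce directly.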

Offloading Maintenance Operations

Taryn Pratt has a process for offloading maintenance operations onto another server:

Early on when I started working on the SQL Servers at Stack Overflow, we were taking daily backups. We had a handful of databases that were being restored for other processes, but the majority weren’t actively tested to ensure the backups were good. Since you never want to be in a situation where you need to restore a database and find it doesn’t work, my goal was to create a process to automatically restore our backups to a separate server, and then run DBCC CHECKDB on it.

This is a T-SQL-driven process and I appreciate that. If you want a PowerShell-driven process, Kevin Hill has you covered.

Unkillable Threads

Paul Randal gives us a supervillain origin story:

While I was teaching IEPTO2 last week, I was discussing why sometimes a thread cannot be terminated using the KILL command, and thought it would make a great topic for a post.

Some of you have likely seen a phenomenon called a non-yielding scheduler. This is where a thread is using the processor and doesn’t voluntarily yield after using more than the thread quantum (4 milliseconds, unchangeable). There’s a background task called the scheduler monitor that checks that progress is being made on the various schedulers inside SQL Server and issues a warning if it finds a problem.

Read on to learn more about how this can happen and what it means for you.

From Azure Analysis Services to Power BI PPU

Gilbert Quevauvilliers teases a new series:

I have been doing a lot of evaluation and investigations for organizations who currently are using Azure Analysis Services (AAS) and looking to see if they can leverage Power BI Premium Per User (PPU).

In this series I am going to cover the following details below, which I completed to see if the migration was not only feasible but should be the new normal.

Looks like it will be an 11-parter, so we have some reading to look forward to.

Additional Common Query Patterns for Joins

Erik Darling continues a series with two more posts. First up is sorting lookups:

Most people see a lookup and think “add a covering index”, regardless of any further details. Then there they go, adding an index with 40 included columns to solve something that isn’t a problem.

You’ve got a bright future in government, kiddo.

In today’s post, we’re going to look at a few different things that might be going on with a lookup on the inside.

The next post is around pre-fetching lookups:

One sort of interesting point about prefetching is that it’s sensitive to parameter sniffing. Perhaps someday we’ll get Adaptive Prefetching.

Until then, let’s marvel at this underappreciated feature.

Check out both posts and prepare to be illuminated.

Apache Kafka 2.8 Released

John Roesler announces Apache Kafka 2.8:

We are excited to announce that 2.8 introduces an early-access look at Kafka without ZooKeeper! The implementation is not yet feature complete and should not be used in production, but it is possible to start new clusters without ZooKeeper and go through basic produce and consume use cases.

At a high level, KIP-500 works by moving topic metadata and configurations out of ZooKeeper and into a new internal topic named @metadata. This topic is managed by an internal Raft quorum of “controllers” and is replicated to all brokers in the cluster. The leader of the Raft quorum serves the same role as the controller in clusters today. A node in the KIP-500 world can serve as a controller, a broker, or both, depending on the new process.roles configuration. See the README for quickstart instructions and additional details.

In addition to the headline item, there are plenty of other bugfixes and additions as well.

Working with Secrets in PowerShell

Jeffrey Hicks tries out the Secrets Management modules in PowerShell:

So I’ve been kicking the tires and trying to do more with the Secrets Management modules from Microsoft, now that they are out of pre-release status. You can install the Microsoft.PowerShell.SecretStore and Microsoft.PowerShell.SecretManagement modules, you’ll need both, from the PowerShell Gallery. You can find extension modules that build on the Microsoft modules for working with other key vaults or secret store. Run find-module -tag secretmanagement to find additional modules. But what I want to talk about today relates to the Microsoft modules. Although, it might apply to you with any of the extension modules. The challenge is using the secrets management modules with a PowerShell profile script.

Read on for a challenge around running scheduled tasks which require secrets and a solution.
