Press "Enter" to skip to content

Curated SQL Posts

Managing Powershell Functions with PSFunctionInfo

Jeffrey Hicks announces a new tool:

Over the last year, I’ve been working on a solution. I’ve been using it and finding it helpful. My friend Gladys Kravitz was also bemoaning the lack of tools for managing stand-alone functions. And while she had her own approach, I thought my solution might offer more. So I polished it up, setup a Github repository, and published a preview release to the PowerShell Gallery. The module is called PSFunctionInfo. You can find the repository on Github. Because it is a pre-release, you might need to install the newest version of the PowerShellGet module so you have the prerelease parameters.

Click through for more detail on how to use it.

Comments closed

Aligning Kubernetes Nodes to Physical Infrastructure

Frank Denneman has some advice for us:

With the new VM service and the customizable VM classes, you can help the developer align their nodes to the infrastructure. Infrastructure details are not always visible at the Kubernetes layers, and maybe not all developers are keen to learn about the intricacies of your environment. The VM service allows you to publish only the VM classes you see fit for that particular application project. One of the reasons could be the avoidance of monster-VM deployment. Before this update, developers could have deployed a six worker node Kubernetes cluster using the guaranteed 8XLarge class (each worker node equipped with 32 vCPUs, 128Gi all reserved), granted if your hosts config is sufficient. But the restriction is only one angle to this situation. Long-lived relationships are typically symbiotic of nature, and powerplays typically don’t help build relationships between developers and the InfraOps team. What would be better is to align it with the NUMA configuration of the ESXi hosts within the cluster.

Click through for more detail. This is aimed particularly at operations people running Kubernetes clusters over VMware.

Comments closed

Recent Apache NiFi Updates

Pierre Villard has some news for us around Apache NiFi:

Cloudera released a lot of things around Apache NiFi recently! We just released Cloudera Flow Management (CFM) 2.1.1 that provides Apache NiFi on top of Cloudera Data Platform (CDP) 7.1.6. This major release provides the latest and greatest of Apache NiFi as it includes Apache NiFi 1.13.2 and additional improvements, bug fixes, components, etc. Cloudera also released CDP 7.2.9 on all three major cloud platforms, and it also brings Flow Management on DataHub with Apache NiFi 1.13.2 and more.  Let’s have a look at the main highlights of these releases.

Click through to see what’s included.

Comments closed

Writing SQL to Query R data.frames

Tomaz Kastrun tries out a package:

There are many R packages for querying SQL Databases. Recently, I was looking into sqldf package | CRAN documentation.

There are so many great advantages (simple running SQL statements, creating, loading, deleteing data to data.frames, connectivity to many databases, support for SQL functions, data types and many many more) , but one that was really a major win was interactions with data frames and SQL Language.

Between sqldf and dbplyr, you get it both ways: treat a data.frame like a SQL table, or treat a SQL database like R data.frames.

Comments closed

Row Goals and Query Optimizer Estimates

Hugo Kornelis explains how row goals can change query optimizer behavior:

The most simple is when the query literally tells SQL Server that you don’t want to have all rows returned. Everyone knows the TOP clause, which is most commonly used for this. For ANSI portability, and because it adds a few options, you should also be aware of the FETCH and OFFSET modifiers to the ORDER BY clause, that have a similar functionality and are specifically designed to support paging. And there is of course the SET ROWCOUNT option, though I sincerely hope nobody actually uses that. All of these options literally tell SQL Server that we don’t want all results, only a part of them. The execution plan that would produce the entire set the fastest might not necessarily be the fastest way to get the few rows we actually want, so it’s a good thing that the optimizer has a way to come up with a different execution plan for these cases.

But there are plenty of other ways you might get a row goal, so check them out.

Comments closed

An Introduction to Latches

Paul Randal starts a series on latches:

In some of my previous articles here on performance tuning, I’ve discussed multiple wait types and how they are indicative of various resource bottlenecks. I’m starting a new series on scenarios where a synchronization mechanism called a latch is a performance bottleneck, and specifically non-page latches. In this initial post I’m going to explain why latches are required, what they actually are, and how they can be a bottleneck.

Read on to learn what a latch is, why it is useful, and how latches work at a high level.

Comments closed

Turning On and Off Calculations in Power BI Visuals

Phil Seamark has a clever workaround:

Power BI doesn’t yet have a feature that allows end-users to turn on/off the ability to process calculations for visuals on a report page until they are ready. Most of the time, this is perfectly fine – however, in some instances, it can be handy to disable long-running and heavy calculations from running. At the same time filters/slicers are get selected.

The scenario you most likely want to have this control is when your model uses Direct Query mode against large tables in data sources that charge you for query processing. Even if your Direct Query data source does not charge per query, having a user make quick-fire selections over several slicers can potentially saturate a back-end data-source and unnecessarily chew up resources.

Read on to understand how to use calculation groups to do this, as well as the limitations around this solution.

Comments closed