Press "Enter" to skip to content

Month: May 2021

When PyODBC fast_executemany Isn’t

Jon Morisi troubleshoots a performance issue:

I recently had a project in which I needed to transfer a 60 GB SQLite database to SQL Server.  After some research I found the sqlite3 and pyodbc modules, and set about scripting connections and insert statements.  

The basic form of my script is to import the modules, setup the database connections, and iterate (via cursor) over the rows of the select statement creating insert statements and executing them.  

The issue here is that this method results in single inserts being sent one at a time yielding less than satisfactory performance.  Inserting 35m+ rows in this fashion takes ~5hrs on my system.

Jon tries out a few different options. It would appear that there is no easy bulk insertion operation with PyODBC.

Comments closed

Availability Groups and Logins

Andrea Allred runs into a post-failover issue:

While doing a planned Availability Group failover, the application stopped talking to the database. After checking the SQL Server log, we found that all the SQL Logins were failing with an “incorrect password” error. The logins were on the server, the users were in the databases, and the passwords were even right, so what was wrong? It all comes down to SID’s (Security Identifiers).

Read on for the cause and the solution. I’d also recommend Sync-DbaAvailabilityGroup as a good dbatools cmdlet to use.

Comments closed

Managing Powershell Functions with PSFunctionInfo

Jeffrey Hicks announces a new tool:

Over the last year, I’ve been working on a solution. I’ve been using it and finding it helpful. My friend Gladys Kravitz was also bemoaning the lack of tools for managing stand-alone functions. And while she had her own approach, I thought my solution might offer more. So I polished it up, setup a Github repository, and published a preview release to the PowerShell Gallery. The module is called PSFunctionInfo. You can find the repository on Github. Because it is a pre-release, you might need to install the newest version of the PowerShellGet module so you have the prerelease parameters.

Click through for more detail on how to use it.

Comments closed

Aligning Kubernetes Nodes to Physical Infrastructure

Frank Denneman has some advice for us:

With the new VM service and the customizable VM classes, you can help the developer align their nodes to the infrastructure. Infrastructure details are not always visible at the Kubernetes layers, and maybe not all developers are keen to learn about the intricacies of your environment. The VM service allows you to publish only the VM classes you see fit for that particular application project. One of the reasons could be the avoidance of monster-VM deployment. Before this update, developers could have deployed a six worker node Kubernetes cluster using the guaranteed 8XLarge class (each worker node equipped with 32 vCPUs, 128Gi all reserved), granted if your hosts config is sufficient. But the restriction is only one angle to this situation. Long-lived relationships are typically symbiotic of nature, and powerplays typically don’t help build relationships between developers and the InfraOps team. What would be better is to align it with the NUMA configuration of the ESXi hosts within the cluster.

Click through for more detail. This is aimed particularly at operations people running Kubernetes clusters over VMware.

Comments closed

Recent Apache NiFi Updates

Pierre Villard has some news for us around Apache NiFi:

Cloudera released a lot of things around Apache NiFi recently! We just released Cloudera Flow Management (CFM) 2.1.1 that provides Apache NiFi on top of Cloudera Data Platform (CDP) 7.1.6. This major release provides the latest and greatest of Apache NiFi as it includes Apache NiFi 1.13.2 and additional improvements, bug fixes, components, etc. Cloudera also released CDP 7.2.9 on all three major cloud platforms, and it also brings Flow Management on DataHub with Apache NiFi 1.13.2 and more.  Let’s have a look at the main highlights of these releases.

Click through to see what’s included.

Comments closed

Writing SQL to Query R data.frames

Tomaz Kastrun tries out a package:

There are many R packages for querying SQL Databases. Recently, I was looking into sqldf package | CRAN documentation.

There are so many great advantages (simple running SQL statements, creating, loading, deleteing data to data.frames, connectivity to many databases, support for SQL functions, data types and many many more) , but one that was really a major win was interactions with data frames and SQL Language.

Between sqldf and dbplyr, you get it both ways: treat a data.frame like a SQL table, or treat a SQL database like R data.frames.

Comments closed

Row Goals and Query Optimizer Estimates

Hugo Kornelis explains how row goals can change query optimizer behavior:

The most simple is when the query literally tells SQL Server that you don’t want to have all rows returned. Everyone knows the TOP clause, which is most commonly used for this. For ANSI portability, and because it adds a few options, you should also be aware of the FETCH and OFFSET modifiers to the ORDER BY clause, that have a similar functionality and are specifically designed to support paging. And there is of course the SET ROWCOUNT option, though I sincerely hope nobody actually uses that. All of these options literally tell SQL Server that we don’t want all results, only a part of them. The execution plan that would produce the entire set the fastest might not necessarily be the fastest way to get the few rows we actually want, so it’s a good thing that the optimizer has a way to come up with a different execution plan for these cases.

But there are plenty of other ways you might get a row goal, so check them out.

Comments closed

An Introduction to Latches

Paul Randal starts a series on latches:

In some of my previous articles here on performance tuning, I’ve discussed multiple wait types and how they are indicative of various resource bottlenecks. I’m starting a new series on scenarios where a synchronization mechanism called a latch is a performance bottleneck, and specifically non-page latches. In this initial post I’m going to explain why latches are required, what they actually are, and how they can be a bottleneck.

Read on to learn what a latch is, why it is useful, and how latches work at a high level.

Comments closed