Curated SQL Posts

Table Design in R with mmtable2

Matt Dancho walks through a package to make tables look great in R:

I love ggplot2 for plotting. The grammar of graphics allows us to add elements to plots. Tables seem to have been forgotten when it comes to an intuitive grammar with a tidy data philosophy, until now. mmtable2 aims to be the ggplot2 for tables, leveraging the awesome GT table package.

The mmtable2 package aims to make it easy to create tables by:

1. Using a ggplot2-style syntax for a grammar of table operations.

2. Extending the amazing GT table package.

Read on for the process and a demonstration.

Comments closed

Surviving a Kafka Outage

Jakub Korab walks us through availability features in Kafka as well as what to expect if your brokers are unavailable:

In the case of an outage, you have to ensure that these messages can be processed eventually. Keeping unsent messages around and retrying indefinitely in the hopes that the outage will rectify may eventually result in your application running out of memory. This is a crucial consideration in high-throughput applications.

If business functions are performed by systems downstream of Kafka, and the sending application only acts as an ingestion point, the situation is slightly more relaxed. If Kafka is unavailable to send messages to, then no external activity has taken place. For these systems, a Kafka outage might mean that you do not accept new transactions. In such a case, it may be reasonable to return an error message and allow the external third party to retry later. Retail applications typically fall into this category.

Read the whole thing.
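
To make the memory concern concrete, here is a minimal Python sketch of a bounded buffer for unsent messages. It is not taken from the linked post and does not use any real Kafka client API: `BoundedSendBuffer`, `BufferFull`, and the `send` callable are all hypothetical names, and a `ConnectionError` stands in for "the brokers are unavailable." The idea is simply that once the buffer is full, the application rejects new work rather than retrying forever and growing without limit.

```python
from collections import deque


class BufferFull(Exception):
    """Raised when the pending buffer is full and new work must be rejected."""


class BoundedSendBuffer:
    """Hold unsent messages up to a fixed limit (hypothetical helper, not a Kafka API)."""

    def __init__(self, send, max_pending=1000):
        self._send = send              # callable that raises ConnectionError on outage
        self._pending = deque()        # messages waiting to be delivered, oldest first
        self._max_pending = max_pending

    def submit(self, message):
        # Refuse new messages instead of buffering without bound.
        if len(self._pending) >= self._max_pending:
            raise BufferFull("too many unsent messages; ask the caller to retry later")
        self._pending.append(message)
        self.flush()

    def flush(self):
        # Deliver in order; stop at the first failure and keep the rest buffered.
        while self._pending:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                return False
            self._pending.popleft()
        return True
```

A caller that catches `BufferFull` can return an error to the external party, which matches the "do not accept new transactions" behavior described in the excerpt.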

Uncommenting XML from C#

Joy George Kunjikkur needs to remove some XML comment tags:

Requirement 

As part of the installation, some XML fragments (e.g., <authentication>) need to be uncommented in the web.config file based on the environment. This can be done via either PowerShell or C#.NET, as it has to be triggered from the MSI installation and never during the runtime of the application.

Alternatives

We can either do string-based detection and replacement or use the XML parser of .NET. Since the string-based approach is complex, let us stick with the .NET library to replace it.

Read on for one way to do this.
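
For a rough idea of what the parser-based route looks like, here is a sketch in Python rather than the C# used in the linked post, so treat it as an illustration of the idea and not as Joy's implementation. The function name `uncomment_element` and the sample document are my own assumptions. It walks the document, finds comment nodes whose text is a well-formed element with the requested tag name, and swaps each comment for the parsed element.

```python
from xml.dom import minidom
from xml.dom.minidom import Node
from xml.parsers.expat import ExpatError


def uncomment_element(xml_text, tag_name):
    """Replace comments that wrap a <tag_name> element with the element itself."""
    doc = minidom.parseString(xml_text)

    def walk(node):
        # Copy the child list because we replace nodes while iterating.
        for child in list(node.childNodes):
            if child.nodeType == Node.COMMENT_NODE:
                text = child.data.strip()
                if not text.startswith("<" + tag_name):
                    continue
                try:
                    fragment = minidom.parseString(text).documentElement
                except ExpatError:
                    continue  # comment text is not well-formed XML; leave it alone
                if fragment.tagName == tag_name:
                    node.replaceChild(doc.importNode(fragment, True), child)
            else:
                walk(child)

    walk(doc)
    return doc.documentElement.toxml()


# Hypothetical sample standing in for a web.config file.
sample = (
    "<configuration><system.web>"
    '<!-- <authentication mode="Windows" /> -->'
    "<!-- keep me -->"
    "</system.web></configuration>"
)
result = uncomment_element(sample, "authentication")
```

Other comments are left untouched, which is the main advantage over a blanket string replacement of the comment delimiters.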

Using a Date Template in Power BI

Haroon Ashraf recommends using a template with date dimension details:

A Power BI Template

A Power BI template is a structure or model that typically contains commonly used tables, relationships, and hierarchies belonging to an organization or an individual. This model is reused in any Power BI report. More information is provided in the previous article:

Centralized Data Modelling using Power BI Templates

What is a Date Template in Power BI?

A Date template is a precise structure of the Date table that serves as a foundation for building reports in the organization. In other words, it is like a built-in Date table that any report developer or skilled business user can apply to build Power BI reports.

Read on for more Q&A as well as how to create a simple version of a date table for this template. The idea of using a template makes even more sense as you have more complicated date table requirements, such as adding in fiscal year details, holiday information (especially holidays which don’t always fall on the same solar calendar day, such as Passover or Easter), and dates important to the company.
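
As a rough illustration of what goes into even a simple date table, here is a Python sketch that builds one row per day with a fiscal year column. This is not Haroon's Power BI approach; the function name, the column names, the July fiscal year start, and the convention of naming a fiscal year after the calendar year in which it ends are all assumptions made for the example.

```python
from datetime import date, timedelta


def build_date_table(start, end, fiscal_start_month=7):
    """Return one dict per day from start to end, inclusive.

    Assumption: a fiscal year is labeled by the calendar year in which it ends,
    so with a July start, 2021-07-01 belongs to fiscal year 2022.
    """
    rows = []
    current = start
    while current <= end:
        if fiscal_start_month > 1 and current.month >= fiscal_start_month:
            fiscal_year = current.year + 1
        else:
            fiscal_year = current.year
        rows.append(
            {
                "Date": current,
                "Year": current.year,
                "Quarter": (current.month - 1) // 3 + 1,
                "Month": current.month,
                "DayOfWeek": current.isoweekday(),  # Monday = 1 ... Sunday = 7
                "FiscalYear": fiscal_year,
            }
        )
        current += timedelta(days=1)
    return rows


table = build_date_table(date(2021, 6, 30), date(2021, 7, 1))
```

Every extra rule of this kind (fiscal periods, holidays, company-specific dates) is one more thing each report author would otherwise have to rebuild, which is the argument for defining it once in a template.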

Defining an Ad Hoc Query

Kathi Kellenberger explains what it means to be an ad hoc query:

Someone recently asked me which queries are ad hoc in SQL Server. An ad hoc query is a single query not included in a stored procedure and not parameterized or prepared. Depending on the server settings, SQL Server can parameterize some statements initially written as ad hoc queries. Ad hoc doesn’t mean dynamic.

Next on the list, a post hoc ergo propter hoc query. That’s where I explain to the DBAs that just because the server goes down every time I run a query, it doesn’t mean my queries caused this.

When PyODBC fast_executemany Isn’t

Jon Morisi troubleshoots a performance issue:

I recently had a project in which I needed to transfer a 60 GB SQLite database to SQL Server.  After some research I found the sqlite3 and pyodbc modules, and set about scripting connections and insert statements.  

The basic form of my script is to import the modules, set up the database connections, and iterate (via cursor) over the rows of the select statement, creating insert statements and executing them.

The issue here is that this method results in single inserts being sent one at a time yielding less than satisfactory performance.  Inserting 35m+ rows in this fashion takes ~5hrs on my system.

Jon tries out a few different options. It would appear that there is no easy bulk insertion operation with PyODBC.
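
For context, the usual first step away from row-by-row inserts is to read the source in chunks and hand each chunk to `executemany`. The sketch below shows that general pattern and is not Jon's script: `copy_in_batches`, the table layout, and the batch size are assumptions, and an in-memory SQLite database stands in for SQL Server so the example runs without a server. With pyodbc, the destination cursor would come from a `pyodbc.connect(...)` connection, and setting `fast_executemany = True` on that cursor is the option the post title refers to.

```python
import sqlite3


def copy_in_batches(src_cursor, dst_cursor, select_sql, insert_sql, batch_size=10_000):
    """Copy rows between two DB-API cursors in fixed-size batches.

    Both sqlite3 and pyodbc use '?' placeholders, so the same insert_sql shape
    works for either; whether batching helps depends on the driver.
    """
    src_cursor.execute(select_sql)
    total = 0
    while True:
        rows = src_cursor.fetchmany(batch_size)
        if not rows:
            break
        dst_cursor.executemany(insert_sql, rows)  # one call per batch, not per row
        total += len(rows)
    return total


# Stand-in source and destination databases (both SQLite here).
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE t (id INTEGER, name TEXT)")
src.executemany("INSERT INTO t VALUES (?, ?)", [(i, f"n{i}") for i in range(25)])

dst = sqlite3.connect(":memory:")
dst.execute("CREATE TABLE t (id INTEGER, name TEXT)")

copied = copy_in_batches(
    src.cursor(),
    dst.cursor(),
    "SELECT id, name FROM t",
    "INSERT INTO t (id, name) VALUES (?, ?)",
    batch_size=10,
)
dst.commit()
```

Fetching with `fetchmany` also keeps memory bounded, which matters for a 60 GB source.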

Availability Groups and Logins

Andrea Allred runs into a post-failover issue:

While doing a planned Availability Group failover, the application stopped talking to the database. After checking the SQL Server log, we found that all the SQL Logins were failing with an “incorrect password” error. The logins were on the server, the users were in the databases, and the passwords were even right, so what was wrong? It all comes down to SIDs (Security Identifiers).

Read on for the cause and the solution. I’d also recommend Sync-DbaAvailabilityGroup as a good dbatools cmdlet to use.

Managing PowerShell Functions with PSFunctionInfo

Jeffrey Hicks announces a new tool:

Over the last year, I’ve been working on a solution. I’ve been using it and finding it helpful. My friend Gladys Kravitz was also bemoaning the lack of tools for managing stand-alone functions. And while she had her own approach, I thought my solution might offer more. So I polished it up, set up a GitHub repository, and published a preview release to the PowerShell Gallery. The module is called PSFunctionInfo. You can find the repository on GitHub. Because it is a pre-release, you might need to install the newest version of the PowerShellGet module so you have the prerelease parameters.

Click through for more detail on how to use it.

Aligning Kubernetes Nodes to Physical Infrastructure

Frank Denneman has some advice for us:

With the new VM service and the customizable VM classes, you can help developers align their nodes to the infrastructure. Infrastructure details are not always visible at the Kubernetes layers, and maybe not all developers are keen to learn about the intricacies of your environment. The VM service allows you to publish only the VM classes you see fit for that particular application project. One of the reasons could be the avoidance of monster-VM deployment. Before this update, developers could have deployed a six-worker-node Kubernetes cluster using the guaranteed 8XLarge class (each worker node equipped with 32 vCPUs and 128Gi, all reserved), provided your host configuration is sufficient. But the restriction is only one angle to this situation. Long-lived relationships are typically symbiotic in nature, and power plays typically don’t help build relationships between developers and the InfraOps team. What would be better is to align it with the NUMA configuration of the ESXi hosts within the cluster.

Click through for more detail. This is aimed particularly at operations people running Kubernetes clusters over VMware.
