Press "Enter" to skip to content

Curated SQL Posts

Streaming Foreign Key Joins in Kafka Streams

John Roesler and Adam Bellemare take us in depth on a feature:

Before 2.4.0, the absence of foreign-key joins in Kafka Streams was palpable. As soon as you have a KTable abstraction, you start to think of relational-DB-esque things that you’d like to do with it, and joining two tables is near the top of the list. In addition, Kafka users often started out by implementing change data capture (CDC) of their main database tables, resulting in the production of normalized record streams reflecting the database model. These records often contain foreign-key references, requiring you to either denormalize entirely within your source database (which can be quite expensive), or handle them downstream in your consumer. The ability to compute denormalization on the fly is exactly in the sweet spot of use cases for Kafka Streams.

In versions prior to 2.4, there were workarounds available to compute a foreign-key join, using the ability to transform the table, filter it, aggregate on properties, and join on primary keys. But these workarounds were complex, prone to bugs, and not very efficient. A concrete plan to implement first-class support for this crucial operation was first put together when Jan Filipiak proposed KIP-213 in 2017. Adam Bellemare took over driving the proposal in 2018 and brought it to a conclusion in time for the 2.4.0 release.

Click through for examples of how it all works, as well as how you might optimize foreign key joins.


Optimizing BERT Models on Google Colab

Kevin Jacobs fine-tunes some NLP processes:

BERT is a language model and can thus be used for predicting the next word in a sentence. Furthermore, BERT can be used for automatic summarization, text classification and many more downstream tasks. Google Colab provides you with a cloud-based environment on which you can train your machine learning models on a GPU. The downside is that your data is uploaded to the Google cloud. Google Colab gives you the opportunity to fine-tune BERT.

Click through to see how.


Identifying Backpressure in Apache Flink

Piotr Nowojski explains an important concept in streaming (and ELT/ETL) products:

The backpressure topic was tackled from different angles over the last couple of years. However, when it comes to identifying and analyzing sources of backpressure, things have changed quite a bit in the recent Flink releases (especially with new additions to metrics and the web UI in Flink 1.13). This post will try to clarify some of these changes and go into more detail about how to track down the source of backpressure, but first…

Read on for the full story, including a review of the concept and its importance.


Reasons to Use Tidymodels

Roel Hogervorst explains when we may or may not want to use tidymodels versus rolling our own models in R:

When not

– You are always using GLM models (they are very flexible!). It makes no sense to me to go for the extra {parsnip} layer if you are always using the same models. You could still consider using {recipes} for feature engineering.

– You are familiar with the kind of data and what models will work on that data. Basically, you are an expert in this field and have worked in it for many years. There is no need to experiment.

Read on for concrete examples of when it does make sense. H/T R-Bloggers.


UI Patterns which Clash with Database Patterns

Michael J. Swart explains why we can’t have nice things:

I spend a large amount of time translating software requirements into schema and queries. These requirements are sometimes easy to implement but are often difficult. I want to talk about UI design choices that lead to data access patterns that are awkward to implement using SQL Server.

Read on for three such examples, including sorting, paging, and search.
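As a taste of the paging problem, here is the standard T-SQL pattern (table and column names are hypothetical). It looks innocent in the UI mockup, but SQL Server still has to read and discard every skipped row in sort order, so later pages get progressively more expensive:

```sql
-- Hypothetical example: page 50 of a product list, 25 rows per page.
-- The engine reads the first 1,225 rows just to throw them away.
SELECT p.ProductId, p.ProductName, p.Price
FROM dbo.Products AS p
ORDER BY p.ProductName, p.ProductId  -- tie-breaker keeps paging stable
OFFSET 1225 ROWS
FETCH NEXT 25 ROWS ONLY;
```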


Enabling Trace Flags in SQL Server

Robert Sheldon performs some level-setting:

SQL Server includes a set of configurable options known as trace flags. You can use trace flags to set server characteristics and control different types of operations. SQL Server offers a wide range of trace flags that let you modify the platform’s default behavior to meet specific requirements. Trace flags can help you when performing such tasks as testing stored procedures, diagnosing performance issues, or debugging complex computer systems. Microsoft Support might also recommend using certain trace flags to address behavior that’s impacting specific workloads. This article explains how to enable SQL Server trace flags.

Click through for the article.
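As a small illustration (not from the article itself), here is how a globally scoped trace flag is typically enabled and verified. Trace flag 3226 merely suppresses successful-backup messages in the error log, so it is a safe one to experiment with:

```sql
-- Enable trace flag 3226 globally; omit the -1 for session scope.
DBCC TRACEON (3226, -1);

-- Check which trace flags are currently active.
DBCC TRACESTATUS (-1);

-- Turn it back off when you are done experimenting.
DBCC TRACEOFF (3226, -1);
```

Note that flags enabled this way do not survive a restart; for that, you add -T3226 as a startup parameter instead.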


The Power BI Adoption Roadmap

Melissa Coates has a new document:

For the most part it’s targeted to orgs who have Power BI deployed to a certain extent, yet know there’s room for improvement. We focus mostly on the harder things that are more difficult to manage than the technology itself.

Although I did the writing and diagram creation, I did so with Matthew Roche’s direction, advice, and guidance. That man is a wealth of knowledge. If you’re reading this post, then you’re probably familiar with his series on building a data culture. If you haven’t reviewed that series thoroughly, please do. You’ll recognize a lot of common themes from his content in this new adoption roadmap.

Click through for some Q&A and information on where you can get the roadmap.


Using DBCC INPUTBUFFER

Monica Rathbun shows us how to use DBCC INPUTBUFFER:

A command I like to use when performance tuning is DBCC INPUTBUFFER. If you have ever run sp_whoisactive or sp_who2 to find out what sessions are executing when CPU is high, for instance, this can be a real quick life saver. At times, for me, those two options do not return enough information for what I’m looking for, which is the associated stored procedure or object. Using this little helper along with the session id can easily get you that information.

Let’s take a look.

Let’s.
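The pattern boils down to something like this (session id 62 is just a placeholder for whatever sp_whoisactive surfaced):

```sql
-- Find busy sessions first (sp_whoisactive, sp_who2, or a DMV query),
-- then feed the suspect session id to DBCC INPUTBUFFER.
DBCC INPUTBUFFER (62);

-- On SQL Server 2016 SP1 and later, there is a DMV equivalent
-- that you can query and join like any other table.
SELECT ib.event_info
FROM sys.dm_exec_input_buffer(62, NULL) AS ib;
```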


Having Fun with the QDS Toolbox

Jared Poche shares a few queries with us:

The QDS Toolbox is a set of tools that can help you review and store the performance-related data in Query Store. This was released by ChannelAdvisor last September thanks to the hard work of a number of my coworkers.

If you aren’t experienced with Query Store, this can provide a good starting point for getting familiar with data that is available and what you can do with it. If you are experienced with Query Store, this may give you an easy way to set up customizable reports that help you find issues and see trends.

The QDS Toolbox has several components, and I intend to post about each in turn. Two new components were added to this recently by @sqlozano (https://www.sqlozano.com/), bringing the current total to eight.

Click through for a deeper dive into the Server Top Queries report.
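If you are not yet familiar with the raw data the toolbox builds on, a quick toolbox-free look at the Query Store catalog views might be something like this (a minimal sketch, not one of the QDS Toolbox reports):

```sql
-- Top 10 queries by total CPU time recorded in Query Store.
SELECT TOP (10)
    q.query_id,
    qt.query_sql_text,
    SUM(rs.count_executions) AS executions,
    SUM(rs.avg_cpu_time * rs.count_executions) AS total_cpu_time_us
FROM sys.query_store_query AS q
JOIN sys.query_store_query_text AS qt
    ON qt.query_text_id = q.query_text_id
JOIN sys.query_store_plan AS p
    ON p.query_id = q.query_id
JOIN sys.query_store_runtime_stats AS rs
    ON rs.plan_id = p.plan_id
GROUP BY q.query_id, qt.query_sql_text
ORDER BY total_cpu_time_us DESC;
```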


Notes on Temporal Tables

Hugo Kornelis wraps up a discussion of temporal tables with miscellany:

Of course, it will be a quite common requirement to query products and their suppliers. When querying the present, you can just access the Suppliers and Products table without the “FOR SYSTEM_TIME” keyword, and the execution plan will unsurprisingly show that a regular join of the two “current” tables is used, with no reference to the history table. Nothing special. We already saw in the earlier parts that querying the present simply ignores the history table; joining does not make that different.

But what if, for instance, we want to show all data as it was valid on June 3 at noon? Well, that is also simple. We already know that we can use “FOR SYSTEM_TIME AS OF” to get the rows from a single temporal table as they were at a specific time. We can use that syntax for both tables, to get the data we need:

Read on as Hugo dives into some messy problems. Temporal table queries can expand out in complexity very quickly, as this post shows.
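A minimal sketch of that double AS OF pattern (the SupplierID join column and the exact column names are assumed; Hugo's post has the real queries):

```sql
-- Join both temporal tables as they looked on June 3 at noon.
-- FOR SYSTEM_TIME AS OF must be repeated per table; it is not inherited.
SELECT p.ProductName, s.SupplierName
FROM dbo.Products  FOR SYSTEM_TIME AS OF '2021-06-03T12:00:00' AS p
JOIN dbo.Suppliers FOR SYSTEM_TIME AS OF '2021-06-03T12:00:00' AS s
    ON s.SupplierID = p.SupplierID;
```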
