Press "Enter" to skip to content

Curated SQL Posts

Common SQL Server Mistakes: Default Auto-Growth

Hemantgiri Goswami takes a look at auto-growth:

Auto Growth is a feature that allows database files (primary, secondary, and log) to expand when the database file becomes full – without manual intervention.

Auto Growth feature is handy when we do not want to increase the size of database files manually. There are two ways you can set auto growth – using SQL Server Management Studio (SSMS hereafter) and T-SQL. Auto Growth can be configured – In Percent and Megabytes.

Auto-growth isn’t a problem on its own, though growth sizes, especially in older versions of SQL Server, were far too low for medium- and large-sized databases.

I don’t particularly like the 2.5 MB example Hemantgiri shows. I have a quick rule of thumb which is 64MB for small databases, 256-512 for medium-sized databases, and 1GB for large databases (assuming my underlying disk is fast). This limits the number of auto-growth events and, for log files in particular, keeps virtual log file counts more reasonable.

Comments closed

Package Management in Python

Georgia Atkinson wraps things up with a bow:

Python is a general purpose, high level language which, thanks to its simplicity and versatility, has become very popular, especially within the data science community. The extensive Python community has developed and contributed thousands of libraries and packages over the years in a plethora of different disciplines to aid developers with their applications. Managing these packages can be a challenging task without the correct tools. That’s where Python package managers come in. In this blog post we will explore what a package manager is and why they are important. We will then cover some popular examples, including how to use them, how to install them and the pros and cons of each.

Whilst we will briefly touch on virtual environments in places, we will explore these in more depth in an upcoming post.

Read on for a primer on three options, including how they compare to one another for CI/CD purposes.

Comments closed

Power BI Themes and Gallery

Seth Bauer has an announcement:

The Power BI Tips Theme Generator tool already allows you to easily interact with, and adjust, all the visual properties, wireframes, etc… How could we possibly make Power BI Theme building an effortless experience? We start with building it all for you, then letting you adjust it!
The all new Gallery feature represents a significant leap forward in simplifying the theming process for all. This feature is especially for the business users! But, it also opens up exciting opportunities for the Power BI community to contribute in the future as well.

Read on to see how this works.

Comments closed

Vacuuming in PostgreSQL

Muhammad Ali keeps things tidy:

If you’re a PostgreSQL user, you’ve undoubtedly come across the term “vacuum“. This operation plays a pivotal role in maintaining the optimal performance of your database while preventing unnecessary data bloat. In this blog, we’ll understand how vacuum works on high level, its significance, types, server parameters that influence autovacuum operations, and general FAQ’s on vacuum.

Read on to learn more about what vacuuming does and why it is important. It also turns out that there are multiple types of vacuuming.

Comments closed

SortByColumn Set to Invalid Column ID in SSAS Tabular

Olivier Van Steenlandt troubleshoots an error:

After making these changes, we pushed our changes into Azure DevOps and our deployment pipeline started to deploy the changes to the requested environment.

While the deployment process was executing, it stopped and failed promptly. We ran into an issue: “SortByColumn property set to an invalid column ID”

Read on to see the ultimate cause of and solution to the problem.

Comments closed

In-Memory OLTP and Memory Allocation

Tanayankar Chakraborty explains an error:

We recently encountered a support case where a customer using In-memory tables in an Azure SQL DB, receives an error message while trying to insert data into the table that also has a clustered columnstore index. The customer then deleted the entire data from the In-memory Tables (With the clustered columnstore index), however it appeared that the Index Unused memory was still not released. Here’s the memory allocation the customer could see:

Error

In addition to the error above- here is the error text:

Msg 41823, Level 16, State 109, Line 1

Could not perform the operation because the database has reached its quota for in-memory tables. This error may be transient. Please retry the operation. See ‘http://go.microsoft.com/fwlink/?LinkID=623028‘ for more information

In this case, the error ends up being a “didn’t read the manual” type of error.

Comments closed

Workaround for Primary Keys in Fabric Data Warehouses

Gilbert Quevauvilliers needs a key:

When I started looking into using the data warehouses feature in Fabric, I did see that there were limitations on Primary Key columns.

Below is my blog post on how I still use keys in my data warehouse, instead of using GUID’s which to me are long and hard to use.

In my example I am going to create a simple data warehouse which is going to consist of two-dimension tables (Date and Country) and a fact table with the Sales amounts.

This seems sub-optimal, though at least Gilbert shows us a workaround.

Comments closed

Radar Charts in R

Steven Sanderson has radar love:

Radar charts, also known as spider, web, polar, or star plots, are a useful way to visualize multivariate data. In R, we can create radar charts using the fmsb library. Here are several examples of how to create radar charts in R using the fmsb library:

Radar charts are a guilty pleasure of mine. They are rarely the right choice, but when they are, I love it so much.

Comments closed

Debugging an Unresponsive Elasticsearch Cluster

Derric Gilling troubleshoots an Elasticsearch cluster:

Because of this sharding, a read or write request to an Elasticsearch cluster requires coordinating between multiple nodes as there is no “global view” of your data on a single server. While this makes Elasticsearch highly scalable, it also makes it much more complex to setup and tune than other popular databases like MongoDB or PostgresSQL, which can run on a single server.

When reliability issues come up, firefighting can be stressful if your Elasticsearch setup is buggy or unstable. Your incident could be impacting customers which could negatively impact revenue and your business reputation. Fast remediation steps are important, yet spending a large amount of time researching solutions online during an incident or outage is not a luxury most engineers have. This guide is intended to be a cheat sheet for common issues that engineers running that can cause issues with Elasticsearch and what to look for.

Read on for several helpful tips.

Comments closed

RCSI and ID-Driven ETL

Michael J. Swart shares a warning:

Yesterday, Kendra Little talked a bit about Lost Updates under RCSI. It’s a minor issue that can pop up after turning on RCSI as the default behavior for the Read Committed isolation level. But she doesn’t want to dissuade you from considering the option and I agree with that advice.

In fact, even though we turned RCSI on years ago, by a bizarre coincidence, we only came across our first RCSI-related issue very recently. But it wasn’t update related. Instead, it has to do with an ETL process. To explain it better, consider this demo:

Michael has one example solution. I could also see a “windback” run, where, instead of starting at the very end of the line for ETL, you start a few hundred rows earlier. That way, you can pick up any stragglers. It would add some overhead to the ETL task, but given that ETL jobs should be idempotent, it shouldn’t affect the end results.

Comments closed