2025-01-10 – Curated SQL

The Importance of Virtual Environments in Python

Published 2025-01-10 by Kevin Feasel

Jack Wallen proselytizes for virtual environments:

When developing with Python, chances are pretty good that you’ll need to install various libraries, dependencies and apps to get your project started. The good news is that (in most cases) those installations are pretty straightforward (thanks to pip and other tools).

Problems can arise, however, if you simply install all of those project requirements on your system. It’s like installing any given application, hoping it won’t cause problems with other applications, your OS or your data. In most cases, it’s safe, but there’s always that one instance where things can quickly go awry.

Read on to see how virtual environments can alleviate many of these pains. It took a while for me to understand exactly why virtual environments are so important, but this is definitely something I recommend doing if you work with Python in any capacity.
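For a feel of what this looks like in practice, here is a minimal sketch using the standard library's `venv` module. The `.venv` directory name and the `create_project_env` helper are my own illustrative choices, not something taken from Jack's article.

```python
import tempfile
import venv
from pathlib import Path


def create_project_env(project_dir: Path) -> Path:
    """Create an isolated environment in <project_dir>/.venv and return its path."""
    env_dir = project_dir / ".venv"  # hypothetical, though conventional, name
    # with_pip=False keeps this sketch fast and offline; pass with_pip=True
    # in real use so that pip is available inside the environment.
    venv.create(env_dir, with_pip=False)
    return env_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = create_project_env(Path(tmp))
        # pyvenv.cfg is the marker file that identifies a virtual environment.
        print((env_dir / "pyvenv.cfg").is_file())
```

Packages installed from inside that environment stay in its own directory rather than landing in the system-wide Python installation, which is the isolation the article argues for.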

Comments closed

Always On Lease Timeout Monitoring

Published 2025-01-10 by Kevin Feasel

Yvonne Vanslageren gives us one more thing to check:

SQL Server Always On Availability Groups are a robust solution for achieving high availability and disaster recovery for SQL Server databases. However, simply configuring them is not enough—you also need a solid monitoring strategy to ensure data integrity and system reliability. One key aspect of this monitoring process is keeping an eye on lease timeouts, which can signal larger issues and help prevent potentially catastrophic split-brain scenarios.

In this post, we’ll walk through the various health checks available for Always On Availability Groups, discuss how lease timeouts work, and explore practical methods for monitoring and troubleshooting.

Read on to learn more about the lease timeout concept, where you can find this information, and recommendations for acting on it.

Azure VM Constrained CPU and SQL Server Licensing

Published 2025-01-10 by Kevin Feasel

Rod Edwards works to save some money:

Our SQL servers have specific loads, patterns and resource requirements. We’ve monitored them, we know what they are inside and out. You know that x number of CPUs, and y GB of RAM means that your DBs happily sweat the hardware, but are sized to perfection to make sure it has just enough resource to complete what it needs to do. The negotiations with your Infra team in command of the VM resources were tough, but you’re not greedy… you appreciate that other VMs may require resource as well…

Now you have to migrate to Azure, and SQL on Azure VM is your only option. Great, loads of different types/flavours of VMs out there, one of them must be in the Goldilocks perfect porridge zone, right? Right?

Licensing around this gets a little weird. Basically, Rod is absolutely right that an E64-16, for example, would require licensing only 16 cores of SQL Server, and that can save a good bit of scratch. For the OS, however, you’re paying for 64 cores, so keep that in mind while you’re rolling in a bed of the cash you’ve saved on SQL Server licensing.
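The arithmetic is simple enough to sketch. The snippet below assumes the naming pattern in the E64-16 example, where the first number is the vCPU count of the underlying size and the second is the constrained count; the helper names are hypothetical, and no prices are included because I don't want to invent any.

```python
import re
from typing import NamedTuple


class ConstrainedSize(NamedTuple):
    total_vcpus: int   # cores of the underlying VM size (what the OS is billed on)
    active_vcpus: int  # cores exposed to the guest (what SQL Server is licensed on)


def parse_constrained_size(name: str) -> ConstrainedSize:
    """Parse a size name such as 'E64-16' into total and active vCPU counts."""
    match = re.search(r"[A-Za-z]+(\d+)-(\d+)", name)
    if match is None:
        raise ValueError(f"not a constrained-vCPU size name: {name!r}")
    return ConstrainedSize(int(match.group(1)), int(match.group(2)))


def sql_cores_saved(size: ConstrainedSize) -> int:
    """SQL Server core licenses avoided relative to the unconstrained size."""
    return size.total_vcpus - size.active_vcpus


if __name__ == "__main__":
    size = parse_constrained_size("E64-16")
    saved = sql_cores_saved(size)
    print(f"SQL Server cores to license: {size.active_vcpus}")
    print(f"OS cores still paid for:     {size.total_vcpus}")
    print(f"SQL Server cores saved:      {saved} ({saved / size.total_vcpus:.0%})")
```

For E64-16 that works out to 16 licensed SQL Server cores against 64 OS cores, or 48 fewer SQL Server cores than the full-size VM.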

Windows in Microsoft Fabric Real-Time Analytics

Published 2025-01-10 by Kevin Feasel

Reitse Eskens looks at a view of the data:

When you dig into Real-Time Analytics, windowing has some extra fun stuff happening. A windowing function isn’t just a windowing function, but it has something extra to go with it.

For this blog, I’m using the event stream created in an earlier blog.

Click through to see how it works and the types of windows that are available to you.
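The excerpt doesn't list the window types, so as an assumption I'll illustrate the general idea with a tumbling window, one common kind in stream processing: fixed-width, non-overlapping buckets of time. This is a plain Python sketch of the concept, not Fabric's API, and the function name is made up for the example.

```python
from collections import Counter
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def tumbling_window_counts(event_times, width):
    """Count events per fixed-width, non-overlapping window.

    Returns a dict mapping each window's start time to its event count.
    """
    counts = Counter()
    for ts in event_times:
        # Floor the timestamp to the start of the window that contains it.
        window_index = (ts - EPOCH) // width  # timedelta // timedelta is an int
        counts[EPOCH + window_index * width] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    events = [
        datetime(2025, 1, 10, 12, 0, 1),
        datetime(2025, 1, 10, 12, 0, 4),
        datetime(2025, 1, 10, 12, 0, 5),
        datetime(2025, 1, 10, 12, 0, 11),
    ]
    for start, count in tumbling_window_counts(events, timedelta(seconds=5)).items():
        print(start.time(), count)
```

Each event falls into exactly one bucket here; other window types differ mainly in whether buckets overlap or are driven by gaps between events.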

Creating and Working with an Azure Databricks SQL Warehouse

Published 2025-01-10 by Kevin Feasel

John Miner works a shift in the warehouse:

Many companies are leveraging data lakes to manage both structured and unstructured data. However, not all users are familiar with Python and the PySpark module. How can users with a solid understanding of ANSI SQL be effective in the Databricks environment?

Read on for the answer.

Idle PostgreSQL Transactions and Table Bloat

Published 2025-01-10 by Kevin Feasel

Umair Shahid notes that some tables are feeling a bit bloated:

Yup, you read it right. Idle transactions can cause massive table bloat that the vacuum process may not be able to address. Bloat causes degradation in performance and can keep encroaching disk space with dead tuples.

This blog delves into how idle transactions cause table bloat, why this is problematic, and practical strategies to avoid it.

Read on to understand how this can be and what you can do about it. And do check out the comments for a quick explanation of why connection pooling doesn’t exhibit this same problem.

Dealing with Duplicate Data via ROW_NUMBER()

Published 2025-01-10 by Kevin Feasel

Andy Brownsword removes the duplicates:

Data quality and consistency are key to the services we support and solutions we deliver. A gremlin which can undermine that is duplicate data. Let’s start the new year dealing with duplicate data and having a good clear-out.

For our example we’ll consider an Order Product table which contains an OrderID and ProductID, and the combination of these should be unique. Other fields for the duplicate records may differ so we may want to be selective about which records are removed.
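As a rough illustration of the general pattern (not Andy's exact code), here is a sketch that runs against an in-memory SQLite database through Python's `sqlite3` module rather than SQL Server. The `OrderProductID` and `Quantity` columns, and the rule of keeping the row with the highest quantity, are assumptions invented for the example.

```python
import sqlite3


def build_example(conn: sqlite3.Connection) -> None:
    """Create a hypothetical OrderProduct table containing duplicates."""
    conn.execute(
        """
        CREATE TABLE OrderProduct (
            OrderProductID INTEGER PRIMARY KEY,
            OrderID        INTEGER NOT NULL,
            ProductID      INTEGER NOT NULL,
            Quantity       INTEGER NOT NULL
        )
        """
    )
    conn.executemany(
        "INSERT INTO OrderProduct (OrderID, ProductID, Quantity) VALUES (?, ?, ?)",
        [(1, 10, 2), (1, 10, 5), (1, 11, 1), (2, 10, 3), (2, 10, 3)],
    )


def remove_duplicates(conn: sqlite3.Connection) -> int:
    """Keep one row per (OrderID, ProductID) and return the number deleted.

    Window functions require SQLite 3.25 or later.
    """
    cursor = conn.execute(
        """
        DELETE FROM OrderProduct
        WHERE OrderProductID IN (
            SELECT OrderProductID
            FROM (
                SELECT OrderProductID,
                       ROW_NUMBER() OVER (
                           PARTITION BY OrderID, ProductID
                           ORDER BY Quantity DESC, OrderProductID
                       ) AS rn
                FROM OrderProduct
            )
            WHERE rn > 1  -- every row after the first in its group is a duplicate
        )
        """
    )
    return cursor.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_example(conn)
    print("deleted:", remove_duplicates(conn))
    print(conn.execute(
        "SELECT OrderID, ProductID, Quantity FROM OrderProduct "
        "ORDER BY OrderID, ProductID"
    ).fetchall())
```

The `ORDER BY` inside the window is where you get to be selective: whichever row sorts first within each group is the one that survives.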

This is where I get on my high horse and complain about laziness in data modeling, a very common problem. This takes nothing away from Andy’s post, which is a good method for solving a problem that has gotten out of hand. But if you know that some combination of attributes is unique, add a unique key constraint or a unique non-clustered index right then and there. Doing so will prevent improper duplicate data from ever being an issue. If you don’t know that some combination of attributes must be unique, discuss this with the business side in a way that makes sense for them. Yes, there’s always the risk that you’ll have a conversation later like, “Oh, it turns out that this really should be unique,” but in most cases, you can easily sort this kind of thing out up-front and save a lot of time and effort later on.
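To make the point concrete, here is a small sketch of a unique index rejecting a duplicate at insert time. It uses SQLite via Python's `sqlite3` module as a stand-in for SQL Server, and the table layout, index name, and helper functions are hypothetical.

```python
import sqlite3


def make_table() -> sqlite3.Connection:
    """Create a hypothetical OrderProduct table with a unique index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE OrderProduct (
            OrderID   INTEGER NOT NULL,
            ProductID INTEGER NOT NULL,
            Quantity  INTEGER NOT NULL
        )
        """
    )
    # The index enforces uniqueness of the (OrderID, ProductID) combination.
    conn.execute(
        "CREATE UNIQUE INDEX UX_OrderProduct ON OrderProduct (OrderID, ProductID)"
    )
    return conn


def try_insert(conn: sqlite3.Connection, order_id: int, product_id: int, quantity: int) -> bool:
    """Insert a row; return False if the unique index rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO OrderProduct (OrderID, ProductID, Quantity) VALUES (?, ?, ?)",
                (order_id, product_id, quantity),
            )
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    conn = make_table()
    print(try_insert(conn, 1, 10, 2))  # first row for this pair is accepted
    print(try_insert(conn, 1, 10, 5))  # same pair again is rejected
```

With the index in place, the duplicate never makes it into the table, so there is nothing to clean up afterward.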


Day: January 10, 2025