Press "Enter" to skip to content

Curated SQL Posts

Loading JSON into a Microsoft Fabric Eventhouse

Christopher Schmidt loads some data:

In the era of big data, efficiently parsing and analyzing JSON data is critical for gaining actionable insights. Leveraging Kusto, a powerful query engine developed by Microsoft, enhances the efficiency of handling JSON data, making it simpler and faster to derive meaningful patterns and trends. Perhaps more importantly, Kusto’s ability to easily parse simple or nested JSON makes it easier than ever to extract meaningful insights from this data. The purpose of this blog post is to walk through ways that JSON data can be loaded into Eventhouse in Microsoft Fabric, where you can then leverage Kusto’s powerful capabilities. I’ve tried this a few different ways, and the approach below is the fastest, most efficient low-code way to ingest the data into the Eventhouse. As JSON inherently supports different schemas in a single file, the expectation here is that we have a JSON file with varying schemas, and we would like to load this into our Eventhouse for efficient parsing with KQL.

Read on for the process.


Interpreting V$ and GV$ Views in Oracle RAC

Kellyn Gorman continues a series on Oracle Real Application Clusters:

Building on our Oracle Real Application Clusters (RAC) knowledge, we’re going to go deeper into what we watch for in a RAC database that may differ from a single instance. RAC is built for scale and instance resilience, distributing workloads across multiple nodes. At the same time, what gives it strength introduces monitoring complexity, especially when you’re not just watching a single instance but multiple, interconnected ones. To manage performance effectively in RAC, you need to understand the difference between V$ and GV$ views, what they show you, and how to interpret cluster-level wait events. Along with performance, the overall health of the RAC cluster and interconnect must be known, too.

Click through for Kellyn’s explanation.
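As a quick illustration of the distinction Kellyn describes: the GV$ views expose the same data as their V$ counterparts, but across every instance in the cluster and keyed by INST_ID. A minimal sketch against the standard V$SESSION and GV$SESSION views:

-- Sessions on the single instance you are connected to
SELECT sid, username, event
FROM   v$session
WHERE  username IS NOT NULL;

-- The same data from every RAC instance, keyed by INST_ID
SELECT inst_id, sid, username, event
FROM   gv$session
WHERE  username IS NOT NULL
ORDER  BY inst_id, sid;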


Paste a List of Values into a Power BI Slicer

Dan English doesn’t want to click over and over:

Have you ever wanted to take a list of values from, say, an Excel spreadsheet and paste those into a Power BI slicer to filter the list? Like, say, you are only interested in a particular set of items, but the list of items is long, and filtering through a list of thousands of values can take a while. I bet you have, and this has been an item that has been requested for a very long time, going back to 2017!

Well, believe it or not, I just found out this week during a meeting with the product team that it has been released!!

Click through for the limitations, as well as a demo of how it works.


Optional Parameter Plan Optimization in SQL Server 2025

Brent Ozar is down with OPP(O):

SQL Server 2025 improved PSPO (Parameter Sensitive Plan Optimization) to handle multiple predicates that might have parameter sensitivity, and that’s great! I love it when Microsoft ships a v1 feature and then gradually iterates to make it better. Adaptive Memory Grants were a similar investment that got improved over time, and today they’re fantastic.

SQL Server 2025 introduces another feature to mitigate parameter sniffing problems: Optional Parameter Plan Optimization (OPPO). It ain’t perfect today – in fact, it’s pretty doggone limited, like PSPO was when it first shipped, but I have hopes that SQL Server vNext will make it actually usable. Let’s discuss what we’ve got today first.

Okay, I really had to stretch the truth to make my lead-in work, but I’m too proud of it to change anything. Click through to see where OPPO is today. Even with just one optional parameter working well, there is still a class of stored procedures that this can help: the “get by one ID, or get me all of them” type.
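For reference, that “one ID or all of them” shape is the classic optional-parameter procedure. A minimal T-SQL sketch (the table and procedure names here are made up for illustration):

CREATE OR ALTER PROCEDURE dbo.GetOrders
    @OrderID int = NULL   -- NULL means "give me everything"
AS
BEGIN
    SELECT OrderID, CustomerID, OrderDate
    FROM   dbo.Orders
    WHERE  @OrderID IS NULL
       OR  OrderID = @OrderID;
END;

Historically, a single cached plan had to serve both the one-row and the all-rows calls, which is exactly the parameter sniffing problem OPPO is aimed at.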


Handling Large Delete Operations in TimescaleDB

Semab Tariq deletes a significant amount of data:

In today’s blog, we will discuss another crucial aspect of time-series data management: massive delete operations.

As your data grows over time, older records often lose their relevance but continue to occupy valuable disk space, potentially increasing storage costs and degrading performance if not managed well.

Let’s walk through some strategies to clean up or downsample aged data in TimescaleDB, helping you maintain a lean, efficient, and cost-effective database.

The “or downsample” is huge, by the way: as a simple example, suppose you collect one record every millisecond, or 1000 per second. Say that we have a date+time and a few floating point numbers that add up to 40 bytes per record. If we have a year of data at that grain, we have 40 bytes/record * 1000 records/second * 3600 seconds/hour * 24 hours/day * 365.25 days/year, or 1,262,304,000,000 bytes/year. That’s ~1.15 terabytes of data per year, assuming no compression (which there actually is, but whatever). By contrast, if you keep millisecond-level data for a week, second-level for 3 weeks, and minute-level for the remaining year, you have:

40 bytes/record * 1000 records/second * 3600 seconds/hour * 24 hours/day * 7 days/week * 1 week = 22.53 gigabytes
40 bytes/record * 1 record/second * 3600 seconds/hour * 24 hours/day * 7 days/week * 3 weeks = 69 megabytes
40 bytes/record * 1 record/minute * 60 minutes/hour * 24 hours/day * 337.25 days = 18.5 megabytes

And for most cases, we only need the lowest level of granularity for a relatively short amount of time. After that, we typically care more about how the current data looks versus older data, for the purposes of trending.
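In TimescaleDB terms, that kind of tiered retention is typically a continuous aggregate plus retention policies. A rough sketch, simplified to two tiers, with a hypothetical hypertable and column names:

-- Roll the raw readings up to one row per minute per device
CREATE MATERIALIZED VIEW readings_minutely
WITH (timescaledb.continuous) AS
SELECT time_bucket(INTERVAL '1 minute', ts) AS bucket,
       device_id,
       avg(value) AS avg_value
FROM readings
GROUP BY bucket, device_id;

-- Drop raw (millisecond-level) chunks after a week...
SELECT add_retention_policy('readings', drop_after => INTERVAL '7 days');

-- ...but keep the minute-level rollup for a year
SELECT add_retention_policy('readings_minutely', drop_after => INTERVAL '1 year');

You would also schedule a refresh policy on the aggregate (add_continuous_aggregate_policy) so the rollup stays ahead of the raw-data drops; Semab’s post covers the delete side in more detail.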


Online Vertical Scaling of SQL Server in Kubernetes

Andrew Pruski controls the horizontal. Andrew Pruski controls the vertical:

One of the new features in Kubernetes v1.33 is the ability to resize CPU and memory resources for containers online, aka without having to recreate the pod the container is running in. In the past, when adjusting a pod’s resources, Kubernetes would delete the existing pod and create a new one via a controller.

Not a problem for applications that can have multiple replicas running, but for SQL Server this would cause a disruption, as we (generally) only have one pod running SQL Server in a StatefulSet. Let’s see this in action.

Click through to see an example of normal behavior, followed by how it differs in the latest version of Kubernetes.


Kafka Connector for Cosmos DB

Sudhindra Sheshadrivasan announces a new connector has become generally available:

We’re excited to announce the General Availability (GA) of the Confluent fully managed V2 connector for Apache Kafka® for Azure Cosmos DB! This release marks a major milestone in our mission to simplify real-time data streaming from and to Azure Cosmos DB using Apache Kafka®.

The V2 connector is now production-ready and available directly from the Confluent Cloud connector catalog. This managed connector allows you to seamlessly integrate Azure Cosmos DB with your Kafka-powered event streaming architecture—without worrying about provisioning, scaling, or managing the connector infrastructure.

Read on to learn more about the new connector and what it takes to hook everything up.


Materialized Lake Views in Microsoft Fabric

Balaji Sankaran has a new announcement:

We are excited to announce Materialized Lake views (MLV) in Microsoft Fabric. Coming soon in preview, MLV is a new feature that allows you to build declarative data pipelines using SQL, complete with built-in data quality rules and automatic monitoring of data transformations. In essence, an MLV is a persisted, continuously updated view of your data that simplifies how you implement multi-stage Lakehouse processing, commonly referred to as medallion architecture.

Read on to see how it works and some of its capabilities.
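The announcement is the authority on syntax, but the declarative idea is roughly a persisted SQL view defined over a Lakehouse table, which the engine keeps refreshed for you. A rough, illustrative sketch (treat the MLV keywords and the schema and table names as assumptions to verify against the post and docs):

CREATE MATERIALIZED LAKE VIEW IF NOT EXISTS silver.orders_clean
AS
SELECT order_id,
       customer_id,
       CAST(order_ts AS TIMESTAMP) AS order_ts,
       amount
FROM   bronze.orders_raw
WHERE  order_id IS NOT NULL;

Per the announcement, the data quality rules and monitoring layer on top of declarations like this.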


Source Control: A Call to Action

Steve Jones wants you (to learn about source control):

I have been surprised by how many people aren’t comfortable with version control or Git. Many don’t have the habit, but are amenable to it. What I’m amazed by in 2025 is how many people don’t use it, given that so many tools we use to work with databases, and even other systems, will store items in Git. This isn’t just for development code, but also for infrastructure code. Lots of data tools and servers can store data in Git and use it to deploy changes to all kinds of systems. I’d have expected more people to know Git.

If you’re apprehensive about learning Git, check out tools like SourceTree and GitHub Desktop. You can even use the source control built into Visual Studio Code if that’s your tool of choice. These UIs make it significantly easier to work with Git.

Then, if you want to get rid of about 80% of the pain of Git, use feature branching. Here are takes on the topic from Olivier Van Steenlandt and Adron Hall, and I’m sure you can find plenty of other examples. Git can be a pain in the neck, especially when dealing with merge conflicts, but the benefit is well worth it.


SQL Server Standard Developer Edition

Ed Pollack is pleased:

SQL Server Developer Edition has for years provided a free data platform for testing and developing applications, with all SQL Server features available at no cost to the organizations that use it. So long as these SQL Servers do not host production workloads, this edition is ideal for database development.

Starting in SQL Server 2025, a new edition is available for installation: SQL Server Developer Standard Edition. This article dives into this edition, the problems it solves, and why it is a great addition for development teams that use SQL Server!

This is something people in the community have agitated for over the years. It’s great that Developer edition has all of the features of Enterprise, but that can be a problem when you build your apps against the equivalent of Enterprise and deploy them to Standard, as there’s a risk that you’re using an Enterprise-only feature and don’t know it.
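If you want a quick check for that risk today, SQL Server ships a per-database DMV that lists edition-restricted features currently in use (it only covers features that persist in the database, so it is a first pass rather than a guarantee):

-- An empty result set means nothing persisted in this database blocks a move to a lower edition
SELECT feature_name
FROM sys.dm_db_persisted_sku_features;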
