Author: Kevin Feasel

Today, I wanted to give the new Fabric Data Agents a try. According to the documentation, a Fabric Data Agent is defined as follows:

Data agent in Microsoft Fabric is a new Microsoft Fabric feature that allows you to build your own conversational Q&A systems using generative AI. A Fabric data agent makes data insights more accessible and actionable for everyone in your organization. With a Fabric data agent, your team can have conversations, with plain English-language questions, about the data that your organization stored in Fabric OneLake and then receive relevant answers. This way, even people without technical expertise in AI or a deep understanding of the data structure can receive precise and context-rich answers.

Let’s give it a try and build our first Data Agent.

Click through for the pre-requisites, the setup process, and how everything looked for Wolfgang.

Comments closed

Preventing Injection Attacks in Shiny

Published 2025-05-23 by Kevin Feasel

Arthur Breant shares some advice:

Code injection is a common security vulnerability that involves injecting malicious code into a page or application. This code is then executed, creating the security breach. There are several ways to inject code into an application, and Shiny is unfortunately not immune to these risks.

Click through for a quick overview of the three most common types of injection attack. There’s nothing special about Shiny here—any system that executes code based on user input is potentially vulnerable to injection attacks—so it is good to keep these tips in mind. H/T R-Bloggers.

Comments closed

Building an ML-Friendly Data Lake with Apache Iceberg

Published 2025-05-23 by Kevin Feasel

Anant Kumar designs a data lake:

As companies collect massive amounts of data to fuel their artificial intelligence and machine learning initiatives, finding the right data architecture for storing, managing, and accessing such data is crucial. Traditional data storage practices are likely to fall short to meet the scale, variety, and velocity required by modern AI/ML workflows. Apache Iceberg steps in as a strong open-source table format to build solid and efficient data lakes for AI and ML.

Click through for a primer on Iceberg, how to set up a fairly simple data lake, and some functionality that can help in model training.

Comments closed

Loading JSON into a Microsoft Fabric Eventhouse

Published 2025-05-23 by Kevin Feasel

Christopher Schmidt loads some data:

In the era of big data, efficiently parsing and analyzing JSON data is critical for gaining actionable insights. Leveraging Kusto, a powerful query engine developed by Microsoft, enhances the efficiency of handling JSON data, making it simpler and faster to derive meaningful patterns and trends. Perhaps more importantly, Kusto’s ability to easily parse simple or nested JSON makes it easier then ever to extract meaningful insights from this data. The purpose of this blog post is to walk through ways that JSON data can be loaded into Eventhouse in Microsoft Fabric, where you can then leverage Kusto’s powerful capabilities for this. I’ve tried this a few different ways, and the below approach is the fastest, most efficient low-code way to ingest the data into the Eventhouse. As JSON inherently supports different schemas in a single file, the expectation here is that we have a json file with varying schemas within a single file, and we would like to load this into our Eventhouse for efficient parsing with KQL.

Read on for the process.

Comments closed

Interpreting V$ and GV$ Views in Oracle RAC

Published 2025-05-23 by Kevin Feasel

Kellyn Gorman continues a series on Oracle Real Application Clusters:

Furthering on our Oracle Real Application Clusters (RAC) knowledge, we’re going to go deeper into what we watch for a RAC database that may be different than a single instance. RAC is built for scale and instance resilience, distributing workloads across multiple nodes. At the same time, what gives it strength introduces monitoring complexity, especially when you’re not just watching a single instance but multiple, interconnected ones. To manage performance effectively in RAC, you need to understand the difference between V$ and GV$ views, what they show you, and how to interpret cluster-level wait events. Along with performance, the overall health of the RAC cluster and interconnect must be known, too.

Click through for Kellyn’s explanation.

Comments closed

Optional Parameter Plan Optimization in SQL Server 2025

Published 2025-05-23 by Kevin Feasel

Brent Ozar is down with OPP(O):

SQL Server 2025 improved PSPO to handle multiple predicates that might have parameter sensitivity, and that’s great! I love it when Microsoft ships a v1 feature, and then gradually iterates over to make it better. Adaptive Memory Grants were a similar investment that got improved over time, and today they’re fantastic.

SQL Server 2025 introduces another feature to mitigate parameter sniffing problems: Optional Parameter Plan Optimization (OPPO). It ain’t perfect today – in fact, it’s pretty doggone limited, like PSPO was when it first shipped, but I have hopes that SQL Server vNext will make it actually usable. Let’s discuss what we’ve got today first.

Okay, I really had to stretch the truth to make my lead-in work, but I’m too proud of it to change anything. Click through to see where OPPO is today. Even with just one optional parameter working well, there is still a class of stored procedures that this can help: the “get by one ID, or get me all of them” type.

Comments closed

Paste a List of Values into a Power BI Slicer

Published 2025-05-23 by Kevin Feasel

Dan English doesn’t want to click over and over:

Have you ever wanted to take a list of values from say an Excel spreadsheet and paste those into a Power BI slicer to filter the list? Like say you are only interested in particular set of items, but the list of items is long and filtering through a list of say a thousands values can take a while. I bet you have and this has been an item that has been requested for a very long time going back to 2017!

Well believe it or not, I just found out this week during a meeting with the product team it has been released!!

Click through for the limitations, as well as a demo of how it works.

Comments closed

Online Vertical Scaling of SQL Server in Kubernetes

Published 2025-05-23 by Kevin Feasel

Andrew Pruski controls the horizontal. Andrew Pruski controls the vertical:

One of the new features in Kubernetes v1.33 is the ability to resize CPU and memory resources for containers online, aka without having to recreate the pod the container is running in. In the past, when adjusting a pod’s resources, Kubernetes would delete the existing pod and create a new one via a controller.

Not a problem for applications that can have multiple replicas running, but for SQL Server this would cause a disruption as we (generally) only have one pod running SQL Server in a statefulset. Let’s see this in action.

Click through to see an example of normal behavior, followed by how it differs in the latest version of Kubernetes.

Comments closed

Handling Large Delete Operations in TimescaleDB

Published 2025-05-23 by Kevin Feasel

Semab Tariq deletes a significant amount of data:

In today’s blog, we will discuss another crucial aspect of time-series data management: massive delete operations.

As your data grows over time, older records often lose their relevance but continue to occupy valuable disk space, potentially increasing storage costs and might degrade the performance if not managed well.

Let’s walk through some strategies to clean up or downsample aged data in TimescaleDB, helping you maintain a lean, efficient, and cost-effective database.

The “or downsample” is huge, by the way: as a simple example, suppose you collect one record every millisecond, or 1000 per second. Say that we have a date+time and a few floating point numbers that add up to 40 bytes per record. If we have a year of data at that grain, we have 40 bytes/record * 1000 records/second * 3600 seconds/hour * 24 hours/day * 365.25 days/year, or 1,262,304,000,000 bytes/year. That’s ~1.15 terabytes of data per year, assuming no compression (which there actually is, but whatever). By contrast, if you keep millisecond-level data for a week, second-level for 3 weeks, and minute-level for the remaining year, you have:

40 bytes/record * 1000 records/second * 3600 seconds/hour * 24 hours/day * 7 days/week * 1 week = 22.53 gigabytes
40 bytes/record * 1 record/second * 3600 seconds/hour * 24 hours/day * 7 days/week * 3 weeks = 69 megabytes
40 bytes/record * 1 record/minute * 60 minutes/hour * 24 hours/day * 337.25 days = 18.5 megabytes

And for most cases, we only need the lowest level of granularity for a relatively short amount of time. After that, we typically care more about how the current data looks versus older data, for the purposes of trending.

Comments closed

Kafka Connector for Cosmos DB

Published 2025-05-22 by Kevin Feasel

Sudhindra Sheshadrivasan announces a new connector has become generally available:

We’re excited to announce the General Availability (GA) of the Confluent fully managed V2 connector for Apache Kafka® for Azure Cosmos DB! This release marks a major milestone in our mission to simplify real-time data streaming from and to Azure Cosmos DB using Apache Kafka®.

The V2 connector is now production-ready and available directly from the Confluent Cloud connector catalog. This managed connector allows you to seamlessly integrate Azure Cosmos DB with your Kafka-powered event streaming architecture—without worrying about provisioning, scaling, or managing the connector infrastructure.

Read on to learn more about the new connector and what it takes to hook everything up.

Comments closed