Press "Enter" to skip to content

Month: July 2025

Testing Plumber APIs from R

Jakub Sobolewski builds some tests:

When building Plumber APIs in R, effective testing is crucial for ensuring reliability and maintainability.

This guide explores a proven pattern for testing own Plumber APIs that maintains fast feedback loops while providing robust coverage of both business logic and API contracts.

Read on for some good advice on API testing in general, followed by its application in R. Jakub also has a link to a GitHub repo showing a concrete example of how this testing can work. H/T R-Bloggers.

Comments closed

SSIS Slowdowns in Paging to Disk

Andy Brownsword notes a major performance risk in Integration Services:

One particular performance issue with SSIS data flows can fly under the radar – spilling to disk. This isn’t clearly visible through regular debugging or execution so can go unnoticed. And it hurts.

Paging to disk is bad for performance. Disks are much slower to access than memory, so we want to keep our data away when possible.

Andy calls out two reasons why we might find spilling to disk, as well as how to track if this is happening.

Comments closed

Building out a PBIReport.json File for Power BI Performance Load Testing

Gilbert Quevauvilliers continues a series on Power BI performance load testing:

I am going to be using Visual Studio Code to edit the PBIReport.JSON.

It is free to download and use, it works on any device.

And most importantly it will also show you errors in the JSON file. This can help when there are potential issues.

For the test I will be putting in the following details below into the PBIReport.JSON

Click through for the process, which is fairly complex all things considered.

Comments closed

Loading CSV Files via dbatools

David Seis loads some data:

Most businesses are rotten with Excel sheets and CSV exports from various tools and as my mentor puts it “Excel is the world’s database”. The dbatools command Import-DbaCsv enables the quick load of CSV tables into SQL Server, which then opens up the world of fast transformation and use in other tools such as PowerBI, or even just having it queryable rather than just in a file somewhere.

Import-DbaCsv is for you if any part of your job is to manually manage CSV files and you want to reduce the amount of time that you take to do that processing. Even if you are not a DBA but are interested in the data field, you can use SQL Server express for free, import and transform data using PowerShell for free, and export that data into Power BI or any other tool for free. I promise it can make your manual processes better, faster, and more resilient and save you tons of time!

Read on for a review of the quite useful cmdlet.

Comments closed

Managing TempDB Growth with Resource Governor

Haripriya Naidu digs into what’s new in SQL Server’s Resource Governor:

In this article, we’ll focus on how to control TempDB datafile growth using Resource Governor (RG).

  • RG is disabled by default and available only on Enterprise edition.
  • Until SQL Server 2022, RG could only manage user databases.
  • Starting with SQL Server 2025, RG can now manage TempDB as well.

Click through for the demo and additional information.

Comments closed

Generating Synthetic Data in Python

Ivan Palomares Carrascosa makes some data:

This article introduces the Faker library for generating synthetic datasets. Through a gentle hands-on tutorial, we will explore how to generate single records or data instances, full datasets in one go, and export them into different formats. The code walkthrough adopts a twofold perspective:

  1. Learning: We will gain a basic understanding of several data types that can be generated and how to get them ready for further processing, aided by popular data-intensive libraries like Pandas
  2. Testing: With some generated data at hand, we will provide some hints on how to test data issues in the context of a simplified ETL (Extract, Transform, Load) pipeline that ingests synthetically generated transactional data.

Click through for the article. I’m not intimately familiar with Faker, so I’m not sure how easy it is to change dataset distributions. That’s one of the challenges I tend to have with automated data generators: generating a simulated dataset is fine if you just need X number of rows, but if the distribution of synthetic data in development is nowhere near what the real data’s distribution is in production, you may get a false sense of security in things like report response times.

Comments closed

The PRODUCT() Function in SQL Server 2025

Ed Pollack points out a new function:

With each version of SQL Server, there are always a few new features introduced that we applaud as we finally have access to a useful function that is already available elsewhere.

Introduced in SQL Server 2025 CTP 1.3, the PRODUCT() function acts similarly to SUM(), but multiplies values rather than adds them. It is an aggregate function in SQL Server and therefore operates on a data set, rather than on scalar values.

Ed notes that there are aggregate and window function versions of PRODUCT() and shows examples of how it works.

Comments closed

Ordering Collections in Powershell

Shane O’Neill demands order:

However, there is one drawback I have with hash tables: they don’t default to the order in which they are inserted. I’m OK with this since I come from a DB background, and I’m used to order not being enforced unless I specify an ORDER BY. Not everyone is as lenient as we are, though, and the vast majority of the louder masses expect this ordering.

Now, there are ways around this unordered aspect of hash tables.

Click through for the easy answer, the less easy answer, and some additional thoughts on script development from Shane.

Comments closed

Goodbye, Default Semantic Models

Pradeep Srikakolapu makes an announcement:

Microsoft Fabric is officially sunsetting Default Semantic Models. This change is part of our ongoing efforts to simplify and improve the manageability, deployment, and governance of Fabric items such as warehouse, lakehouse, SQL database, and mirrored databases.

This is definitely a good thing. The idea of a default semantic model wasn’t bad, especially early on in Microsoft Fabric’s development life. But those default models almost never had enough information to do what customers actually want, so they would sit there as a distraction.

Comments closed