Author: Kevin Feasel

Building a Test Data Generator for PostgreSQL

Mika Sutinen builds some data:

I recently had a project where I needed to quickly generate some realistic-looking test data for a PostgreSQL database. While I often like to go for ready-made solutions, this felt like a good opportunity to stretch my coding muscles and develop it myself. Moreover, this seemed like a fun puzzle to solve, and I could probably use the same solution later on elsewhere.

Click through for a description of the generator, as well as a link to Mika’s GitHub repo. Taking a quick peek at it, it does appear that you could probably use this for other data platforms like SQL Server with very limited modification.
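
To give a feel for the general idea, here is a minimal sketch in Python. It is not Mika's code: the `customers` table, its columns, the name lists, and both function names are hypothetical, chosen only for illustration. It builds seeded pseudo-random rows and renders them as a multi-row `INSERT` statement, which PostgreSQL accepts and which most other platforms would take with few changes.

```python
import random
from datetime import date, timedelta

# Hypothetical sample values; a real generator would draw from larger lists.
FIRST_NAMES = ["Aino", "Eero", "Liisa", "Matti", "Sanna"]
LAST_NAMES = ["Korhonen", "Virtanen", "Nieminen", "Laine"]


def make_customers(count, seed=42):
    """Return `count` fake customer rows; the same seed gives the same rows."""
    rng = random.Random(seed)
    start = date(2020, 1, 1)
    rows = []
    for customer_id in range(1, count + 1):
        first = rng.choice(FIRST_NAMES)
        last = rng.choice(LAST_NAMES)
        rows.append({
            "id": customer_id,
            "name": f"{first} {last}",
            # Appending the id keeps e-mail addresses unique.
            "email": f"{first}.{last}{customer_id}@example.com".lower(),
            "signup_date": start + timedelta(days=rng.randrange(365 * 3)),
        })
    return rows


def to_insert(table, rows):
    """Render rows as one multi-row INSERT statement."""
    columns = list(rows[0])

    def literal(value):
        # Integers go in bare; everything else becomes a quoted string
        # with embedded single quotes doubled.
        if isinstance(value, int):
            return str(value)
        return "'" + str(value).replace("'", "''") + "'"

    values = ",\n".join(
        "  (" + ", ".join(literal(row[col]) for col in columns) + ")"
        for row in rows
    )
    return f"INSERT INTO {table} ({', '.join(columns)})\nVALUES\n{values};"


if __name__ == "__main__":
    print(to_insert("customers", make_customers(5)))
```

For anything beyond a throwaway script, you would likely pass the rows to a database driver as query parameters instead of building SQL text by hand.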

Comments closed

Query Start Times in Query Store

Hugo Kornelis describes an issue:

I was hired by a customer who had a very annoying issue with the daily data load of their data warehouse. The volume of data to be loaded is high and they were already struggling to finish the load before business opens. But that was not their biggest issue. The biggest problem, the real pain point that they hired me for, is that at unpredictable moments, the load would run much longer than normal, pushing it well into business hours. They wanted me to find out what caused those irregular delays, and find a way to stop them from happening.

Read on to learn more about the issue itself, as well as a discrepancy in what Query Store showed. Hugo also points out that the quick-and-easy solution may not be the right solution.

Context Transition in DAX

Marco Russo and Alberto Ferrari draw on a cocktail napkin:

In previous articles, we introduced a visual approach to describing two important DAX concepts: the filter context and the row context. This article completes this short series by describing the context transition using a graphical visualization.

This article provides a different perspective on the context transition already covered in other articles: you should read them to get more insights on this important concept for DAX.

Read on to see how it all fits together.

Myths and Reality of Copilot for Power BI

Kurt Buhler puts together an essay:

However, recent months reveal rising skepticism, concern, and possibly even disillusionment with generative AI tools, both from investors (especially from investors) and from the public. Despite the massive investment, enthusiasm, and promotion, these tools seem to be seeing limited adoption and aren’t yet showing the measurable value that fulfills their promises. And yet, paradoxically, many professionals will agree anecdotally that they use generative AI tools regularly, and that these tools seem to help them be more productive in certain tasks. Furthermore, there are concrete success stories where generative AI is bringing value, such as models like the latest versions of AlphaFold (from Google) and ESMFold (from Meta), which aid in protein folding so that pharmaceutical companies can more effectively find potential new drug candidates. So, who are these tools for, what problems do they solve, and how can we use them effectively? This is too big of a topic for even Bink and Bonk the Data Goblins to solve, so let’s narrow the focus a bit.

This is a must-read, and Kurt even provides a de-goblinified PDF version for management.

Reading Always Encrypted Data in Power BI

Rod Edwards wants to make use of encrypted data:

This is where things start to get a little more interesting compared to Pt1, as now we have a different application in the mix for reading the data. So how can that application retrieve the key needed to successfully decrypt?

Read on to see how it all works. There are a lot of working parts here, though some of it pertains to using an on-premises gateway versus Always Encrypted as such, so you get even more bang for your buck.

Dealing with Collinearity using Lasso Regression

Vinod Chugani always moves in the same direction:

One of the significant challenges statisticians and data scientists face is multicollinearity, particularly its most severe form, perfect multicollinearity. This issue often lurks undetected in large datasets with many features, potentially disguising itself and skewing the results of statistical models.

In this post, we explore the methods for detecting, addressing, and refining models affected by perfect multicollinearity. Through practical analysis and examples, we aim to equip you with the tools necessary to enhance your models’ robustness and interpretability, ensuring that they deliver reliable insights and accurate predictions.

Read on to learn a bit more about how collinearity works and how you can use lasso regression (instead of ridge regression) to deal with the problem.
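
As a rough illustration of why lasso helps here, the sketch below fits a lasso model with plain coordinate descent on a tiny made-up dataset in which the second feature is exactly twice the first. This is not code from Vinod's post; the function names, the data, and the penalty value `alpha=1.0` are all assumptions picked for the example, and in practice you would reach for a library implementation and standardize your features first.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold, clamping to exactly zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, iterations=500):
    """Minimize (1 / 2n) * ||y - Xw||^2 + alpha * ||w||_1, no intercept."""
    n = len(X)
    p = len(X[0])
    weights = [0.0] * p
    for _ in range(iterations):
        for j in range(p):
            rho = 0.0    # correlation of feature j with the partial residual
            scale = 0.0  # sum of squares of feature j
            for row, target in zip(X, y):
                partial = sum(row[k] * weights[k] for k in range(p) if k != j)
                rho += row[j] * (target - partial)
                scale += row[j] ** 2
            weights[j] = soft_threshold(rho / n, alpha) / (scale / n)
    return weights


# Made-up data: the second feature is exactly 2x the first (perfect
# multicollinearity), and the target depends only on the first feature.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
X = [[v, 2.0 * v] for v in x1]
y = [3.0 * v for v in x1]

weights = lasso_coordinate_descent(X, y, alpha=1.0)
print(weights)
```

The L1 penalty pushes the coefficient of the first feature to exactly zero and leaves all of the signal on the second, so the redundant column drops out of the model. Ridge regression, by contrast, shrinks coefficients without setting any of them to exactly zero, so both columns would stay in.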

Discerning a Star Schema from an Existing Report

Kelly Broekstra describes a common flow for business intelligence projects:

I have worked as a business intelligence developer for several years, and I’m always asked: “How do you convert user requirements to a functioning data model?”

I follow the Kimball methodology. For more information, check out the official pages.

But, here are some specific tips on what works for me.

Click through for those tips.

Point-in-Time Recovery with Postgres

Grant Fritchey restores some backups:

PostgreSQL has the capabilities to support backups as I already described in my first article on the topic. PostgreSQL also has the capability to restore to a point in time. However, that does require you to change the way you’re performing your backups. This article advances our understanding of how to better protect your PostgreSQL databases by expanding on the database backups and restores into a more full-blown disaster recovery process through point in time restores.

While the important part is the restore, in a classic chicken-or-egg conundrum, we can’t talk about restoring until we first have a backup, so I’ll start with how you need to back up your databases in preparation for a point-in-time restore.

Click through for the process and to see it in action.
