
Month: July 2024

Postgres Tuning Settings

Semab Tariq shares a few tips:

PostgreSQL is a widely used database known for its robust performance and reliability. To get the most out of PostgreSQL, tuning its parameters is crucial.

In this blog, we will explore the various PostgreSQL performance-related parameters and how to tune them effectively. By measuring Transactions Per Second (TPS) before and after tuning, and analyzing the results, we will demonstrate the significant impact of tuning on PostgreSQL performance.

Click through for some of the sorts of settings you might want to review. In Semab’s case, a simple server achieved nearly 30% better throughput after making these changes, so that’s not bad for the level of effort.
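For a sense of what this kind of tuning looks like in practice, here is a minimal sketch, explicitly not Semab's script: a Python snippet (assuming psycopg2 and a superuser connection) that adjusts a few commonly tuned memory and WAL parameters via ALTER SYSTEM, with placeholder values rather than recommendations. The before-and-after TPS measurement would happen outside the database with a tool like pgbench.

```python
# A minimal sketch, not Semab's script: set a few commonly tuned PostgreSQL
# parameters via ALTER SYSTEM. Assumes psycopg2 and a superuser connection.
# The values below are placeholders to show the mechanism, not recommendations.
import psycopg2

settings = {
    "shared_buffers": "2GB",          # restart-level setting
    "effective_cache_size": "6GB",    # planner hint; reload is enough
    "work_mem": "64MB",               # per-sort/hash memory; reload is enough
    "maintenance_work_mem": "512MB",  # VACUUM / CREATE INDEX memory
    "wal_buffers": "16MB",            # restart-level setting
}

conn = psycopg2.connect("dbname=postgres user=postgres")
conn.autocommit = True  # ALTER SYSTEM cannot run inside a transaction block

with conn.cursor() as cur:
    for name, value in settings.items():
        cur.execute(f"ALTER SYSTEM SET {name} = %s;", (value,))
    cur.execute("SELECT pg_reload_conf();")  # restart-level settings still need a restart

conn.close()

# Measure throughput from the shell with pgbench, e.g.:
#   pgbench -i -s 50 mydb          (initialize test data)
#   pgbench -c 10 -j 2 -T 60 mydb  (report TPS before and after tuning)
```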


Controlling Azure Function Spend via Consumption Plan

Andy Brownsword saves some cash:

A consumption based App Service Plan in Azure provides us with a pay-as-you-go model for Function usage. This can help reduce spend from Premium plans where those plans exceed the requirements of the function, for example low volume or intermittent work.

Unfortunately you can’t move a Premium plan to Consumption based via the portal. Instead we’ll demonstrate how to use PowerShell to achieve this.

Read on for the code, as well as a bit more information on the Consumption tier.


Creating a Dragon Curve in R

Tomaz Kastrun adds dragons to the edge of the map:

The algorithm is a fractal curve of Hausdorff dimension 2. One starts with one segment. In each iteration, the number of segments is doubled by taking each segment as the diagonal of a square and replacing it with half of that square (a 90-degree bend). The bends alternate between the left and right directions, which is what produces the shape.

Click through for the sample code and how the plot looks.
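Tomaz's code is in R; as a rough Python companion (assuming matplotlib, with an arbitrary iteration count), here is one way to build the same fold-based turn sequence and plot the resulting curve.

```python
# A rough Python companion to Tomaz's R code (his post has the original):
# build the dragon curve's turn sequence by repeated "folding" and plot it.
import matplotlib.pyplot as plt

def dragon_turns(iterations):
    """Each fold appends a left turn (1) plus the reversed,
    sign-flipped copy of the turn sequence so far."""
    turns = []
    for _ in range(iterations):
        turns = turns + [1] + [-t for t in reversed(turns)]
    return turns

def dragon_points(iterations):
    x, y = [0.0], [0.0]
    dx, dy = 1.0, 0.0            # start heading right
    x.append(x[-1] + dx); y.append(y[-1] + dy)
    for t in dragon_turns(iterations):
        dx, dy = -t * dy, t * dx  # rotate 90 degrees left (t=1) or right (t=-1)
        x.append(x[-1] + dx); y.append(y[-1] + dy)
    return x, y

xs, ys = dragon_points(12)
plt.plot(xs, ys, linewidth=0.5)
plt.axis("equal")
plt.axis("off")
plt.show()
```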


Microsoft Fabric Warehouse Access Control

Koen Verbeeck talks permissions:

We are starting a new analytics project in Microsoft Fabric, and our data will land in a warehouse. This is the first time we’re using Fabric, and we are wondering about the different options for sharing access to a warehouse we developed in a workspace.

Click through for more information on providing and limiting access to data in a Microsoft Fabric warehouse.


Microsoft Fabric Lakehouse Ingesting CSV vs SQL

Reitse Eskens performs a comparison:

This blog will be quite a short one compared to the other blogs, as it's more of an overview to show you the capacity of Fabric ingesting CSV files in their native format into a Lakehouse and ingesting SQL data into a table structure inside the Lakehouse. Simple, straightforward stuff without any form of modification. You could call it bronze, raw, ingestion, temp or whatever your preferred naming convention is.

Why is this important? Well, we still have source systems that can only output to files. Just as we still have customers running on SQL Server 2000, legacy or even antique systems are still running. And it’s important to know how much capacity you use when just ingesting data without any modification.

Read on for the two scenarios, giving you an idea of which one is faster. I’d be interested in a third option, which is reading from Parquet files. My initial expectation would be that it would be even faster and more efficient, depending on the structure of the data.
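Purely to make the two paths concrete, here is a hedged PySpark-style sketch of raw CSV and SQL ingestion into Lakehouse tables. The paths, connection details, and table names are placeholders, and this is not necessarily the tooling Reitse used for his measurements.

```python
# A hedged PySpark sketch of the two ingestion paths, assuming a Fabric
# notebook attached to a Lakehouse (where `spark` is provided automatically).
# File paths, connection details, and table names are placeholders.

# Path 1: ingest a CSV file as-is into a Lakehouse table
csv_df = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("Files/raw/sales_2024.csv")   # hypothetical file in the Lakehouse
)
csv_df.write.mode("overwrite").format("delta").saveAsTable("raw_sales_csv")

# Path 2: ingest a SQL Server table over JDBC into a Lakehouse table
sql_df = (
    spark.read
    .format("jdbc")
    .option("url", "jdbc:sqlserver://<server>;databaseName=<db>")  # placeholder
    .option("dbtable", "dbo.Sales")
    .option("user", "<user>")
    .option("password", "<password>")
    .load()
)
sql_df.write.mode("overwrite").format("delta").saveAsTable("raw_sales_sql")
```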


Atomic Design for Report Development

Kurt Buhler has an interesting approach:

Developing a good semantic model or report takes a lot of time and effort. One way to reduce this cost is by re-using parts of an existing solution for a new model or project. This modular approach is particularly valuable when a developer faces common or recurring challenges and processes. Despite this, many developers commonly repeat efforts when they start new projects, models, and reports. For example, developers will often manually recreate measures, date tables, and patterns in a new model, or spend precious hours formatting visuals in a new report, while they have already created the same or similar things in the past. One reason for this is that it is difficult to identify candidate elements to re-use, or how you can re-use them in a convenient and scalable manner.

In this article, we want to introduce a conceptual framework from UI/UX called the atomic design methodology from Brad Frost. This framework can help developers to approach Power BI models and reports in a modular way to improve productivity and consistency of a developer’s work. The purpose of this article is to introduce the concept as well as some approaches that exist to re-use parts of your model and report. In future articles and videos, we will elaborate on these and other methods in additional detail.

I like the idea a lot, but Kurt does describe some of the challenges you’ll likely need to work through to adopt it.


Deploying a Power BI Project File via Azure DevOps

Angela Henry deploys to prod:

When the Power BI Project file format (.pbip) was announced, there was a collective cheer from Power BI source control advocates heard ’round the world. Since its preview release, Microsoft has also added Git integration with Fabric workspaces. This makes it so easy to incorporate source control for all (or almost all) of your Fabric artifacts, including Power BI.

But what happens when your organization already has a mature CI/CD process in place using Azure DevOps? Do you really want to break from that pattern and have it controlled somewhere else? That’s what this post is about, using Azure DevOps CI/CD pipelines to deploy your Power BI Project files (.pbip).

I’m going to share my experience in hopes that it will save you some time if this is the route you need to take.

Read on for Angela’s experience. Note that this applies both to Microsoft Fabric and to Fabric-less Power BI.


Hot and Cold Partitions for Apache Kafka Data

Gautam Goswami splits the data:

At first, data tiering was a tactic used by storage systems to reduce data storage costs. This involved grouping data that was not accessed as often into more affordable, if less effective, storage array choices. Data that has been idle for a year or more, for example, may be moved from an expensive Flash tier to a more affordable SATA disk tier. Even though they are quite costly, SSDs and flash can be categorized as high-performance storage classes. Smaller datasets that are actively used and require the maximum performance are usually stored in Flash.

Cloud data tiering has gained popularity as customers seek alternative options for tiering or archiving data to a public cloud. Public clouds presently offer a mix of object and file storage options. Object storage classes such as Amazon S3 and Azure Blob (Azure Storage) deliver significant cost efficiency and all the benefits of object storage without the complexities of setup and management. 

Read on for an architecture that uses hot and cold tiers, as well as how you can set it up on an existing Kafka topic.


SHAP and Additive Models

Michael Mayer answers a pair of related questions:

Within only a few years, SHAP (Shapley additive explanations) has emerged as the number 1 way to investigate black-box models. The basic idea is to decompose model predictions into additive contributions of the features in a fair way. Studying decompositions of many predictions allows us to derive global properties of the model.

What happens if we apply SHAP algorithms to additive models? Why would this ever make sense?

Read on for the answers to these two questions.
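Michael works in R; to see the additive special case in Python terms, here is a small sketch (assuming numpy, scikit-learn, and the shap package) showing that for a plain linear model the SHAP value of each feature works out to its centered contribution, coefficient times (value minus background mean), at least under the explainer's default interventional treatment of features. Depending on your shap version and its background-sampling defaults, tiny numerical differences are possible.

```python
# A small Python sketch (Michael's post works in R): for a purely additive
# model, here a plain linear regression, the SHAP value of each feature is
# just its centered contribution, coef_j * (x_ij - mean(x_j)).
import numpy as np
import shap
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.1, size=100)

model = LinearRegression().fit(X, y)

# Background data doubles as the data we explain here.
explainer = shap.LinearExplainer(model, X)
shap_values = explainer.shap_values(X)

# Manual additive decomposition for a linear model.
manual = model.coef_ * (X - X.mean(axis=0))
print(np.allclose(shap_values, manual, atol=1e-6))  # expect True (up to tolerance)
```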
