
Author: Kevin Feasel

Capacity Options in Microsoft Fabric

Reitse Eskens is at capacity:

Now I’m going to do something scary and try to explain some things. I tried to pay attention during the precon and had the pleasure of talking with Ljubica Vujovic Boskovic on the capacity usage. She, very patiently, helped me out where my mind completely lost all track. Her explanations were great, any errors are all mine and I will correct this blogpost if there are mistakes. If you want to know more, you can also read this blog by Chris Novak who digs a bit deeper into smoothing and bursting.

So let me give you a very quick and simple introduction into the capacity challenges we’re going to face.

Read on for an overview of how Microsoft Fabric capacity planning works and one concern with this style of “one capacity to rule them all.”

Comments closed

An Overview of Event-Driven Architecture

Yaniv Ben Hemo explains what event-driven architecture is:

First things first, Event-driven architecture. EDA and serverless functions are two powerful software patterns and concepts that have become popular in recent years with the rise of cloud-native computing. While one is more of an architecture pattern and the other a deployment or implementation detail, when combined, they provide a scalable and efficient solution for modern applications.

Click through for a primer on event-driven architecture. This is a pattern that I find quite useful for optimizing cloud pricing, assuming your normal business processes can run asynchronously—that is, people are not expecting near-real-time performance and you can start and stop processes periodically in order to “re-use” the same compute for multiple services. The alternative use of EDA is that your services need to be running all the time, but you also have multiple teams working together on the solution and you want to decouple team efforts. In that case, you define queues or Kafka-style topics and let those act as the mechanism for service integration.

This is definitely an architecture that works better for cloud-based systems than on-premises systems.
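
To make the queue-as-integration idea concrete, here is a minimal in-process sketch. It assumes Python's standard `queue.Queue` as a stand-in for a real message queue or Kafka-style topic, and the event shape (`order_placed`, `order_id`) is hypothetical. The producer never calls the consumer directly; the two share only the queue and the event format.

```python
import queue
import threading


def run_demo():
    """Publish three hypothetical order events and let a consumer handle them."""
    orders = queue.Queue()  # stand-in for a queue or Kafka-style topic
    handled = []

    def consumer():
        # The consumer knows only the event format, not who produced the event.
        while True:
            event = orders.get()
            if event is None:  # sentinel: the producer has finished
                break
            handled.append(f"shipped order {event['order_id']}")

    worker = threading.Thread(target=consumer)
    worker.start()

    # The producer side: publish events and move on without waiting on the consumer.
    for order_id in (1, 2, 3):
        orders.put({"type": "order_placed", "order_id": order_id})
    orders.put(None)

    worker.join()
    return handled
```

In a real deployment the queue would live in a separate service, which is what lets you start and stop consumers independently of producers.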


Substrings in MySQL

Rahul Mehta describes how substrings work in MySQL:

MySQL is an open-source relational database management system. It is a widely used relational database management system in the entire world. MySQL, like any other database, can store different types of data. One of the most used data types is “String”. Developers widely use it in storing data as well as in different formatting operations. One of the key requirements we will always come across is to derive a part of the string. MySQL provides a “SUBSTRING” function to extract a substring from a string. MySQL has the following options for extracting a substring:

  1. SUBSTRING
  2. SUBSTR (A SYNONYM FOR SUBSTRING)
  3. SUBSTRING_INDEX

Read on to see how these functions work. They differ a bit from SQL Server in terms of functionality, though there’s a lot of overlap between the two.
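
As a rough illustration of the semantics, here is a Python sketch that mimics `SUBSTRING` and `SUBSTRING_INDEX`. The function names `substring` and `substring_index` are mine, not a MySQL API. The sketch covers the 1-based positions, negative positions counted from the end, and the signed delimiter count. How it handles out-of-range positions and an empty delimiter (both return an empty string) is an assumption you should check against your MySQL version.

```python
def substring(s, pos, length=None):
    """Mimic MySQL SUBSTRING(s, pos[, length]) with 1-based positions."""
    if pos == 0:
        return ""  # position 0 yields an empty string
    if pos > 0:
        start = pos - 1  # convert 1-based to 0-based
    else:
        start = len(s) + pos  # negative positions count from the end
        if start < 0:
            return ""  # assumption: out-of-range negative position is empty
    if length is None:
        return s[start:]
    if length < 1:
        return ""
    return s[start:start + length]


def substring_index(s, delim, count):
    """Mimic MySQL SUBSTRING_INDEX(s, delim, count)."""
    if count == 0 or delim == "":
        return ""  # assumption: an empty delimiter is treated like count 0
    parts = s.split(delim)
    if count > 0:
        # Everything to the left of the count-th delimiter.
        return delim.join(parts[:count])
    # Everything to the right of the count-th delimiter from the end.
    return delim.join(parts[count:])
```

For example, `substring_index("www.mysql.com", ".", 2)` gives `"www.mysql"`, while a count of `-2` gives `"mysql.com"`.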


A Primer on SQL Server Security

Ben Johnston gives us a high-level overview of SQL Server security options:

SQL Server security structure, mechanisms and methods are very thoroughly documented in the Microsoft documentation, but it is quite daunting if you don’t already know about the functionality. I recently had a request to explain some security features of SQL Server so that internal audits could be completed. While thinking about the request and preparing for the meeting, I realized how many security features are available in SQL Server. The purpose of this post is not to thoroughly explain how all of these items work but to give an introduction to these features and a few recommendations. Given how many security-centered features are available, I’m sure I missed a few, and new features are added all the time, but these are the main features at the time of this writing.

This is solid as a view into what options are available. I do have at least one moderate-to-large qualm with the article: cross-database ownership chaining is something you should never enable; use module signing instead.


Handling Source System Deletions in a Warehouse

Rayis Imayev deletes some rows:

When something important disappears, it’s natural to start asking questions and looking for answers, especially when that missing piece has had a significant impact on your life.

Similarly, when data that used to exist in your sourcing system suddenly vanishes without any trace, you’re likely to react in a similar way. You might find yourself reaching out to higher authorities to understand why the existing data management system design allowed this to happen. Your colleagues might wonder if better ways to handle such data-related issues exist. Ultimately, you’ll embark on a quest to question yourself about what could have been done differently to avoid the complete loss of that crucial data.

Kimball-style data warehousing already has the idea of type-2 slowly changing dimensions, which allow you to track the deletion of dimensional data by assigning an end date to the row and not inserting a new record with the next start date. It’s a little harder to deal with fact data deletions in that way, though, as there historically is no concept of slowly changing facts.

Read on for some thoughts on the topic from Rayis.
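
For the dimensional side, the type-2 approach to deletions can be sketched in a few lines of Python. This is an illustration under assumptions, not Rayis's method: the row layout (`business_key`, `start_date`, `end_date`) and the function name `close_deleted_rows` are hypothetical, and a current row is assumed to be one whose `end_date` is `None`.

```python
from datetime import date


def close_deleted_rows(dimension_rows, source_keys, as_of):
    """Return copies of the dimension rows, end-dating current rows whose
    business key no longer exists in the source system."""
    result = []
    for row in dimension_rows:
        updated = dict(row)  # copy so the input rows stay untouched
        is_current = updated["end_date"] is None
        if is_current and updated["business_key"] not in source_keys:
            # Deleted at the source: close the row and insert no successor.
            updated["end_date"] = as_of
        result.append(updated)
    return result


# Hypothetical data: customer B has vanished from the source system.
dimension = [
    {"business_key": "A", "start_date": date(2023, 1, 1), "end_date": None},
    {"business_key": "B", "start_date": date(2023, 1, 1), "end_date": None},
]
closed = close_deleted_rows(dimension, {"A"}, date(2023, 10, 1))
```

Fact rows have no equivalent start and end dates, which is why the same trick does not carry over directly.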


Excel Data Analysis with Python

Chris Webb takes us through a new add-in for Excel:

In the Power BI/Fabric community everyone is excited about the recent release of Semantic Link: the ability to analyse Power BI data easily using Python in Fabric notebooks. Sandeep Pawar has an excellent blog post here explaining what this is and why it’s so cool. Meanwhile in the Excel community, everyone is excited about the new integration of Python into Excel. But can you analyse Power BI data in Excel using Python? Yes you can – so as my teenage daughter would say, it’s time for a crossover episode.

Click through for an example of it in action.


Killing a Running Apache Spark Application

The Big Data in Real World team pulls the plug on an application:

Apache Spark is a powerful open-source distributed computing system used for big data processing. However, sometimes you may need to kill a running Spark application for various reasons, such as if the application is stuck, consuming too many resources, or taking too long to complete. In this post, we will discuss how to kill a running Spark application.

Click through to see how you can do this.
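
As one hedged example, if the application runs on YARN, the usual route is `yarn application -kill <application id>`. The sketch below assumes a YARN cluster, which the excerpt above does not specify. It only builds the command as an argument list, and the helper name `build_yarn_kill_command` is hypothetical.

```python
import re

# YARN application IDs look like application_<cluster timestamp>_<sequence>.
YARN_APP_ID = re.compile(r"^application_\d+_\d+$")


def build_yarn_kill_command(application_id):
    """Build the argument list for killing a YARN-managed Spark application."""
    if not YARN_APP_ID.match(application_id):
        raise ValueError(f"not a YARN application ID: {application_id!r}")
    return ["yarn", "application", "-kill", application_id]
```

On a machine with the `yarn` CLI available, you could pass the resulting list to `subprocess.run(cmd, check=True)`. Other cluster managers have their own mechanisms.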


Creating Horizontal Legends in R

Steven Sanderson flattens the legend:

Creating a horizontal legend in base R can be a useful skill when you want to label multiple categories in a plot without taking up too much vertical space. In this blog post, we’ll explore various methods to create horizontal legends in R and provide examples with clear explanations.

Read on for two demos, one with a single legend and one which creates two legends. I’m not so sure about how valuable the latter is (because you’re splitting valuable information into two places, losing some of the glanceability of a chart along the way), but it is interesting that you can do it.


Connection Pooling in Postgres

Semab Tariq shows off a tool for Postgres:

PgBouncer is a lightweight yet powerful connection pooling tool for PostgreSQL. It efficiently manages and reuses database connections, reducing the load on the server and improving performance. It acts as an intermediary between applications and the PostgreSQL database, optimizing connection usage and enhancing scalability.

This is a bit different from SQL Server, where connection pooling is built in. Read on to see how it works.
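
To show the basic idea behind pooling in general, here is a minimal Python sketch. It is not how PgBouncer is implemented. The `ConnectionPool` class and its `connect` callable are hypothetical. The pool opens a fixed number of connections up front and hands them out for reuse, so callers never pay the connection setup cost more than `size` times.

```python
import queue
from contextlib import contextmanager


class ConnectionPool:
    """Hold a fixed set of connections and lend them out one at a time."""

    def __init__(self, connect, size):
        # `connect` is any zero-argument callable that opens a connection.
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(connect())

    @contextmanager
    def connection(self, timeout=None):
        # Blocks when every connection is in use; raises queue.Empty on timeout.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # return the connection for reuse
```

A caller would write `with pool.connection() as conn:` around each unit of work, and the same few connections get recycled across all of them.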
