Day: November 26, 2024

Power BI Writeback via Fabric SQL Database

Jon Voge gives us a use case for Fabric SQL Databases:

Until recently, Fabric has allowed us to choose between Lakehouses and Warehouses as a backend. For write-back use cases, neither is ideal.

  • The SQL Endpoint of Lakehouses is read-only, making writes from Power Apps impossible.
  • While the SQL Endpoint of Warehouses is write-enabled, it does not support enforced Primary Keys, which are a hard requirement for Power Apps to be able to write directly to a data source.

Jon briefly describes two mechanisms people used and then how you can do this more effectively with a Fabric SQL Database. Based on the article, it seems that you could probably still do the same with an Azure SQL Database, though I suppose handling the managed identity could be an issue.

Data Visualization in Matplotlib

Rajendra Gupta generates some graphics:

Data analysis requires analysts to handle structured, semi-structured, or unstructured data. Small datasets with few rows and columns are easy to understand. However, as the data complexity increases with many interlinked variables, getting data insights from tabular formatted data becomes challenging. According to a recent study from MIT, the human brain processes an entire image in just 13 milliseconds. Therefore, it is helpful to learn Python and visualization together.

How do we use Python to generate plots from the data to analyze patterns, correlations, and trends? What plots are available, and how do we use them with customizations? Let’s explore them in this tip.

There are a few visualization libraries in Python I prefer over matplotlib, and for static graphics, ggplot2 in R has pretty much everything else beat. But matplotlib is essentially the standard, so it’s important to know.
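As a taste of what the tip covers, here is a minimal sketch of a customized line plot in matplotlib. The monthly figures and labels are made up for illustration and do not come from Rajendra's article.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt

# Made-up monthly figures, purely for illustration.
months = ["Jan", "Feb", "Mar", "Apr"]
orders = [120, 135, 128, 160]

fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(months, orders, marker="o", color="tab:blue", label="Orders")
ax.set_title("Orders per month")
ax.set_xlabel("Month")
ax.set_ylabel("Orders")
ax.legend()

# Write the PNG into memory; pass a filename to savefig to write to disk instead.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
png_bytes = buffer.getvalue()
plt.close(fig)
```

Swapping `ax.plot` for `ax.bar` or `ax.scatter` keeps the rest of the customization calls the same, which is a large part of why the library is worth learning even if you prefer something else.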

GROUP BY Clause Order and Performance in PostgreSQL

Andrei Lepikhov runs some tests:

PostgreSQL users often employ analytical queries that sort and group data by different rules. Optimising these operators can significantly reduce the time and cost of query execution. In this post, I will discuss one such optimisation: choosing the order of columns in the GROUP BY expression.

Postgres can already reshuffle the list of grouped expressions according to the ORDER BY condition to eliminate additional sorting and save computing resources. We went further and implemented an additional strategy of group-by-clause list permutation in a series of patches (the first attempt and the second one) for discussion with the Postgres community, expecting it to be included in the next version of PostgreSQL core. You can also try it in action in the commercial Postgres Pro Enterprise fork.

From a “this is a 4th generation language” perspective, there should never be a performance difference in the ordering of grouped columns in a GROUP BY clause. The database optimizer should always determine the optimal ordering. Read Andrei’s article to see how much of a challenge that can be in practice.
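The reason an optimizer is free to reorder grouped columns is that the order does not change the result, only the work needed to produce it. A minimal sketch of that equivalence follows, using an in-memory SQLite database as a stand-in for PostgreSQL; the `sales` table and its columns are hypothetical, and this does not reproduce any of Andrei's tests.

```python
import sqlite3

# In-memory SQLite stands in for a real database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("east", "widget", 10),
        ("east", "gadget", 5),
        ("west", "widget", 7),
        ("east", "widget", 3),
        ("west", "gadget", 2),
        ("west", "widget", 1),
    ],
)


def grouped_totals(group_by_clause):
    """Return {(region, product): total} for the given GROUP BY column order."""
    rows = conn.execute(
        "SELECT region, product, SUM(amount) FROM sales "
        f"GROUP BY {group_by_clause}"
    ).fetchall()
    return {(region, product): total for region, product, total in rows}


by_region_first = grouped_totals("region, product")
by_product_first = grouped_totals("product, region")

# Same groups and same aggregates either way; only the engine's effort can differ.
print(by_region_first == by_product_first)
```

Because both orderings are semantically identical, picking the cheaper one is purely a costing problem, and that costing problem is the hard part.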

Building a Gantt Chart

Amy Esselman wants to know if you’re on time or have any slack:

A Gantt chart is used to visually present tasks or events over time. It consists of a horizontal axis of time and horizontal bars representing the duration of each item arranged along the vertical axis.

The visual was popularized in the early 1900s by American engineer and management consultant Henry Gantt, for whom the chart is named. Most commonly, this view is used to see the start date and duration of each task in a larger project to better understand timelines, milestones, and dependencies.

Click through for an overview of the chart, where it can be useful, and some of the limitations of the chart.
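To make the layout concrete, here is a rough text-only sketch of the idea: time runs left to right, and each task gets one row with a bar as long as its duration. The tasks and dates are made up, and a real chart would come from a charting tool rather than `print`.

```python
from datetime import date

# Hypothetical project tasks: (name, start date, duration in days).
tasks = [
    ("Design", date(2024, 11, 4), 5),
    ("Build", date(2024, 11, 9), 10),
    ("Test", date(2024, 11, 16), 6),
]


def render_gantt(tasks):
    """Return one text row per task: padding up to its start day, then one '#' per day."""
    project_start = min(start for _, start, _ in tasks)
    label_width = max(len(name) for name, _, _ in tasks)
    lines = []
    for name, start, days in tasks:
        offset = (start - project_start).days
        lines.append(f"{name:<{label_width}} |{' ' * offset}{'#' * days}")
    return lines


for line in render_gantt(tasks):
    print(line)
```

Overlapping bars, such as the hypothetical Build and Test rows here, are where the chart earns its keep: the overlap is visible at a glance in a way a list of dates is not.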

An Overview of the Naive Bayes Class of Algorithms

Harris Amjad takes us through a rather useful class of algorithms for classification:

As AI and Machine Learning have increased in popularity, especially Large Language Models, more professionals have explored how these systems work. Unfortunately, some put the cart before the horse, taking on more complex algorithms before laying the foundation, and lose interest in the topic as a result. This tip will introduce a simple yet powerful probabilistic classifier, the Naïve Bayes Model, and implement it in Python.

I like using the Naive Bayes variants, despite the fact that they are not Bayesian and arguably aren’t very naive. The reason I like to use this class of algorithm is that it’s fast, easy, and gives you a useful baseline for quality. If you need to meet some specific quality threshold (say, accuracy > 85% or F1-score above 0.8), you can get an answer quickly with Naive Bayes. If that answer is anywhere near your threshold, the problem is likely solvable. If your answer is way below the threshold, it’s probably not worth spending the time or compute effort trying out a variety of other algorithms.
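Here is a minimal sketch of that baseline workflow: a multinomial Naive Bayes text classifier with add-one smoothing in pure Python, scored against a tiny holdout set. The data is made up and far too small to mean anything, and this is not the implementation from Harris's article; in practice you would likely reach for a library version.

```python
import math
from collections import Counter, defaultdict


def train(documents):
    """Fit a multinomial Naive Bayes model from (label, text) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for label, text in documents:
        words = text.lower().split()
        label_counts[label] += 1
        word_counts[label].update(words)
        vocabulary.update(words)
    return label_counts, word_counts, vocabulary


def predict(model, text):
    """Return the label with the highest log-posterior, using add-one smoothing."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in label_counts.items():
        score = math.log(doc_count / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids log(0) for unseen word/label pairs.
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Tiny made-up dataset, purely for illustration.
training = [
    ("spam", "win cash now"),
    ("spam", "cheap cash offer"),
    ("ham", "meeting agenda for monday"),
    ("ham", "lunch on monday"),
]
holdout = [("spam", "cash offer now"), ("ham", "monday meeting")]

model = train(training)
correct = sum(predict(model, text) == label for label, text in holdout)
accuracy = correct / len(holdout)
print(f"baseline accuracy: {accuracy:.2f}")
```

The number that comes out of `accuracy` is the one to compare against your quality threshold before deciding whether heavier models are worth the effort.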

Analyzing Semantic Model Logs via Microsoft Fabric

Sandeep Pawar parses the logs:

Workspace Monitoring was one of my favorite announcements at MS Ignite ’24 this week. It logs events from Fabric items such as Semantic Models, Eventhouse, and GraphQL to a KQL database that’s automatically provisioned and managed in that workspace. Currently it’s limited to these three items, but hopefully others (especially Spark and pipelines) will be added soon. Read the announcement by Varun Jain (PM, Microsoft) on this for details.

Click through for some thoughts from Sandeep, as well as a variety of useful queries.
