Press "Enter" to skip to content

Month: September 2023

Deployment Pipelines for Microsoft Fabric

Reitse Eskens crosses a line:

It’s a bit of a challenge to keep up with all the changes, updates and all the new stuff coming out for Fabric. As I’m not really invested in the PowerBI part of the data platform (yay pie charts ;)), some things that are very common for the PowerBI community are very new to me. I have it on good authority that this blog covers a feature that is well know within PowerBI but quite new in the data engineering part. When I say that, I need to add that at the time of writing, only the PowerBI side of things are fully supported but I have very good hopes that pipelines and notebooks will be supported as well.

Supporting pie charts are fightin’ words here. Nonetheless, read on to see how deployment pipelines work in Microsoft Fabric.

2 Comments

Finding SSAS Tabular Dimensions in Excel

Olivier Van Steenlandt has lost a few dimensions in the couch cushions:

A colleague reached out last week while connecting to one of our SQL Server Analysis Services models in Excel. He couldn’t find the expected Attribute folders in the model. He was looking for the following dimensions:

Of particular interest was that this colleague could not see them but Olivier could. The answer ends up being a bit surprising.

Leave a Comment

Query Execution Concepts and SQL Server

Erik Darling answers the question, why is it so hard to figure out why my query sometimes sucks:

Sometimes people will ask me penetrating questions like “why does SQL Server choose a bad execution plan?” or “why is this query sometimes slow?”

Like many things in databases, it’s an endless spiral of multiverses (and turtles) in which many choose your own adventure games are played and, well, sometimes you get eaten by a Grue.

In this post, I’m going to talk at a high level about potential reasons for both.

Read on for a smorgasbord of factors to consider based on the steps SQL Server takes.

Leave a Comment

Grouped Scatter Plots in R

Steven Sanderson builds a scatter plot:

Data visualization is a powerful tool for gaining insights from your data. Scatter plots, in particular, are excellent for visualizing relationships between two continuous variables. But what if you want to compare multiple groups within your data? In this blog post, we’ll explore how to create engaging scatter plots by group in R. We’ll walk through the process step by step, providing several examples and explaining the code blocks in simple terms. So, whether you’re a data scientist, analyst, or just curious about R, let’s dive in and discover how to make your data come to life!

Click through for several examples of plot generation.

Leave a Comment

ORMs and Mapping Requirements

Mark Seemann is not a big fan of Entity Framework:

When I evaluate whether or not to use an ORM in situations like these, the core application logic is my main design driver. As I describe in Code That Fits in Your Head, I usually develop (vertical) feature slices one at a time, utilising an outside-in TDD process, during which I also figure out how to save or retrieve data from persistent storage.

Thus, in systems like these, storage implementation is an artefact of the software architecture. If a relational database is involved, the schema must adhere to the needs of the code; not the other way around.

To be clear, then, this article doesn’t discuss typical CRUD-heavy applications that are mostly forms over relational data, with little or no application logic. If you’re working with such a code base, an ORM might be useful. I can’t really tell, since I last worked with such systems at a time when ORMs didn’t exist.

Read on for a thoughtful argument. The only critique I have is I’d prefer stored procedures over saving SQL queries in the code.

1 Comment

Incremental Sort in Postgres

Umair Shahid takes us through the concept of incremental sort in PostgreSQL:

Incremental sort is a database optimization feature, introduced in PostgreSQL 13, that allows sorting to be done incrementally during the query execution process. Sorting is a common operation in database queries, often necessary when retrieving data in a specific order. PostgreSQL’s query planner uses incremental sort to improve query performance, particularly for large datasets. This feature is enabled by default in PostgreSQL 13 and above. 

Read on to see how it works and some good practices which help maximize the likelihood that you can take advantage of the feature.

Leave a Comment

Building a Flink Application in Java

Wade Waldron talks about a (free) new course:

Recently, I got my hands dirty working with Apache Flink®. The experience was a little overwhelming. I have spent years working with streaming technologies but Flink was new to me and the resources online were rarely what I needed. Thankfully, I had access to some of the best Flink experts in the business to provide me with first-class advice, but not everyone has access to an expert when they need one. 

To share what I learned, I created the Building Flink Applications in Java course on Confluent Developer. It provides you with hands-on experience in building a Flink application from the ground up. I also wrote this blog post to walk through an example of how to do dataflow programming with Flink. I hope these two resources will make the experience less overwhelming for others.

Click through for the blog post and check out the full course if you’re so inclined.

Leave a Comment

Working with Histogram Breaks in R

Steven Sanderson divvies out buckets for a histogram:

Histograms divide data into bins, or intervals, and then count how many data points fall into each bin. The breaks parameter in R allows you to control how these bins are defined. By specifying breaks thoughtfully, you can highlight specific patterns and nuances in your data.

Click through to see how you can use the breaks parameter in a few different ways to customize your histogram. The default breaks in R are often reasonable, but trying a few different breaks can help you get a better understanding of the actual distribution of the data.

Leave a Comment