Press "Enter" to skip to content

Author: Kevin Feasel

Feature Engineering with Azure ML and Microsoft Fabric

Siliang Jiao, et al, talk architecture:

Feature engineering is the process of using domain knowledge to extract features (characteristics, properties, attributes) from raw data. The extracted features are used for training the models that can predict values for relevant business scenarios. A feature engineering system provides the tools, processes, and techniques used to perform feature engineering consistently and efficiently. 

This article elaborates on how to build a feature engineering system based on Azure Machine Learning managed feature store and Microsoft Fabric. 

Click through to see how the pieces fit together.

Leave a Comment

Plotting Training and Testing Results with tidyAML

Steven Sanderson builds a plot:

In the realm of machine learning, visualizing model predictions is essential for understanding the performance and behavior of our algorithms. When it comes to regression tasks, plotting predictions alongside actual values provides valuable insights into how well our model is capturing the underlying patterns in the data. With the plot_regression_predictions() function in tidyAML, this process becomes seamless and informative.

Read on to see how the function works and the kind of result you can expect from it.

Leave a Comment

Copilot in Microsoft Fabric Dataflows Gen2

Reza Rad shows off a capability:

There has been a lot of hype recently about Generative AI and Copilot in Microsoft. Microsoft Fabric incorporates many of those features, and one of the areas it has been added to is the Dataflow Gen2 in Microsoft Fabric, or we can also call it Power Query in Power BI Service Dataflows. In this article and video, I will describe how the Copilot works with Data Factory Dataflow Gen2, its requirements, and its examples.

Click through for the video and the article. The thing that I believe will keep many people from using this is that you need a Microsoft Fabric capacity of F64 or greater to get access to Copilot. That’s a pretty hefty requirement.

Leave a Comment

Copying a Direct Lake Semantic Model between Fabric Workspaces

Kevin Chant makes a copy:

In this post I introduce scripts to improve copying a Direct Lake semantic model to another workspace using Microsoft Fabric Git integration.

I wanted to do this follow-up after my previous post about my initial tests to copy a Direct Lake semantic model to another workspace using Microsoft Fabric Git integration.

Due to the fact that I want to show how you can work with scripts locally to create the repository that contains the Direct Lake semantic model. Plus, how to do this in a way that includes the new Tabular Model Definition Language (TMDL) semantic file format.

Read on to see how it all fits together.

Leave a Comment

Postgres Internals: Database Clusters, Databases, and Tables

Semab Tariq begins a new series:

A database cluster is a collection of multiple databases managed by a single PostgreSQL server. It can be referred to as a data/base directory.

A database is a collection of database objects. Whereas a database object is a data structure used to store objects such as tables, views, indexes, extensions, Sequences functions, etc. In simple words, anything that we can create or store within a database is a database object

Read on to learn more about how Postgres lays out database files and tablespaces.

Leave a Comment

Using IN and NOT IN in SQL Server

Erik Darling shares some advice:

I’ll be brief here, and let you know exactly when I’ll use IN and NOT IN rather than anything else:

  • When I have a list of literal values

That’s it. That’s all. If I have to go looking in another table for anything, I use either EXISTS or NOT EXISTS. The syntax just feels better to me, and I don’t have to worry about getting stupid errors about subqueries returning more than one value.

I’m typically a lot more flexible about using IN, though I do agree with NOT IN: that clause is usually more trouble than it’s worth.

Leave a Comment

Retrieving Spark Session Config Variables from Microsoft Fabric

Koen Verbeeck gets some settings:

I was trying some stuff out in a notebook on top of a Microsoft Fabric Lakehouse. I was wondering what some of the default values are of the configuration variables, and if there’s an easy way to retrieve them all. Luckily there is. In the code, I’m using Scala because it has a nice GetAll() function.

Click through for an example of how to use this. And bonus points for using Scala instead of Python here.

Leave a Comment

Overloading Power BI in Microsoft Fabric

Reitse Eskens pushes the envelope:

In my previous blog on Fabric and loadtesting, I ended with not really knowing how PowerBI would respond to all these rows. After creating and presenting a session on this subject, it’s time to dig into this part of Fabric as well. There were questions and I made promises. So here goes! This blog will only show the F2 experience as that’s where things went off the road. And, as I’ve shown in the previous blog, the CU count doesn’t change between SKU’s, only the amount of SKU’s available changes.
This blog isn’t meant to scold Fabric or make it look silly, I’m the one who’s silly. The goal is to show some limitations, a way you can do some load testing and help you find your way in the available metrics.

Read on to see what Reitse has gotten into.

Leave a Comment

Postgres Data Extraction with LATERAL joins and More

Ryan Booz extracts some data:

In our data hungry world, knowing how to effectively load and transform data from various sources is a highly valued skill. Over the last couple of years, I’ve learned how useful many of the data manipulation functions in PostgreSQL can supercharge your data transformation and analysis process, using just PostgreSQL and SQL.

For the last couple of decades, “Extract Transform Load” (ETL) has been the primary method for manipulating and analyzing the results. In most cases, ETL relies on an external toolset to help acquire different forms of data, slicing and dicing it into a form suitable for relational databases, and then inserting the results into your database of choice. Once it’s in the destination table with a relational schema, querying and analyzing it is much easier.

I call out CROSS JOIN LATERAL (or any kind of lateral join) here because it’s the ANSI equivalent of T-SQL’s APPLY operator, and I’ve already pointed out once today that I’m a huge fan of APPLY.

Leave a Comment