Press "Enter" to skip to content

Category: Python

Automating Semantic Model Security via Semantic Link

Marc Lelijveld writes a script:

You may be using standardized solutions like Fabric Unified Admin Monitoring (FUAM) or any other templated solution that comes with a semantic model. In the interest of transparency, you decided to share the gathered insights with others in your organization, adjusting the solution to apply your own security setup on top.

However, after running an update of the template, you’ve overwritten your custom security configuration, and reapplying it costs a lot of time after every update. Why not just script this security? In this blog, I will share how you can deploy security configurations to semantic models and assign users to those roles.

Click through for an example script and details on how it works.
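
To give a flavor of what such a script can look like, here is a minimal sketch using the TOM wrapper from semantic-link-labs in a Fabric notebook. The workspace, model, role, table, and DAX filter are all placeholders, and the wrapper method names follow that library's general pattern rather than Marc's actual code, so treat this as an assumption-laden outline rather than the post's script:

```python
# A minimal sketch, not Marc's script. Assumes semantic-link-labs is installed
# in a Fabric notebook; role/table names and the DAX filter are placeholders,
# and the wrapper method names are assumptions worth checking against the docs.
from sempy_labs.tom import connect_semantic_model

workspace = "Monitoring"   # placeholder workspace name
dataset = "FUAM_Core"      # placeholder semantic model name

with connect_semantic_model(dataset=dataset, workspace=workspace, readonly=False) as tom:
    # Recreate the security role if the template update wiped it out
    if "Restricted Viewers" not in [r.Name for r in tom.model.Roles]:
        tom.add_role(role_name="Restricted Viewers", model_permission="Read")

    # Reapply row-level security: limit rows to the signed-in user
    tom.set_rls(
        role_name="Restricted Viewers",
        table_name="Workspaces",
        filter_expression="[OwnerEmail] = USERPRINCIPALNAME()",
    )

    # Reassign users to the role (method name assumed)
    for member in ["user1@contoso.com", "user2@contoso.com"]:
        tom.add_role_member(role_name="Restricted Viewers", member=member)
```

Because the whole configuration lives in one script, rerunning it after each template update restores the security setup in seconds.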

Creating a Python Package via Poetry

Osheen MacOscar builds a package:

In this blog series (this and the next post), I am going to demonstrate how to use Poetry to create a Python package, set up testing infrastructure, and install it. I will be building a wrapper around the Fantasy Premier League API, including a function that produces a weekly league table.

This is a straightforward example of how to create a new Python package and add a function call to it.
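
As a taste of the end product, here is a rough sketch of the kind of function such a package might expose. The endpoint is the public Fantasy Premier League classic-league standings URL; the JSON field names below are assumptions worth verifying against a live response, and this is not Osheen's actual code:

```python
# A hedged sketch of the package's core function, not Osheen's code. Inside a
# Poetry project you would add the dependency with `poetry add requests`.
import requests

def weekly_league_table(league_id: int) -> list[dict]:
    """Return a simple standings table for a classic FPL mini-league."""
    url = f"https://fantasy.premierleague.com/api/leagues-classic/{league_id}/standings/"
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    # Field names assumed from the public API's JSON shape
    results = response.json()["standings"]["results"]
    return [
        {"rank": r["rank"], "team": r["entry_name"], "points": r["total"]}
        for r in results
    ]
```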

Error Handling in PySpark Jobs

Ram Ghadiyaram adds some error handling logic:

In PySpark, processing massive datasets across distributed clusters is powerful but comes with challenges. A single bad record, missing file, or network glitch can crash an entire job, wasting compute resources and leaving you with sprawling stack traces.

Spark’s lazy evaluation, where transformations don’t execute until an action is triggered, makes errors harder to catch early, and debugging them can be very difficult.

Read on for five patterns that can help with error handling in PySpark.
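
As one representative example (not necessarily one of Ram's five patterns), you can isolate bad records at read time instead of letting them crash the job. The paths and schema here are placeholders:

```python
# Quarantine unparseable records instead of failing the whole job.
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("resilient-read").getOrCreate()

# Include a column to capture anything that fails to parse
schema = StructType([
    StructField("device_id", StringType(), True),
    StructField("reading", DoubleType(), True),
    StructField("_corrupt_record", StringType(), True),
])

df = (
    spark.read
    .schema(schema)
    .option("mode", "PERMISSIVE")                            # keep going on bad rows
    .option("columnNameOfCorruptRecord", "_corrupt_record")
    .json("/data/sensors/*.json")                            # placeholder path
)

# Cache first: some Spark versions refuse to filter on the corrupt-record
# column before the DataFrame is materialized
df.cache()
good = df.filter(df._corrupt_record.isNull()).drop("_corrupt_record")
bad = df.filter(df._corrupt_record.isNotNull())
bad.write.mode("append").json("/data/quarantine/")           # keep for later review
```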

Choosing between Data Scalers in a Data Science Project

Bala Pirya C performs a comparison:

In this article, you will learn how MinMaxScaler, StandardScaler, and RobustScaler transform skewed, outlier-heavy data, and how to pick the right one for your modeling pipeline.

Topics we will cover include:

  • How each scaler works and where it breaks on skewed or outlier-rich data
  • A realistic synthetic dataset to stress-test the scalers
  • A practical, code-ready heuristic for choosing a scaler

Read on to learn more about each of these three scaler types, the use cases that best fit each of them, and even a flow chart at the end.
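
As a quick, self-contained illustration of how differently the three scalers react to outliers (a synthetic example in the article's spirit, not its actual dataset):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler, RobustScaler

rng = np.random.default_rng(42)
x = rng.lognormal(mean=0.0, sigma=1.0, size=1000)  # right-skewed feature
x[:10] *= 50                                       # inject heavy outliers
X = x.reshape(-1, 1)

for scaler in (MinMaxScaler(), StandardScaler(), RobustScaler()):
    scaled = scaler.fit_transform(X)
    name = type(scaler).__name__
    print(f"{name:>14}: median={np.median(scaled):7.3f}, "
          f"min={scaled.min():8.3f}, max={scaled.max():8.3f}")

# MinMaxScaler squeezes the bulk of the data into a sliver near 0 because the
# outliers define the range; StandardScaler's mean and standard deviation get
# dragged by those same outliers; RobustScaler (median/IQR) keeps the central
# mass on a usable scale.
```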

Cross-Validation and Time Series Data

Vlad Johnson takes us through a technique to test time series results:

Time series modeling, compared to traditional nontemporal modeling, presents unique challenges in ensuring that models generalize well to future, unseen data. One key methodology to address these challenges is cross-validation.

Time series data inherently contains temporal dependencies — observations are ordered in time, and future values may depend on past trends. This structure makes it challenging to estimate how well a model will perform on new, unseen data.

Click through for an explanation of cross-validation, why this becomes challenging when you have time series data (or other serially correlated data), and tips to resolve this challenge.
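
To make the fix concrete, here is a minimal example of time-ordered cross-validation with scikit-learn's TimeSeriesSplit: every fold trains strictly on the past and validates on the future, avoiding the leakage a shuffled K-fold would introduce.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(12).reshape(-1, 1)  # stand-in for 12 time-ordered observations

tscv = TimeSeriesSplit(n_splits=4)
for fold, (train_idx, test_idx) in enumerate(tscv.split(X), start=1):
    print(f"fold {fold}: train={train_idx.tolist()} test={test_idx.tolist()}")

# fold 1: train=[0, 1, 2, 3] test=[4, 5]
# fold 2: train=[0, 1, 2, 3, 4, 5] test=[6, 7]
# fold 3: train=[0, ..., 7] test=[8, 9]
# fold 4: train=[0, ..., 9] test=[10, 11]
```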

Ingesting IoT Data into SQL Server via Python

Hristo Hristov builds an app:

MQTT is a lightweight Industrial IoT communications protocol allowing efficient communication to and from edge devices such as machines, sensors, and actuators. How can we get data from an on-premises or cloud MQTT broker and persist it in a SQL Server database? How can we leverage the newest features in SQL Server 2025 for efficient query compilation and build a scalable data pipeline for permanently storing IoT data?

Read on for the code, most of which is in Python.
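
For a sense of the pipeline's shape, here is a stripped-down sketch rather than Hristo's code, with the broker, topic, table, and connection string all placeholders:

```python
# Subscribe with paho-mqtt and persist each message with pyodbc.
import json
import pyodbc
import paho.mqtt.client as mqtt

CONN_STR = (
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=localhost;DATABASE=IoT;Trusted_Connection=yes;"
)
conn = pyodbc.connect(CONN_STR, autocommit=True)  # one connection, reused

def on_message(client, userdata, msg):
    payload = json.loads(msg.payload)
    conn.execute(
        "INSERT INTO dbo.SensorReadings (DeviceId, ReadingJson) VALUES (?, ?)",
        payload.get("device_id"), json.dumps(payload),
    )

client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)  # paho-mqtt 2.x style
client.on_message = on_message
client.connect("broker.example.com", 1883)              # placeholder broker
client.subscribe("factory/sensors/#")
client.loop_forever()
```

A production version would batch inserts and handle broker reconnects rather than writing row by row.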

Comparing the ROC Curve to a Precision-Recall Curve

Ivan Palomares Carrascosa looks at two ways to plot classification model trade-offs:

When building machine learning models to classify imbalanced data (datasets where one class, like spam email, is much less frequent than the other, such as non-spam email), traditional metrics like accuracy, or even ROC AUC (the Receiver Operating Characteristic curve and the area under it), may not reflect model performance in realistic terms, giving overly optimistic estimates due to the dominance of the so-called negative class.

Precision-recall curves (or PR curves for short), on the other hand, are designed to focus specifically on the positive, typically rarer class, which is a much more informative measure for skewed datasets due to class imbalance.

Read on to see how these two curves can diverge and when you might trust one over the other. Ivan’s post does rely on the idea of the positive class being the smaller one and the dataset being markedly imbalanced.
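
Here is a small demonstration of that divergence on synthetic data: on a heavily imbalanced split, ROC AUC tends to look flattering while average precision (the PR-curve summary) tells a harsher story about the rare positive class.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, average_precision_score
from sklearn.model_selection import train_test_split

# Roughly 99:1 negative-to-positive split
X, y = make_classification(
    n_samples=20_000, weights=[0.99, 0.01], flip_y=0.01, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba = model.predict_proba(X_test)[:, 1]

print("ROC AUC:          ", round(roc_auc_score(y_test, proba), 3))
print("Average precision:", round(average_precision_score(y_test, proba), 3))
# Expect ROC AUC to come out well above average precision here, which is the
# gap Ivan's post explores.
```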

Challenges of High-Dimensional Optimization

John Mount lays out a demonstration:

My experience is that common objective functions tend to be structured and full of coincidences and symmetries. And because they have these structures, they are hard to optimize.

Let’s work up what I claim to be a fairly typical optimization problem that arises from planning or scheduling. I’ll call it the train arrival schedule problem.

Click through for the article, which includes demonstration code.
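
To get the flavor of the problem without John's exact formulation, here is a toy version: choose n arrival times that hit target times while respecting a minimum headway between consecutive trains. The sort makes the objective permutation-symmetric and the max() terms add kinks, both of which trouble general-purpose optimizers as n grows:

```python
# A toy scheduling objective in the same spirit, not John's actual problem.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 20
targets = np.sort(rng.uniform(0, 60, size=n))  # desired arrival minutes
headway = 2.0                                  # minimum gap between arrivals

def objective(t):
    t = np.sort(t)                                 # permutation symmetry
    lateness = np.sum((t - targets) ** 2)          # penalty for missing targets
    crowding = np.sum(np.maximum(headway - np.diff(t), 0.0) ** 2)  # headway violations
    return lateness + 100.0 * crowding

result = minimize(objective, x0=targets.copy(), method="Nelder-Mead",
                  options={"maxiter": 20_000, "maxfev": 20_000})
print(result.fun, result.nit)  # even at n=20, convergence is slow and fragile
```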

Modifying Power BI Page Visibility and Active Status via Semantic Link Labs

Meagan Longoria hides (or shows) a page:

Setting page visibility and the active page are often overlooked last steps when publishing a Power BI report. It’s easy to forget the active page since it’s just set to whatever page was open when you last saved the report. But we don’t have to settle for manually checking these things before we deploy to a new workspace (e.g., from dev to prod). If our report is in PBIR format, we can run Fabric notebooks to do this for us.

Click through for a notebook and an explanation.
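
As a rough sketch of the idea: the ReportWrapper class comes from semantic-link-labs, but the specific method names below are assumptions based on the library's conventions, so check Meagan's notebook for the real calls.

```python
from sempy_labs.report import ReportWrapper

# Placeholders: the report must be in PBIR format
rpt = ReportWrapper(report="Sales Overview", workspace="Prod")

# Assumed method names following the library's conventions:
rpt.set_page_visibility(page_name="Helper Page", hidden=True)
rpt.set_active_page(page_name="Executive Summary")
```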
