
Category: Python

Running SQL against Fabric Warehouses via Python

Jared Westover builds a loop:

In a previous article, I ran a SQL script against a Fabric Warehouse 100 times without needing to click ‘Execute’ each time. A WHILE loop could work, but Query Insights treats it as a single execution. While using GO was an option, I wanted a different approach because I’m always trying to expand my skill set. I need a scalable way to run scripts for performance testing.

This is a pretty simple database connection and script execution. For the most part, it would work just fine for any other SQL Server family member, just with a somewhat different connection string depending on the product.
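
As a rough illustration of the idea (not the article's code), here is a minimal sketch of running a script repeatedly from Python so that each run is submitted as its own statement. The helper name `run_script_n_times` is hypothetical, and the standard library's `sqlite3` stands in for the warehouse connection purely so the example runs anywhere. Against a Fabric Warehouse you would presumably swap in a driver such as pyodbc with a suitable connection string.

```python
import sqlite3
import time


def run_script_n_times(connection, sql, runs):
    """Execute `sql` `runs` times as separate statements; return per-run seconds.

    Hypothetical helper: works with any DB-API style connection.
    """
    timings = []
    cursor = connection.cursor()
    for _ in range(runs):
        start = time.perf_counter()
        cursor.execute(sql)
        cursor.fetchall()  # drain the results so the query fully completes
        timings.append(time.perf_counter() - start)
    cursor.close()
    return timings


# Assumption: sqlite3 is only a stand-in for the real warehouse connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)", [(i, i * 1.5) for i in range(1000)]
)

timings = run_script_n_times(
    conn, "SELECT COUNT(*), SUM(amount) FROM sales", runs=100
)
print(f"{len(timings)} runs, slowest took {max(timings):.6f}s")
conn.close()
```

Because every iteration is a separate `execute` call, each run reaches the server as its own request rather than as one long batch.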


Reviewing Power BI Report Interactions via Semantic Link Labs

Meagan Longoria wants to know about visual interactions:

It can be tedious to check what visual interactions have been configured in a Power BI report. If you have a lot of bookmarks, this becomes even more important. If you do this manually, you have to turn on Edit Interactions and select each visual to see what interactions it is emitting to the other visuals on the page.

But there is a better way!

Click through for that better way.


Packaging and Publishing Python Packages via Poetry

Osheen MacOscar forces me into alliteration:

So far, in the previous blog we covered creating our package with Poetry, managing our development environment and adding a function. In the current blog post we’ll be covering the next steps with package development including documentation, testing and how to publish to PyPI.

Read on for several tips on making Python code package-ready and then distributing it via PyPI.


Automating Semantic Model Security via Semantic Link

Marc Lelijveld writes a script:

You may be using standardized solutions like Fabric Unified Admin Monitoring (FUAM) or any other templated solution that comes with a semantic model. As part of transparency within your organization, you decided to share the insights gathered with others in the organization by adjusting the solution to apply your own security setup on top.

However, after running an update of the template, you’ve overwritten your custom security configuration and reapplying costs a lot of time, again and again after each update. Why don’t we just script this security? In this blog I will share how you can deploy security configurations to semantic models and assign users to these roles.

Click through for an example script and details on how it works.


Creating a Python Package via Poetry

Osheen MacOscar builds a package:

In this blog series (this and the next blog) I am going to demonstrate how to use Poetry to create a Python package, set up testing infrastructure and install it. I am going to be creating a wrapper around the Fantasy Premier League API and creating a function which can create a weekly league table.

This is a straightforward example of how to create a new Python package and add a function to it.


Error Handling in PySpark Jobs

Ram Ghadiyaram adds some error handling logic:

In PySpark, processing massive datasets across distributed clusters is powerful but comes with challenges. A single bad record, missing file, or network glitch can crash an entire job, wasting compute resources and leaving you with stack traces that have many lines. 

Spark’s lazy evaluation, where transformations don’t execute until an action is triggered, makes errors harder to catch early, and debugging them can feel very, very difficult.

Read on for five patterns that can help with error handling in PySpark.
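
To make the lazy-evaluation point concrete, here is a small sketch in plain Python rather than PySpark. It uses the built-in lazy `map` as a loose analogy for a Spark transformation and `sum` or `list` as the action. The record layout and the `safe_parse` helper are my own assumptions for illustration, and this is not one of the article's five patterns reproduced verbatim.

```python
def parse_amount(record):
    """Convert the (assumed) 'amount' field to a float; fails on bad input."""
    return float(record["amount"])


records = [{"amount": "10.5"}, {"amount": "oops"}, {"amount": "3"}]

# Defining the transformation raises nothing: map() is lazy, so the bad
# record goes unnoticed at this point.
lazy_amounts = map(parse_amount, records)

# The "action" is where the failure finally surfaces, far from its cause.
failure = None
try:
    total = sum(lazy_amounts)
except ValueError as exc:
    failure = str(exc)
print("failed at action time:", failure)


def safe_parse(record):
    """Hypothetical helper: return errors as data instead of raising."""
    try:
        return ("ok", float(record["amount"]))
    except (KeyError, ValueError) as exc:
        return ("error", f"{record!r}: {exc}")


results = list(map(safe_parse, records))
good = [value for status, value in results if status == "ok"]
bad = [value for status, value in results if status == "error"]
print("good:", good)
print("bad:", bad)
```

Capturing failures as ordinary rows keeps one bad record from taking down the whole run, and the same idea carries over to distributed jobs.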


Choosing between Data Scalers in a Data Science Project

Bala Priya C performs a comparison:

In this article, you will learn how MinMaxScaler, StandardScaler, and RobustScaler transform skewed, outlier-heavy data, and how to pick the right one for your modeling pipeline.

Topics we will cover include:

  • How each scaler works and where it breaks on skewed or outlier-rich data
  • A realistic synthetic dataset to stress-test the scalers
  • A practical, code-ready heuristic for choosing a scaler

Read on to learn more about each of these three scaler types, the use cases that best fit each of them, and even a flow chart at the end.
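
For a feel of why outliers matter here, the following is a minimal sketch of the arithmetic behind the three scalers, written with only the standard library. It is meant to mirror scikit-learn's default behavior as I understand it (range 0 to 1, population standard deviation, median and interquartile range), but the function names and the tiny dataset are my own assumptions, not the article's code.

```python
import statistics


def min_max_scale(values):
    """Rescale to [0, 1] using the observed minimum and maximum."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standard_scale(values):
    """Center on the mean and divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def robust_scale(values):
    """Center on the median and divide by the interquartile range."""
    median = statistics.median(values)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return [(v - median) / (q3 - q1) for v in values]


# Hypothetical data: four ordinary values and one extreme outlier.
data = [1.0, 2.0, 3.0, 4.0, 100.0]

print("min-max :", [round(v, 3) for v in min_max_scale(data)])
print("standard:", [round(v, 3) for v in standard_scale(data)])
print("robust  :", [round(v, 3) for v in robust_scale(data)])
```

With this data, min-max scaling squeezes the four ordinary values into roughly the bottom 3% of the range because the outlier defines the maximum, while the median and IQR used by the robust version are unaffected by that single point.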


Cross-Validation and Time Series Data

Vlad Johnson takes us through a technique to test time series results:

Time series modeling, compared to traditional nontemporal modeling, presents unique challenges in ensuring that models generalize well to future, unseen data. One key methodology to address these challenges is cross-validation.

Time series data inherently contains temporal dependencies — observations are ordered in time, and future values may depend on past trends. This structure makes it challenging to estimate how well a model will perform on new, unseen data.

Click through for an explanation of cross-validation, why this becomes challenging when you have time series data (or other serially correlated data), and tips to resolve this challenge.
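
One common way to respect temporal order is an expanding-window (forward-chaining) split, where each fold trains only on observations that come before its test window. The sketch below is a minimal pure-Python version under that assumption. The function name and parameters are hypothetical, and the article may well recommend a different scheme.

```python
def expanding_window_splits(n_samples, n_splits, test_size):
    """Yield (train_indices, test_indices) pairs in time order.

    Each test window follows its training window, so the model never
    sees the future. Hypothetical helper for illustration only.
    """
    first_test_start = n_samples - n_splits * test_size
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested splits")
    for fold in range(n_splits):
        test_start = first_test_start + fold * test_size
        train = list(range(test_start))
        test = list(range(test_start, test_start + test_size))
        yield train, test


# Ten time-ordered observations, three folds, two test points per fold.
splits = list(expanding_window_splits(n_samples=10, n_splits=3, test_size=2))
for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```

Unlike shuffled k-fold, no training index here is ever later than a test index, which avoids the leakage that serial correlation would otherwise introduce.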
