Press "Enter" to skip to content

Category: Python

Exploring the Area under the ROC Curve

Aayush Srivastava takes us through one of the classics of classification:

In the realm of machine learning classification, model evaluation is an essential step to assess the performance and effectiveness of various algorithms. One widely-used tool for this purpose is the Area Under the Receiver Operating Characteristic Curve (AUC-ROC curve). In this blog, we will delve into the significance of the AUC-ROC curve, how it is calculated, and why it is an invaluable metric for evaluating classification models.

In this article, we will discuss the performance metrics used in the classification and also explore the implications of using two, namely AUC and ROC. Here is an overview of the important points that we will discuss in the article. 

The fun anecdote around ROC curves is that their name actually makes sense if you know the origin: it came out of the British army in World War II, where they tracked how their radar operators classified blips as German aircraft or noise (e.g., flocks of birds). The radar receiver operators had certain characteristics, where some were more effective at separating actual threats from noise, hence the Receiver Operating Characteristic curve.

Comments closed

Parallelizing Notebook Runs in Microsoft Fabric via Python

Sandeep Pawar kicks off multiple notebooks at once:

The notebook class in mssparkutils has two methods to run notebooks – run and runMultiple . run allows you to trigger a notebook run for one single notebook. Mim wrote a nice blog to show how to use it and its usefulness.

runMultiple , on the other hand, allows you to create a Direct Acyclic Graph (DAG) of notebooks to execute notebooks in parallel and in specified order, similar to a pipeline run except in a notebook.

Read on to learn more about the advantages of this latter approach as well as how you can do it.

Comments closed

Trying out Data Wrangler

Ginger Grant tries out a feature in Microsoft Fabric:

The second element in my series on new Fabric Features is Data Wrangler. Data Wrangler is an entirely new feature found inside of the Data Engineering and Machine Learning Experience of Fabric. It was created to help analyze data in a lakehouse using Spark and generated code. You may find that there’s a lot of data in the data lake that you need to evaluate to determine how you might incorporate the data into a data model. It’s important to examine the data to evaluate what the data contains. Is there anything missing? Incorrectly data typed? Bad Data? There is an easy method to discover what is missing with your data which uses some techniques commonly used by data scientists. Data Wrangler is used inside of notebooks in the Data Engineering or Machine Learning Environments, as the functionality does not exist within the Power BI experience.

Click through to see how it works. I liken it to Power Query for people who don’t like Python.

Comments closed

Canceling a Power BI Dataflow Gen2 Refresh

Sandeep Pawar has a script for us:

At the time of writing this blog, it is not possible to cancel a Dataflow Gen2 (DFg2) refresh using the UI. This is a temporary limitation that I expect will be resolved soon. DFg2 can be resource intensive, and if the refresh takes longer than expected, it may consume a significant amount of CUs. Thankfully, you can use the Power BI Rest API to cancel it. My friend Alex Powers already has a PowerShell script that you can use. You can also use the Power BI VS Code extension by Gerhard Brueckl.

But I would like to show you how you can do this using the PowerBIRestClient in the latest version of Semantic-Link (v0.5.0).

Read on to see what this Python script does and how you can use it.

Comments closed

An Overview of Polars

Dylan Jones talks about a Rust-based data frame library:

Polars is a high-performance DataFrame library implemented in Rust, and can be used with Rust natively or via its Python wrapper. It is designed to handle large datasets with ease, providing an user-friendly interface for data manipulation and analysis. The library offers two modules: polars-core for the core functionality, and polars-io for input/output operations, allowing you to read and write data in various formats such as CSV, JSON, Parquet, Delta and more.

Read on to see how it works in Python compared to Pandas, as well as some speed comparisons.

Comments closed

Notes on Linear Markov Chains

John Mount has some thoughts for us:

I want to collect some “great things to know about linear Markov chains.”

For this note we are working with a Markov chain on states that are the integers 0 through k (k > 0). A Markov chain is an iterative random process with time tracked as an increasing integer t, and the next state of the chain depending only on the current (soon to be previous) state. For our linear Markov chain the only possible next states from state i are: i (called a “self loop” when present), i+1 (called up or right), and i-1 (called down or left). In no case does the chain progress below 0 or above k.

Click through for notes on two variants of this sort of linear Markov chain, as well as a pair of appendices containing derivation notes and Python code.

Comments closed

Making REST API Calls against Microsoft Fabric

Sandeep Pawar digs into the REST API:

Accessing Fabric REST endpoints in Fabric notebooks was already easy but it became easier and straightforward with semantic-link version 0.4.0. You can use the FabricRestClient class from sempy to set up a REST client and call the APIs. Authentication is automatically managed for you.

Click through to see how it works, as well as some warnings or things to keep in mind along the way.

Comments closed

Creating Charts in Microsoft Fabric Notebooks using Vega

Phil Seamark tries out Vega in a Microsoft Fabric notebook:

I recently needed to generate a quick visual inside a Microsoft Fabric notebook. After a little internet searching, I found there are many good quality charting libraries in Python, however it was going to take too long to figure out how to create a very specific type of chart.

This is where Vega came to the rescue. The purpose of this short article is to share a very simple implementation of generating a Vega chart using a Microsoft Fabric notebook.

Click through for the example code.

Comments closed

Accessing PostgreSQL from Python

Semab Tariq connects to Postgres:

Psycopg2, a PostgreSQL adapter for Python, implements the Python Database API Specification v2.0, acting as a bridge between Python applications and PostgreSQL databases. It leverages the libpq library, the official PostgreSQL C interface, to facilitate efficient communication. Psycopg2 provides a robust set of features, including transaction management, and support for PostgreSQL-specific data types. Its implementation of the Python DB API ensures seamless integration, enabling developers to execute SQL queries and transactions with precision in Python applications.

Read on to see how it works using a variety of examples.

Comments closed