Press "Enter" to skip to content

Category: Python

Switching between Python and PySpark Notebooks in Fabric

Sandeep Pawar wants to save some money:

File this under a test I have been wanting to do for some time. If I am exploring some data in a Fabric notebook using PySpark, can I switch between the Python and PySpark engines with minimal code changes in an interactive session? The goal is to use the Python notebook for exploration, to reuse existing PySpark/Spark SQL code, or to develop the logic in a low-compute environment (to save CUs) and then scale it in a distributed Spark environment. Understandably, there will be limitations to this approach given the differences in environments, configs, etc., but can it be done?

Read on for the answer, as well as plenty of notes around it.

Scanning Fabric Workspaces via Semantic Link Labs

Sandeep Pawar takes us through the Scanner API:

It’s finally here! Thanks to Michael Kovalsky, one of the most requested and anticipated APIs is now available in Semantic Link Labs (v0.8.10): the Scanner API. The Scanner API in the Fabric Admin REST APIs allows Fabric administrators to retrieve detailed metadata about their organization’s Fabric items, supporting governance and compliance efforts. It provides information such as item names, descriptions, date created, lineage, connection strings, etc. It’s not new; we have been using it in Power BI for a long time, but in the Fabric world it’s even more important given the number of items and configurations.

Read on to see what’s available and how this works.

Building and Deploying a Streamlit Data App

Ivan Palomares Carrascosa deploys an app:

This article will navigate you through the deployment of a simple machine learning (ML) model for regression using Streamlit. This novel platform streamlines and simplifies deploying artifacts like ML systems as web services.

I’ll leave aside my aside that linear regression isn’t machine learning. Click through to see how you can build a simple application in approximately 60 lines of code. This example shows off some of the simplicity in Streamlit’s design.
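To give a flavor of what that looks like, here is a minimal sketch of a Streamlit regression app. This is not Ivan’s exact code; the synthetic dataset and field names are invented for illustration.

# app.py -- run with: streamlit run app.py
import numpy as np
import streamlit as st
from sklearn.linear_model import LinearRegression

st.title("Simple Price Regression Demo")

# Fit a toy linear model on synthetic data (a stand-in for a real training set)
rng = np.random.default_rng(42)
X = rng.uniform(500, 3500, size=(200, 1))                    # square footage
y = 50_000 + 120 * X.ravel() + rng.normal(0, 20_000, 200)    # price
model = LinearRegression().fit(X, y)

# Collect user input from a widget and show the prediction
sqft = st.slider("Square footage", min_value=500, max_value=3500, value=1500)
prediction = model.predict([[sqft]])[0]
st.write(f"Predicted price: ${prediction:,.0f}")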

Churn Analysis using Logistic Regression in Python

Daniel Calbimonte takes us through a churn analysis scenario:

This article explains how to analyze the data using Python and perform customer churn analysis to determine why customers stop using a service.

Read on for the article. If you want to dig deeper into churn analysis, I can recommend a book entitled Fighting Churn with Data. Its focus is more on categorical and numerical analysis rather than using statistical classification techniques like logistic regression to identify churn factors. That makes it easier to digest for non-statisticians, especially because most of the code is SQL.
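For a sense of the technique Daniel uses, here is a minimal sketch of a logistic regression churn model in scikit-learn; the CSV file and column names are hypothetical, not from the article.

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Hypothetical dataset: one row per customer, 'churned' is a 0/1 label
df = pd.read_csv("customers.csv")
X = df[["tenure_months", "monthly_charges", "support_tickets"]]
y = df["churned"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))

# Coefficients hint at which factors push customers toward churning
print(dict(zip(X.columns, model.coef_[0])))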

Using the Azure AI Language and Translation Python SDK

Tomaz Kastrun continues a series on Azure AI:

To use the SDK for the “Language + Translation” service, install the package:

pip install azure-ai-textanalytics==5.2.0

then add your endpoint in a format like https://yyyyy_azurehub_xxxxxxx.cognitiveservices.azure.com/, along with the secret for your endpoint. You will also need the region name (e.g., west-europe).

Once you’ve set up the necessary credentials, Tomaz shows how easy it is to call the service.
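As a rough sketch of the pattern (the endpoint and key below are placeholders, and the sample text is invented), a call to the language detection API looks something like this:

from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential

# Placeholder endpoint and key -- substitute your own resource's values
endpoint = "https://<your-resource>.cognitiveservices.azure.com/"
key = "<your-secret-key>"

client = TextAnalyticsClient(endpoint=endpoint, credential=AzureKeyCredential(key))

documents = ["Azure AI services make text analytics straightforward."]
result = client.detect_language(documents=documents)

for doc in result:
    if not doc.is_error:
        print(doc.primary_language.name, doc.primary_language.confidence_score)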

Using the Azure AI Speech Python SDK

Tomaz Kastrun writes some code:

Besides the Python Speech SDK, there are multiple languages supported by the Speech SDK. The Python SDK exposes many of the Speech service capabilities for developing speech-enabled applications. It is ideal for (near) real-time and non-real-time scenarios that use other Azure services such as storage, streams, and analytics.

Click through for a demonstration.
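To give a taste of the SDK (the key and region below are placeholders), text-to-speech takes only a few lines:

import azure.cognitiveservices.speech as speechsdk

# Placeholder credentials -- use your own Speech resource's key and region
speech_config = speechsdk.SpeechConfig(subscription="<your-key>", region="westeurope")

# With no audio config specified, output goes to the default speaker
synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)
result = synthesizer.speak_text_async("Hello from the Azure AI Speech SDK.").get()

if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
    print("Speech synthesized successfully.")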

API Testing with pytest

Xuan Nguyen Truong writes some tests:

API testing is an essential aspect of software development, ensuring that your application’s endpoints are functioning correctly and reliably. In this guide, we’ll show you how to implement API testing in Python with pytest and the Requests library.

I’m a big fan of pytest, as it makes testing in Python so much easier. There’s not a lot of ceremony involved in writing tests and it’s easy to see what’s failing during tests.
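Here is a minimal sketch of the pattern; the URL points at a public practice API rather than anything from Xuan’s post.

import pytest
import requests

BASE_URL = "https://jsonplaceholder.typicode.com"  # public practice API

@pytest.mark.parametrize("user_id", [1, 2, 3])
def test_get_user_returns_200(user_id):
    response = requests.get(f"{BASE_URL}/users/{user_id}", timeout=10)
    assert response.status_code == 200

def test_user_payload_has_expected_fields():
    payload = requests.get(f"{BASE_URL}/users/1", timeout=10).json()
    assert {"id", "name", "email"} <= payload.keys()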

Delta Tables in Microsoft Fabric with Polars

Sandeep Pawar tries out the Polars library:

The much-anticipated Python notebook in Fabric is finally available, and Fabric users have already developed cool libraries and blogged about the usefulness of these notebooks. DuckDB is everyone’s favorite, but I am a Python guy, so here is a quick overview of how you can use Polars in the Python notebook.

Polars is an open-source library that uses a Rust engine and supports multi-threaded execution. This means it’s significantly faster than pandas and, in some cases, even faster than Spark. It can efficiently use the limited resources available in Python notebooks (2 cores, 16GB RAM). Polars v1.6 is installed in the default Python notebook environment. So, let’s see how to perform some common operations.

Read on to see how you can load and write out files via Polars.
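In broad strokes (the table paths and column names below are placeholders, and write_delta requires the deltalake package), reading and writing Delta tables with Polars looks like this:

import polars as pl

# Placeholder path -- in Fabric this would typically be a lakehouse Tables path or an abfss:// URI
df = pl.read_delta("/lakehouse/default/Tables/sales")

# A couple of common operations: filter, group, aggregate
summary = (
    df.filter(pl.col("amount") > 0)
      .group_by("region")
      .agg(pl.col("amount").sum().alias("total_amount"))
)

# Write the result back out as a Delta table
summary.write_delta("/lakehouse/default/Tables/sales_summary", mode="overwrite")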

Determining Power BI Report Fields in Use

Meagan Longoria performs a search:

Have you ever wondered where a certain field is used in a report? Or maybe you need an easy way to find broken field references in a report? Certain 3rd-party tools such as Measure Killer and Power BI Helper (not updated recently) have helped us with this task in the past. But now we can perform this task with a notebook in Fabric!

This is made possible by the Semantic Link Labs Python library. Please note that PBIR format is still in preview at the time of publishing this blog post, so use it at your own risk. Also, this works only on reports published to the Power BI service. Since this notebook is not making any changes to the report, I feel it’s pretty safe to run, but do remember that it uses CUs on your Fabric capacity while you run it.

Read on to see how it works.

Data Visualization in Matplotlib

Rajendra Gupta generates some graphics:

Data analysis requires analysts to handle structured, semi-structured, or unstructured data. Small datasets with few rows and columns are easy to understand. However, as the data complexity increases with many interlinked variables, getting data insights from tabular formatted data becomes challenging. According to a recent study from MIT, the human brain processes an entire image in just 13 milliseconds. Therefore, it is helpful to learn Python and visualization together.

How do we use Python to generate plots from the data to analyze patterns, correlations, and trends? What plots are available, and how do we use them with customizations? Let’s explore them in this tip.

There are a few visualization libraries in Python I prefer over matplotlib, and for static graphics, ggplot2 in R has pretty much everything else beat. But matplotlib is essentially the standard, so it’s important to know.
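As a baseline example of the matplotlib idiom the tip builds on (the data here is synthetic):

import matplotlib.pyplot as plt
import numpy as np

# Synthetic data standing in for a real dataset
x = np.linspace(0, 10, 100)
y = np.sin(x)

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(x, y, color="steelblue", label="sin(x)")
ax.scatter(x[::10], y[::10], color="firebrick", label="samples")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("A basic matplotlib plot")
ax.legend()
plt.show()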
