Category: Python

Churn Analysis using Logistic Regression in Python

Daniel Calbimonte takes us through a churn analysis scenario:

This article explains how to analyze the data using Python and perform customer churn analysis to determine why customers stop using a service.

Read on for the article. If you want to dig deeper into churn analysis, I can recommend a book entitled Fighting Churn with Data. Its focus is more on categorical and numerical analysis than on using statistical classification techniques like logistic regression to identify churn factors. That makes it easier to digest for non-statisticians, especially because most of the code is SQL.
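If you want a feel for the technique itself before reading, here is a minimal sketch of churn classification with logistic regression in scikit-learn. The dataset and column names are invented for illustration and are not from the article:

```python
# A minimal logistic regression churn sketch with scikit-learn.
# The data and column names below are made up for illustration.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

df = pd.DataFrame({
    "tenure_months":   [1, 34, 2, 45, 8, 22, 3, 60],
    "monthly_charges": [70.0, 56.9, 53.8, 42.3, 99.6, 89.1, 74.4, 20.2],
    "support_calls":   [4, 0, 3, 1, 5, 2, 4, 0],
    "churned":         [1, 0, 1, 0, 1, 0, 1, 0],
})

X_train, X_test, y_train, y_test = train_test_split(
    df[["tenure_months", "monthly_charges", "support_calls"]],
    df["churned"], test_size=0.25, random_state=42, stratify=df["churned"],
)

model = LogisticRegression().fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))

# The sign and size of each coefficient hint at which factors
# push customers toward churning.
print(dict(zip(X_train.columns, model.coef_[0])))
```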

Using the Azure AI Language and Translation Python SDK

Tomaz Kastrun continues a series on Azure AI:

To use the SDK for the “Language + Translation” service, install:

pip install azure-ai-textanalytics==5.2.0

then add your endpoint, in a format like https://yyyyy_azurehub_xxxxxxx.cognitiveservices.azure.com/, along with the secret for your endpoint. You will also need the region name (e.g., westeurope).

Once you’ve set up the necessary credentials, Tomaz shows how easy it is to call the service.
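A call looks something like this minimal sketch with the 5.2.0 SDK; the endpoint and key values are placeholders you would swap for your own:

```python
from azure.core.credentials import AzureKeyCredential
from azure.ai.textanalytics import TextAnalyticsClient

# Placeholder endpoint and secret -- substitute your own values.
endpoint = "https://yyyyy_azurehub_xxxxxxx.cognitiveservices.azure.com/"
key = "<your-secret-key>"

client = TextAnalyticsClient(endpoint=endpoint, credential=AzureKeyCredential(key))

docs = ["Dober dan! Kako ste?"]

# Detect the language of the document.
result = client.detect_language(documents=docs)[0]
print(result.primary_language.name)

# Score its sentiment.
sentiment = client.analyze_sentiment(documents=docs)[0]
print(sentiment.sentiment)
```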

Using the Azure AI Speech Python SDK

Tomaz Kastrun writes some code:

Besides Python, there are multiple languages supported with the Speech SDK. The Python SDK exposes many of the Speech service capabilities for developing speech-enabled applications. It is ideal for (near) real-time and non-real-time scenarios, working alongside other Azure services such as storage, streams, and analytics.

Click through for a demonstration.
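If you want a quick taste before clicking, here is a short text-to-speech sketch with the Speech SDK (pip install azure-cognitiveservices-speech); the key and region are placeholders:

```python
import azure.cognitiveservices.speech as speechsdk

# Placeholder credentials -- substitute your own key and region.
speech_config = speechsdk.SpeechConfig(subscription="<your-key>", region="westeurope")
speech_config.speech_synthesis_voice_name = "en-US-JennyNeural"

# With no audio config specified, output goes to the default speaker.
synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)
result = synthesizer.speak_text_async("The Speech SDK is up and running.").get()

if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
    print("Synthesis finished.")
```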

API Testing with pytest

Xuan Nguyen Truong writes some tests:

API testing is an essential aspect of software development, ensuring that your application’s endpoints are functioning correctly and reliably. In this guide, we’ll show you how to implement API testing in Python with pytest and the Requests library.

I’m a big fan of pytest, as it makes testing in Python so much easier. There’s not a lot of ceremony involved in writing tests, and it’s easy to see what’s failing.
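To show just how little ceremony, here is a minimal pytest + requests sketch, pointed at the public httpbin.org echo service as a stand-in for your own API:

```python
# test_api.py -- run with: pytest test_api.py
import requests

BASE_URL = "https://httpbin.org"  # stand-in for the API under test

def test_get_returns_200():
    response = requests.get(f"{BASE_URL}/get", timeout=10)
    assert response.status_code == 200

def test_post_echoes_payload():
    payload = {"name": "Ada"}
    response = requests.post(f"{BASE_URL}/post", json=payload, timeout=10)
    assert response.status_code == 200
    # httpbin echoes the JSON body back under the "json" key.
    assert response.json()["json"] == payload
```

Plain functions with plain asserts, and pytest gives you a readable pass/fail report per test.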

Delta Tables in Microsoft Fabric with Polars

Sandeep Pawar tries out the Polars library:

The much-anticipated Python notebook in Fabric is finally available, and Fabric users have already developed cool libraries and blogged about the usefulness of these notebooks. DuckDB is everyone’s favorite, but I am a Python guy, so here is a quick overview of how you can use Polars in the Python notebook.

Polars is an open-source library that uses a Rust engine and supports multi-threaded execution. This means it’s significantly faster than pandas and, in some cases, even faster than Spark. It can efficiently use the limited resources available in Python notebooks (2 cores, 16GB RAM). Polars v1.6 is installed in the default Python notebook environment. So, let’s see how to perform some common operations.

Read on to see how you can load and write out files via Polars.
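The round trip looks something like this minimal sketch, assuming a default lakehouse attached to the notebook; the table path and column names are placeholders:

```python
import polars as pl

# Placeholder path to a Delta table in the notebook's default lakehouse.
path = "/lakehouse/default/Tables/sales"

# Eager read of the whole table.
df = pl.read_delta(path)

# Lazy scan: the aggregation is only computed at collect().
summary = (
    pl.scan_delta(path)
      .group_by("region")
      .agg(pl.col("amount").sum().alias("total_amount"))
      .collect()
)

# Write the result back out as a new Delta table.
summary.write_delta("/lakehouse/default/Tables/sales_summary", mode="overwrite")
```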

Determining Power BI Report Fields in Use

Meagan Longoria performs a search:

Have you ever wondered where a certain field is used in a report? Or maybe you need an easy way to find broken field references in a report? Certain 3rd-party tools such as Measure Killer and Power BI Helper (not updated recently) have helped us with this task in the past. But now we can perform this task with a notebook in Fabric!

This is made possible by the Semantic Link Labs Python library. Please note that the PBIR format is still in preview at the time of publishing this blog post, so use it at your own risk. Also, this works only on reports published to the Power BI service. Since this notebook is not making any changes to the report, I feel it’s pretty safe to run, but do remember that it uses CUs on your Fabric capacity while you run it.

Read on to see how it works.
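The gist, as a hedged sketch: in a Fabric notebook, you install Semantic Link Labs and ask a report for the semantic model objects it references. The report and workspace names below are placeholders, and the method name reflects recent versions of the library, so check the docs:

```python
# %pip install semantic-link-labs
from sempy_labs.report import ReportWrapper

# Placeholder report and workspace names.
rpt = ReportWrapper(report="Sales Report", workspace="My Workspace")

# One row per measure/column the report references -- the raw material
# for finding where a given field is used (or broken).
objects = rpt.list_semantic_model_objects()
print(objects.head())
```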

Data Visualization in Matplotlib

Rajendra Gupta generates some graphics:

Data analysis requires analysts to handle structured, semi-structured, or unstructured data. Small datasets with few rows and columns are easy to understand. However, as the data complexity increases with many interlinked variables, getting data insights from tabular formatted data becomes challenging. According to a recent study from MIT, the human brain processes an entire image in just 13 milliseconds. Therefore, it is helpful to learn Python and visualization together.

How do we use Python to generate plots from the data to analyze patterns, correlations, and trends? What plots are available, and how do we use them with customizations? Let’s explore them in this tip.

There are a few visualization libraries in Python I prefer over matplotlib, and for static graphics, ggplot2 in R has pretty much everything else beat. But matplotlib is essentially the standard, so it’s important to know.
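For anyone who hasn’t touched matplotlib, its hello-world looks something like this: a scatter plot with a fitted trend line over made-up data:

```python
# A small matplotlib example: scatter plot plus fitted trend line,
# over randomly generated data.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
x = rng.uniform(0, 10, 100)
y = 2.5 * x + rng.normal(0, 3, 100)

fig, ax = plt.subplots(figsize=(6, 4))
ax.scatter(x, y, alpha=0.6, label="observations")

# Fit and draw a simple linear trend.
m, b = np.polyfit(x, y, 1)
ax.plot(x, m * x + b, color="red", label=f"trend: y = {m:.2f}x + {b:.2f}")

ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Scatter plot with fitted trend line")
ax.legend()
plt.show()
```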

An Overview of the Naive Bayes Class of Algorithms

Harris Amjad takes us through a rather useful class of algorithms for classification:

As AI and Machine Learning have increased in popularity, especially Large Language Models, more professionals have explored how these systems work. Unfortunately, some put the cart before the horse, taking on more complex algorithms before laying the foundation, which results in fading interest in the topic. This tip will introduce a simple probabilistic, yet powerful classifier, the Naïve Bayes Model, and implement it in Python.

I like using the Naive Bayes variants, despite the fact that it is not Bayesian and arguably isn’t very naive. The reason I like to use this class of algorithm is that it’s fast, easy, and gives you a useful baseline for quality. If you need to meet some specific quality threshold (say, accuracy > 85% or F1-score above 0.8), you can get an answer quickly with Naive Bayes. If that answer is anywhere near your threshold, the problem is likely solvable. If your answer is way below the threshold, it’s probably not worth spending the time or compute effort trying out a variety of other algorithms.
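Here is what that quick baseline might look like with scikit-learn’s GaussianNB on a bundled dataset (the dataset is just a stand-in; the point is how little code it takes to get a number):

```python
# A fast Naive Bayes baseline with scikit-learn.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, f1_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

nb = GaussianNB().fit(X_train, y_train)
pred = nb.predict(X_test)

print(f"accuracy: {accuracy_score(y_test, pred):.3f}")
print(f"F1-score: {f1_score(y_test, pred):.3f}")
# If this baseline lands near your quality threshold, the problem is
# probably learnable; if it is far below, reconsider before burning
# compute on fancier algorithms.
```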

Generating Effect Plots in Python and R

Michael Mayer builds some effect plots:

The plots show different types of feature effects relevant in modeling:

  • Average observed: Descriptive effect (also interesting without model).
  • Average predicted: Combined effect of all features. Also called “M Plot” (Apley 2020).
  • Partial dependence: Effect of one feature, keeping other feature values constant (Friedman 2001).
  • Number of observations or sum of case weights: Feature value distribution.
  • R only: Accumulated local effects, an alternative to partial dependence (Apley 2020).

Click through to see how they both work.
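The post uses Michael’s own tooling; as a generic illustration of just one of the effect types in that list, scikit-learn can produce a partial dependence plot in a few lines (this is not the package from the post):

```python
# Partial dependence with scikit-learn, as a generic illustration of one
# of the effect types listed above.
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

X, y = fetch_california_housing(return_X_y=True, as_frame=True)
model = HistGradientBoostingRegressor(random_state=0).fit(X, y)

# Effect of median income on predictions, averaging over the other features.
PartialDependenceDisplay.from_estimator(model, X, features=["MedInc"])
plt.show()
```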

Obtaining VisualIDs for Visuals in a Power BI Report

Sandeep Pawar checks for ID:

Log Analytics and Workspace Monitoring in Fabric log all the activities of datasets in a workspace. These logs contain dataset, report, and visual IDs, which the user has to decipher to get the full picture. Dataset and report IDs are straightforward, but it’s not easy to get visual IDs programmatically. Chris Webb already has a blog post on a couple of different ways to get the visual IDs. That post was published in 2022, and in the Fabric world we now have a couple more options.

Read on for two additional methods you can use.
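As one hedged example of the newer options: Semantic Link Labs can enumerate a published report’s visuals, IDs included. The report and workspace names below are placeholders, and the method name reflects recent library versions, so verify against the docs:

```python
from sempy_labs.report import ReportWrapper

# Placeholder report and workspace names.
rpt = ReportWrapper(report="Sales Report", workspace="My Workspace")

# One row per visual in the report, including each visual's ID.
visuals = rpt.list_visuals()
print(visuals.head())
```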
