Python – Page 13 – Curated SQL

A Primer on Pandas

Published 2024-10-16 by Kevin Feasel

Have you heard about Pandas in Python? It is a widely used open-source library for analyzing and manipulating data in the Python programming language. Let’s explore it with use cases and examples.

Click through for an overview of the library. Pandas isn’t the quickest performer as your data sets get large, but for ease of use on moderately-sized datasets (up to hundreds of thousands of rows, or maybe millions if you manage things well), it does a good job.
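For readers new to the library, here is a minimal sketch of the basics such a primer covers: building a DataFrame, deriving a column, filtering rows, and aggregating by group. The column names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical sales data for illustration only
sales = pd.DataFrame({
    "region": ["East", "West", "East", "West", "East"],
    "product": ["A", "A", "B", "B", "A"],
    "units": [10, 4, 7, 3, 5],
    "price": [2.5, 2.5, 4.0, 4.0, 2.5],
})

# Derive a new column from existing ones (vectorized, no Python loop)
sales["revenue"] = sales["units"] * sales["price"]

# Filter rows with a boolean mask
big_orders = sales[sales["units"] >= 5]

# Aggregate: total units and revenue per region
summary = sales.groupby("region", as_index=False)[["units", "revenue"]].sum()
print(summary)
```

Vectorized column arithmetic and `groupby` are where Pandas is most convenient; row-by-row Python loops over a DataFrame are usually where performance suffers first as data grows.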


Exploring Semantic Model Relationships with Sempy

Published 2024-10-09 by Kevin Feasel

Prathy Kamasani builds a graph:

Understanding the relationships between datasets is crucial in data analytics, especially in the world of self-service BI. Sempy, a Python library unique to Microsoft Fabric, allows users to visualise these relationships seamlessly. This post explores using Sempy to visualise semantic model relationships and view them in a Power BI Report. Viewing them in a notebook is easy and has been documented on MS Docs.

Click through for a notebook and explanation of the underlying code.
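Under the hood, this kind of visual treats tables as nodes and relationships as edges. As a library-free sketch of that idea (not Prathy's code), the snippet below builds an adjacency map from hypothetical relationship rows and lists the tables reachable from a fact table. In a Fabric notebook, rows like these could come from Sempy's relationship metadata (for example `sempy.fabric.list_relationships`), but check the column names it actually returns before adapting this.

```python
from collections import defaultdict

# Hypothetical relationships: (from_table, from_column, to_table, to_column).
# In Fabric, similar rows could come from Sempy's relationship metadata.
relationships = [
    ("Sales", "DateKey", "Date", "DateKey"),
    ("Sales", "ProductKey", "Product", "ProductKey"),
    ("Product", "CategoryKey", "Category", "CategoryKey"),
    ("Returns", "ProductKey", "Product", "ProductKey"),
]

def build_graph(rels):
    """Map each table to the set of tables it points to (from side -> to side)."""
    graph = defaultdict(set)
    for from_table, _from_col, to_table, _to_col in rels:
        graph[from_table].add(to_table)
        graph.setdefault(to_table, set())  # make sure target tables appear as nodes
    return dict(graph)

def reachable(graph, start):
    """Return every table reachable from `start` by following relationships."""
    seen, stack = set(), [start]
    while stack:
        table = stack.pop()
        for neighbor in graph.get(table, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen

graph = build_graph(relationships)
print(sorted(reachable(graph, "Sales")))  # ['Category', 'Date', 'Product']
```

A dictionary of sets is enough for small models; for drawing the result, the same edge list can be handed to any graph or plotting library.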


An Overview of k Nearest Neighbors

Published 2024-10-07 by Kevin Feasel

Harris Amjad explains a common algorithm for classification:

Given the hype around Machine Learning (ML), and especially Large Language Models, these days, a considerable number of people want to understand how these systems work from scratch. Unfortunately, more often than not, the interest fades quickly as learners jump straight to complicated algorithms like neural networks and transformers, without paying heed to the traditional ML algorithms that laid the foundation for these advanced algorithms in the first place. In this tip, we will introduce and implement the K-Nearest Neighbors model in Python. Although it is quite old, it remains very popular due to its simplicity and intuitiveness.

Click through to learn more about this algorithm, including an implementation from scratch in Python.
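The core of the algorithm fits in a few lines. A minimal from-scratch sketch, assuming Euclidean distance and a simple majority vote among the k closest training points (the tip's own implementation may differ in details such as tie-breaking or the distance metric); the toy data is made up.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    neighbors = sorted(zip(train_X, train_y), key=lambda pair: euclidean(pair[0], query))
    k_labels = [label for _, label in neighbors[:k]]
    # Counter.most_common breaks ties by first appearance, i.e. the nearer neighbor
    return Counter(k_labels).most_common(1)[0][0]

# Two well-separated toy clusters
train_X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 7.5), (6.5, 5.0)]
train_y = ["red", "red", "red", "blue", "blue", "blue"]

print(knn_predict(train_X, train_y, (1.2, 1.4)))  # red
print(knn_predict(train_X, train_y, (6.8, 6.2)))  # blue
```

There is no training step at all: the model is the stored data, and every prediction pays the cost of computing distances to all of it, which is the main practical drawback on large datasets.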


An Overview of LightGBM

Published 2024-10-03 by Kevin Feasel

Vinod Chugani continues a series on tree-based classification techniques:

LightGBM is a highly efficient gradient boosting framework. It has gained traction for its speed and performance, particularly with large and complex datasets. Developed by Microsoft, this powerful algorithm is known for handling large volumes of data with considerably more ease than traditional methods.

In this post, we will experiment with the LightGBM framework on the Ames Housing dataset. In particular, we will shed some light on its versatile boosting strategies: Gradient Boosting Decision Tree (GBDT) and Gradient-based One-Side Sampling (GOSS). These strategies offer distinct advantages. Through this post, we will compare their performance and characteristics.

Read on to learn more about LightGBM as an algorithm, as well as how to use it.
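GOSS, as described in the original LightGBM paper, keeps the rows with the largest absolute gradients, randomly samples a fraction of the remaining rows, and up-weights those sampled small-gradient rows by (1 − a) / b so the gain estimate stays roughly unbiased. Below is a from-scratch sketch of that sampling step only (not LightGBM's actual code); `top_rate` and `other_rate` mirror the names of LightGBM's GOSS parameters, but the function itself is hypothetical.

```python
import random

def goss_sample(gradients, top_rate=0.2, other_rate=0.1, seed=0):
    """Return (row_indices, weights) chosen by Gradient-based One-Side Sampling."""
    n = len(gradients)
    # Sort row indices by absolute gradient, largest first
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    top_n = int(top_rate * n)
    other_n = int(other_rate * n)

    top = order[:top_n]                          # always keep large-gradient rows
    rng = random.Random(seed)
    others = rng.sample(order[top_n:], other_n)  # random subset of the rest

    # Up-weight the sampled small-gradient rows to compensate for the dropped ones
    amplify = (1 - top_rate) / other_rate
    weights = [1.0] * len(top) + [amplify] * len(others)
    return top + others, weights

grads = [0.5, -3.0, 0.1, 2.0, -0.2, 0.05, 4.0, -0.3, 0.0, 1.0]
rows, weights = goss_sample(grads)
print(rows[:2], weights[:2])  # [6, 1] [1.0, 1.0]
```

Plain GBDT, by contrast, computes split gains over all rows each iteration; GOSS trades a little accuracy in that estimate for building each tree from a much smaller sample.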


Recovering Power BI Reports You Cannot Download

Published 2024-10-03 by Kevin Feasel

Kurt Buhler grabs a report:

Below are some reasons why you might not be able to download your Power BI report or model from a workspace:

The report was created in the service:

Someone created the report manually (using the User Interface) and connected it to a model in another workspace.

Someone created the report programmatically (for instance, using the REST APIs).

Power BI created the report automatically (for instance, it copied the report to a workspace that belongs to a later stage in a deployment pipeline).

You used the REST APIs to re-bind a report (changed which model it connects to as a data source).

The model has incremental refresh enabled.

The model uses automatic aggregations.

The model was modified via an XMLA endpoint.

Other scenarios described in the limitations in the Microsoft documentation.

When you encounter this scenario, you see something like the following image, which shows the Download this file option greyed out in the File menu of the Power BI report.

Read on to see how you can nonetheless recover these published reports using the semantic-link-labs library.
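The post does the recovery with semantic-link-labs. One way to think about it: the service hands back the report's definition as a set of files, and those files get written to disk. A minimal sketch of that second half, assuming the definition arrives shaped like the Fabric REST API's item definition (a `parts` list whose entries carry `path`, `payload`, and `payloadType`, with base64 payloads); the function and sample part path are hypothetical, and no API call is made here.

```python
import base64
import json
import tempfile
from pathlib import Path

def save_definition_parts(definition, out_dir):
    """Write each base64-encoded part of an item definition to out_dir/<part path>.

    Assumes a dict shaped like {"parts": [{"path", "payload", "payloadType"}, ...]}
    and trusted part paths (there is no check for ".." segments).
    """
    out = Path(out_dir)
    written = []
    for part in definition["parts"]:
        if part.get("payloadType") != "InlineBase64":
            raise ValueError(f"unsupported payload type for {part['path']}")
        target = out / part["path"]
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(base64.b64decode(part["payload"]))
        written.append(target)
    return written

# Fake definition with one hypothetical part, standing in for a real API response
payload = base64.b64encode(json.dumps({"version": "4.0"}).encode("utf-8")).decode("ascii")
fake_definition = {
    "parts": [
        {"path": "definition/report.json", "payload": payload, "payloadType": "InlineBase64"}
    ]
}
files = save_definition_parts(fake_definition, tempfile.mkdtemp())
print(files[0].read_text())  # {"version": "4.0"}
```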


Plotting the ROC Curve in Microsoft Fabric

Published 2024-10-03 by Kevin Feasel

Tomaz Kastrun gets plotting:

The ROC (Receiver Operating Characteristic) curve is a graph that shows how a classifier performs by plotting the true positive and false positive rates. It is used to evaluate the performance of binary classification models by illustrating the trade-off between the true positive rate (TPR) and the false positive rate (FPR) at various threshold settings.

Read on to see how you can generate one in a Microsoft Fabric notebook. Tomaz also plots a density function for additional fun.
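To make the TPR/FPR trade-off concrete, the sketch below sweeps every distinct score as a threshold, computes one (FPR, TPR) point per threshold, and estimates the area under the curve with the trapezoidal rule. It is a plain-Python illustration, not the code from the post; in a notebook you would more likely use a library routine such as scikit-learn's `roc_curve` and `auc`.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr) points, one per distinct threshold, highest threshold first."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        # Predict positive whenever the score is at or above the threshold
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the curve via the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
pts = roc_points(y_true, scores)
print(pts)       # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(auc(pts))  # 0.75
```

Lowering the threshold can only move the curve up or to the right, which is why the points come out in order and the trapezoids can be summed directly.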


Programmatic Power BI Report Modification via semantic-link-labs

Published 2024-10-02 by Kevin Feasel

Kurt Buhler makes a change:

Whether building reports in Power BI Desktop or in the web browser via the Power BI service, you have limited options to batch or streamline changes. Put another way, it’s tedious and slow to make many small changes to one or more Power BI reports. It’s also easy to make mistakes.

When initially designing or building a report, this is not so much of a problem. Unless you’re using a template, you want to control report layout and formatting yourself. However, certain changes can be little more than a waste of time. Some examples include:

Replacing fields when there’s a broken reference due to, for example, renaming a model measure or column.

Swapping one measure or column for another in the report.

Changing visual container styles, like background, border, and shadow/glow.

Changing text or text styles across multiple visuals, pages, or reports.

Changing chart formatting (like color) or properties (like edit interactions) across multiple visuals, pages, or reports.

Read on to see how you can make some of these changes in Python code using the semantic-link-labs library.
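The post uses semantic-link-labs for this. Independent of that library, the core of a batch change such as "swap one measure for another" is a recursive walk over a report's JSON definition. A minimal sketch, assuming a simplified, hypothetical field-reference shape loosely modeled on Power BI report JSON; real definition files carry more structure, so inspect them before relying on this.

```python
def measure_ref(table, name):
    """Build a (hypothetical, simplified) measure reference like those in report JSON."""
    return {"Measure": {"Expression": {"SourceRef": {"Entity": table}}, "Property": name}}

def swap_measure(node, table, old_name, new_name):
    """Rename matching measure references in place; return how many were changed."""
    changed = 0
    if isinstance(node, dict):
        measure = node.get("Measure")
        if isinstance(measure, dict):
            entity = measure.get("Expression", {}).get("SourceRef", {}).get("Entity")
            if entity == table and measure.get("Property") == old_name:
                measure["Property"] = new_name
                changed += 1
        # Recurse into every value so nested projections, filters, etc. are covered
        for value in node.values():
            changed += swap_measure(value, table, old_name, new_name)
    elif isinstance(node, list):
        for item in node:
            changed += swap_measure(item, table, old_name, new_name)
    return changed

# Two hypothetical visuals; only references to Sales[Revenue] should change
visuals = [
    {"projections": [{"field": measure_ref("Sales", "Revenue")}]},
    {"projections": [{"field": measure_ref("Sales", "Revenue")},
                     {"field": measure_ref("Sales", "Units")}]},
]
total = sum(swap_measure(v, "Sales", "Revenue", "Net Revenue") for v in visuals)
print(total)  # 2
```

Returning a count makes it easy to sanity-check a batch change before saving: if the number of replacements does not match what you expected, something in the JSON did not look the way you assumed.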


Managing Power BI Assets with semantic-link-labs

Published 2024-10-01 by Kevin Feasel

Kurt Buhler takes us through a Python library:

Thus far, the part of Microsoft Fabric that I’ve personally found the most interesting is not Copilot, Direct Lake, or its data warehousing capabilities, but a combination of notebooks and simple file/table storage via Lakehouses. Specifically, the library semantic link and its “expansion pack” semantic-link-labs, spearheaded by Michael Kovalsky. These tools help you build, manage, use, and audit the various items in Fabric from a Python notebook, including Power BI semantic models and reports.

Semantic-link-labs provides a lot of convenient functions that you can use to automate and streamline certain tasks during Power BI development, for both models and reports. I’m particularly interested in the reporting functionality, because this is where I typically find that I lose the most time, and because there is a drought of tools to address this area.

Read the whole thing.


Handling Missing Data with XGBoost

Published 2024-09-30 by Kevin Feasel

Vinod Chugani is missing a few data points:

XGBoost has gained widespread recognition for its impressive performance in numerous Kaggle competitions, making it a favored choice for tackling complex machine learning challenges. Known for its efficiency in handling large datasets, this powerful algorithm stands out for its practicality and effectiveness.

In this post, we will apply XGBoost to the Ames Housing dataset to demonstrate its unique capabilities. Building on our prior discussion of the Gradient Boosting Regressor (GBR), we will explore key features that differentiate XGBoost from GBR, including its advanced approach to managing missing values and categorical data.

Read on to see how it fares.
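XGBoost's handling of missing values is often described as sparsity-aware split finding: for each candidate split, it tries sending the rows with missing values left and then right, and keeps whichever direction yields the larger gain as that node's default direction. A simplified from-scratch sketch of that single decision, assuming squared-error loss (so each gradient is prediction minus target and each hessian is 1) and an L2 penalty `lam`; the complexity penalty gamma and the search over thresholds are left out, and the data is made up.

```python
import math

def leaf_score(g_sum, h_sum, lam):
    """Structure score G^2 / (H + lambda) used in XGBoost's split gain."""
    return g_sum * g_sum / (h_sum + lam)

def default_direction(x, grad, hess, threshold, lam=1.0):
    """For the split `x < threshold`, decide where rows with missing x should go.

    Missing values are encoded as float('nan'). Returns (direction, gain).
    """
    g_left = h_left = g_right = h_right = g_miss = h_miss = 0.0
    for xi, g, h in zip(x, grad, hess):
        if math.isnan(xi):
            g_miss += g
            h_miss += h
        elif xi < threshold:
            g_left += g
            h_left += h
        else:
            g_right += g
            h_right += h

    parent = leaf_score(g_left + g_right + g_miss, h_left + h_right + h_miss, lam)
    # Option 1: send the missing rows to the left child
    gain_left = 0.5 * (leaf_score(g_left + g_miss, h_left + h_miss, lam)
                       + leaf_score(g_right, h_right, lam) - parent)
    # Option 2: send the missing rows to the right child
    gain_right = 0.5 * (leaf_score(g_left, h_left, lam)
                        + leaf_score(g_right + g_miss, h_right + h_miss, lam) - parent)
    if gain_left >= gain_right:
        return "left", gain_left
    return "right", gain_right

# Squared error with an initial prediction of 0: gradient = 0 - y, hessian = 1
nan = float("nan")
x = [1.0, 2.0, nan, 8.0, 9.0, nan]
y = [1.0, 1.2, 5.0, 5.1, 4.9, 5.2]
grad = [0.0 - yi for yi in y]
hess = [1.0] * len(y)
direction, gain = default_direction(x, grad, hess, threshold=5.0)
print(direction)  # right: the missing rows' targets resemble the x >= 5 group
```

In the full algorithm this comparison runs for every candidate threshold, and the winning direction is stored in the tree so that missing values at prediction time follow the same path.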


Boosting versus Bagging in Tree Models

Published 2024-09-27 by Kevin Feasel

Vinod Chugani compares two techniques for working with trees:

Ensemble learning techniques primarily fall into two categories: bagging and boosting. Bagging improves stability and accuracy by aggregating independent predictions, whereas boosting sequentially corrects the errors of prior models, improving their performance with each iteration. This post begins our deep dive into boosting, starting with the Gradient Boosting Regressor. Through its application on the Ames Housing Dataset, we will demonstrate how boosting uniquely enhances models, setting the stage for exploring various boosting techniques in upcoming posts.

Read on for more information. The neat part about the “boosting versus bagging” debate is that both techniques are quite useful. Although boosting (via algorithms like XGBoost or LightGBM) is the more popular technique, bagging (random forest) is extremely powerful in its own right.
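To see the structural difference in code, here is a small sketch using one-split regression trees (stumps) on a single made-up feature: bagging trains each stump independently on a bootstrap resample and averages them, while boosting fits each new stump to the residuals of the ensemble so far. It is illustrative only; a random forest also samples features at each split, which a one-feature example cannot show, and production libraries add much more on top of both ideas.

```python
import random

def fit_stump(xs, ys):
    """Fit a one-split regression tree (stump) by minimizing squared error."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # every x is identical: predict the mean
        mean = sum(ys) / len(ys)
        return lambda x: mean
    _, t, lm, rm = best
    return lambda x: lm if x < t else rm

def bagging(xs, ys, n_models=25, seed=0):
    """Average stumps trained independently on bootstrap resamples."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(m(x) for m in models) / len(models)

def boosting(xs, ys, n_rounds=25, learning_rate=0.3):
    """Fit stumps sequentially, each one to the residuals of the ensemble so far."""
    base = sum(ys) / len(ys)
    models = []

    def predict(x):
        return base + sum(learning_rate * m(x) for m in models)

    for _ in range(n_rounds):
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        models.append(fit_stump(xs, residuals))
    return predict

def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = list(range(10))
ys = [0, 0, 0, 4, 4, 4, 4, 8, 8, 8]  # a two-step target one stump cannot fit exactly
bagged = bagging(xs, ys)
boosted = boosting(xs, ys)
print(f"bagging MSE:  {mse(bagged, xs, ys):.3f}")
print(f"boosting MSE: {mse(boosted, xs, ys):.3f}")
```

The stumps in `bagging` never see each other, so they could be trained in parallel; in `boosting`, each round depends on the predictions of all earlier rounds, which is what lets it keep chipping away at the remaining error.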



Category: Python