Press "Enter" to skip to content

Category: Python

Fabric List Connections API in Semantic Link Labs

Sandeep Pawar has an update for us:

In you case you missed it, List Connections Admin API is now live in Fabric. It was shipped in Semantic Link Labs v 0.7.4 a few weeks ago but at the time of the release it was still private. This API returns all the connections set up in the tenant and requires admin privileges. I still can’t find documentation on it so wait for the official details. Note that this API is different from item – list connection API which lists connections used by an item.

Read on to see what you can get from it.

Comments closed

Lexing DAX with PyDAX

Sandeep Pawar reviews a DAX lexer:

The power of open-source and GenAI. Klaus Jürgen Folz recently open-sourced the PyDAX library, which parses DAX expressions to extract or remove comments, and identify referenced columns and measures. I used that library to create some demos for myself and then shared the notebook along with instructions with Replit agents to build an app for me.. 15 minutes & 3 prompts later I had a fully functional app. Give it a try : https://daxparser.replit.app/

Read on to learn more, including why I referred to PyDAX as a “lexer” and a few more notes of relevance.

Comments closed

Map and FlatMap in PySpark

Vipul Kumar does a bit of work with resilient distributed datasets:

PySpark, the Python API for Apache Spark, is widely used for big data processing and distributed computing. It enables data engineers and data scientists to efficiently process large datasets using resilient distributed datasets (RDDs) and DataFrames. Two commonly used transformations in PySpark are map() and flatMap(). These functions allow users to perform operations on RDDs and are pivotal in distributed data processing.

In this blog, we’ll explore the key differences between map() and flatMap(), their use cases, and how they can be applied in PySpark.

The DataFrame approach has all but obviated having developers use the original Hadoop-like map-reduce approach to writing code in Spark. Even so, I do think it’s useful to know how it all works.

Comments closed

A Survey of Predictive Analytics Techniques

Akmal Chaudhri tries a bunch of things:

In this short article, we’ll explore loan approvals using a variety of tools and techniques. We’ll begin by analyzing loan data and applying Logistic Regression to predict loan outcomes. Building on this, we’ll integrate BERT for Natural Language Processing to enhance prediction accuracy. To interpret the predictions, we’ll use SHAP and LIME explanation frameworks, providing insights into feature importance and model behavior. Finally, we’ll explore the potential of Natural Language Processing through LangChain to automate loan predictions, using the power of conversational AI.

Click through for the notebook, as well as an overview of what the notebook includes. I don’t particularly like word clouds as the “solution” in the BERT example, though without real data to perform any sort of NLP, there’s not much you can meaningfully do.

Comments closed

A Primer on Pandas

Rajendra Gupta talks about Pandas:

Have you heard about Pandas in Python? It is widely used open-source library for analyzing and manipulating data in the Python programming language. Let’s explore it with use cases and examples.

Click through for an overview of the library. Pandas isn’t the quickest performer as your data sets get large, but for ease of use on moderately-sized datasets (up to hundreds of thousands of rows, or maybe millions if you manage things well), it does a good job.

Comments closed

Exploring Semantic Model Relationships with Sempy

Prathy Kamasani builds a graph:

Understanding the relationships between datasets is crucial in data analytics, especially in the world of self-service BI. Sempy, a Python library unique to Microsoft Fabric, allows users to visualise these relationships seamlessly. This post explores using Sempy to visualise semantic model relationships and view them in a Power BI Report. Viewing them in Notebook is easy and has been documented on MS Docs.

Click through for a notebook and explanation of the underlying code.

Comments closed

An Overview of k Nearest Neighbors

Harris Amjad explains a common algorithm for classification:

It so happens that given the hype of Machine Learning (ML) and especially Large Language Models these days, there is a considerable proportion of those who wish to understand how these systems work from scratch. Unfortunately, more often than not, the interest fades away quickly as learners jump to complicated algorithms like neural networks and transformers first, without giving heed to traditional ML algorithms that paved the foundation for these advanced algorithms in the first place. In this tip, we will introduce and implement the K-Nearest Neighbors model in Python. Although it is quite old, it remains very popular due to its simplicity and intuitiveness.

Click through to learn more about this algorithm, including an implementation from scratch in Python.

Comments closed

An Overview of LightGBM

Vinod Chugani continues a series on tree-based classification techniques:

LightGBM is a highly efficient gradient boosting framework. It has gained traction for its speed and performance, particularly with large and complex datasets. Developed by Microsoft, this powerful algorithm is known for its unique ability to handle large volumes of data with significant ease compared to traditional methods.

In this post, we will experiment with LightGBM framework on the Ames Housing dataset. In particular, we will shed some light on its versatile boosting strategies—Gradient Boosting Decision Tree (GBDT) and Gradient-based One-Side Sampling (GOSS). These strategies offer distinct advantages. Through this post, we will compare their performance and characteristics.

Read on to learn more about LightGBM as an algorithm, as well as how to use it.

Comments closed

Recovering Power BI Reports You Cannot Download

Kurt Buhler grabs a report:

Below are some reasons why you might not be able to download your Power BI report or model from a workspace:

  • The report was created in the service:
    • Someone created the report manually (using the User Interface) and connects to a model in another workspace.
    • Someone created the report programmatically (for instance, using the REST APIs).
    • Power BI created the report automatically (for instance, it copied the report to a workspace that belongs to a later stage in a deployment pipeline)
  • You used the REST APIs to re-bind a report (changed which model it connects to as a data source).
  • The model has incremental refresh enabled.
  • The model uses automatic aggregations.
  • The model was modified via an XMLA endpoint.
  • Other scenarios described in the limitations in the Microsoft documentation.

When you encounter this scenario, you see something like the following image, which shows the Download this file option greyed out from the File menu of the Power BI report.

Read on to see how you can nonetheless recover these published reports using the semantic-link-labs library.

Comments closed

Plotting the ROC Curve in Microsoft Fabric

Tomaz Kastrun gets plotting:

ROC (Receiver Operation Characteristics) – curve is a graph that shows how classifiers performs by plotting the true positive and false positive rates. It is used to evaluate the performance of binary classification models by illustrating the trade-off between True positive rate (TPR) and False positive rate (FPR) at various threshold settings.

Read on to see how you can generate one in a Microsoft Fabric notebook. Tomaz also plots a density function for additional fun.

Comments closed