Category: Python

Time-Saving Features in Scikit-Learn

Cornelius Yudha Wijaya describes a half-dozen functions:

For many people studying data science, Scikit-Learn is often the first machine learning library they encounter. It’s because Scikit-Learn offers various APIs that are useful for model development while still being easy for beginners to use.

As helpful as they may be, many features from Scikit-Learn are rarely explored and have untapped potential. This article will explore six lesser-known features that will save you time.

The calibration curve function, in particular, drew my attention, especially as I had written that by hand in the past.
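In scikit-learn this is `sklearn.calibration.calibration_curve(y_true, y_prob, n_bins=..., strategy="uniform")`. Because it is the kind of thing people end up writing by hand, here is a minimal stdlib sketch of what the uniform strategy computes. It buckets predicted probabilities into equal-width bins, then compares each non-empty bin's mean predicted probability with its observed fraction of positives. The name `calibration_curve_uniform` is hypothetical. This illustrates the idea and is not a drop-in replacement for the library function.

```python
from bisect import bisect_left


def calibration_curve_uniform(y_true, y_prob, n_bins=5):
    """Return (prob_true, prob_pred) for equal-width bins over [0, 1].

    Empty bins are dropped, which is also what scikit-learn does.
    """
    if len(y_true) != len(y_prob):
        raise ValueError("y_true and y_prob must have the same length")
    if any(p < 0 or p > 1 for p in y_prob):
        raise ValueError("y_prob values must lie in [0, 1]")

    # Interior bin edges, e.g. [0.2, 0.4, 0.6, 0.8] for n_bins=5.
    edges = [i / n_bins for i in range(1, n_bins)]
    counts = [0] * n_bins
    positives = [0] * n_bins
    prob_sums = [0.0] * n_bins

    for label, prob in zip(y_true, y_prob):
        b = bisect_left(edges, prob)  # a value exactly on an edge goes to the lower bin
        counts[b] += 1
        positives[b] += label
        prob_sums[b] += prob

    # Fraction of positives vs. mean predicted probability, per non-empty bin.
    prob_true = [positives[b] / counts[b] for b in range(n_bins) if counts[b]]
    prob_pred = [prob_sums[b] / counts[b] for b in range(n_bins) if counts[b]]
    return prob_true, prob_pred
```

With `n_bins=2`, the predictions `[0.1, 0.3, 0.35]` land in the lower bin and `[0.7, 0.9, 0.95]` in the upper one. A well-calibrated model produces points close to the diagonal `prob_true == prob_pred`.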

Writing Data into a Microsoft Fabric Lakehouse via Notebook

Stepan Resl writes some code:

Since Lakehouse is one of the key items within Microsoft Fabric, it is important to know how to write data into it in various formats and using different tools. One of the most common tools is notebooks, as they provide great flexibility and speed for development and testing with graphical outputs. In this article, I want to focus primarily on the following types of notebooks:

  • PySpark
  • Python

Click through to see how it works in both notebook types.
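Both notebook types need to know where a table lives in OneLake. Below is a minimal sketch that builds that location. It assumes name-based OneLake ABFS paths of the form `abfss://<workspace>@onelake.dfs.fabric.microsoft.com/<lakehouse>.Lakehouse/Tables/<table>`, with workspace and lakehouse names that contain no spaces or special characters. The helper `onelake_table_path` is hypothetical and not taken from Stepan's article. The write calls appear only as comments because they need a Fabric session.

```python
from typing import Optional

ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def onelake_table_path(workspace: str, lakehouse: str, table: str,
                       schema: Optional[str] = None) -> str:
    """Build a name-based OneLake ABFS path to a lakehouse Delta table."""
    for label, value in (("workspace", workspace), ("lakehouse", lakehouse), ("table", table)):
        if not value or "/" in value:
            raise ValueError(f"invalid {label} name: {value!r}")
    parts = [f"abfss://{workspace}@{ONELAKE_HOST}", f"{lakehouse}.Lakehouse", "Tables"]
    if schema:  # assumption: schema-enabled lakehouses nest tables under a schema folder
        parts.append(schema)
    parts.append(table)
    return "/".join(parts)


# PySpark notebook (the spark session is provided by Fabric):
#   df.write.format("delta").mode("overwrite").save(onelake_table_path("Sales", "Bronze", "orders"))
# Python notebook, one option is the deltalake package:
#   from deltalake import write_deltalake
#   write_deltalake(onelake_table_path("Sales", "Bronze", "orders"), pandas_df, mode="overwrite")
#   (authentication / storage_options depend on the environment and are not shown)
```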

Retrieving Microsoft Fabric Items using a Python-Only Notebook

Gilbert Quevauvilliers doesn’t need Spark for this:

The blog post below explains how to use a Python-only notebook to get all the Fabric items using the Fabric REST API.

NOTE: At the time of this blog post (Feb 2025), Dataflow Gen2 is not included in the Fabric items; I am sure it will be there in the future.

NOTE II: This only gets the Fabric items, which do not include the Power BI items.

Despite the notes, Gilbert leads off with the main reason why you might want to use this: it takes up approximately 5% of the capacity units that a Spark-based notebook does to perform the same operation.
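Below is a minimal stdlib sketch of the Python-only approach. It lists one workspace's items and tallies them by type. It assumes the Fabric REST "List Items" endpoint (`GET https://api.fabric.microsoft.com/v1/workspaces/{workspaceId}/items`), which returns items under `value` and signals further pages with `continuationUri`. The function names are hypothetical, and the `fetch` parameter exists only so the paging logic can be exercised without network access.

```python
import json
import urllib.request
from typing import Callable, Dict, Iterable, Iterator

# Base URL of the Fabric REST API (v1).
FABRIC_API = "https://api.fabric.microsoft.com/v1"


def http_get_json(url: str, token: str) -> dict:
    """GET a URL with a bearer token and parse the JSON response body."""
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def list_workspace_items(
    workspace_id: str,
    token: str,
    fetch: Callable[[str, str], dict] = http_get_json,
) -> Iterator[dict]:
    """Yield every item in a workspace, following continuationUri paging."""
    url = f"{FABRIC_API}/workspaces/{workspace_id}/items"
    while url:
        page = fetch(url, token)
        yield from page.get("value", [])
        # Assumption: the last page omits continuationUri (or sets it to null).
        url = page.get("continuationUri")


def count_by_type(items: Iterable[dict]) -> Dict[str, int]:
    """Tally items by their "type" field (Notebook, Lakehouse, ...)."""
    counts: Dict[str, int] = {}
    for item in items:
        counts[item["type"]] = counts.get(item["type"], 0) + 1
    return counts
```

Inside a Fabric notebook, the bearer token can typically be obtained through `notebookutils.credentials`, though the exact audience string is worth checking. Covering the whole tenant would mean looping over the workspace list (`GET /v1/workspaces`) with the same paging pattern.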

Local Text Summarization via DistilBart

Muhammad Asad Iqbal Khan summarizes a document:

Text summarization represents a sophisticated evolution of text generation, requiring a deep understanding of content and context. With encoder-decoder transformer models like DistilBart, you can now create summaries that capture the essence of longer text while maintaining coherence and relevance.

In this tutorial, you’ll discover how to implement text summarization using DistilBart. You’ll learn through practical, executable examples, and by the end of this guide, you’ll understand both the theoretical foundations and hands-on implementation details. After completing this tutorial, you will know:

Click through for the article.
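BART-family models accept a limited input length (1,024 tokens), so longer documents are usually split before summarizing. Below is a minimal sketch of that pattern. It assumes the Hugging Face `transformers` package and the `sshleifer/distilbart-cnn-12-6` checkpoint, which may differ from what the tutorial uses. The 500-word chunk size is an arbitrary, conservative guess because words are not tokens. The function names are hypothetical.

```python
from typing import List


def chunk_words(text: str, max_words: int = 500) -> List[str]:
    """Split text into chunks of at most max_words whitespace-separated words."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def summarize_long_text(text: str, max_words: int = 500,
                        model: str = "sshleifer/distilbart-cnn-12-6") -> str:
    """Summarize each chunk with a DistilBart pipeline and join the results."""
    # Imported lazily so chunk_words works without transformers installed;
    # running this also needs a backend such as PyTorch and a model download.
    from transformers import pipeline

    summarizer = pipeline("summarization", model=model)
    summaries = []
    for chunk in chunk_words(text, max_words):
        result = summarizer(chunk, max_length=130, min_length=30, do_sample=False)
        summaries.append(result[0]["summary_text"])
    return " ".join(summaries)
```

Summarizing chunks independently is the simplest option. It can lose context across chunk boundaries, which is one reason tutorials often add overlap or a second summarization pass.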

Comparing Pandas to Other Libraries for Data Processing

Vidyasagar Machupalli performs a comparison:

As discussed in my previous article about data architectures emphasizing emerging trends, data processing is one of the key components in the modern data architecture. This article discusses various alternatives to the Pandas library for better performance in your data architecture.

Data processing and data analysis are crucial tasks in the field of data science and data engineering. As datasets grow larger and more complex, traditional tools like pandas can struggle with performance and scalability. This has led to the development of several alternative libraries, each designed to address specific challenges in data manipulation and analysis.

This is by no means a comprehensive test, but it does show off quite a few libraries that perform similar actions to Pandas.

Microsoft Fabric Shortcuts and Lakehouse Maintenance

Dennes Torres has a public service announcement:

I have written about lakehouse maintenance before, including maintenance of multiple lakehouses, published videos about this subject, and provided sample code for it.

However, there is one problem: maintenance should never be executed over shortcuts.

The tables require maintenance in their original location. As our solution advances, we start using shortcuts, lots of them. Our maintenance code should always skip shortcuts and perform maintenance only on the actual tables.

Click through to see how you can differentiate shortcuts from actual tables and write code to avoid shortcuts.
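Below is a minimal sketch of the skip-the-shortcuts idea. It does not reproduce Dennes's code. It assumes shortcut metadata shaped like the OneLake "List Shortcuts" REST response, where each entry has a `name` and a `path` and table shortcuts report a path of `Tables`. It also assumes a lakehouse without schemas. The function names are hypothetical, and the functions only build statements so that the filtering logic can be checked outside Fabric.

```python
from typing import Iterable, List, Mapping


def tables_needing_maintenance(table_names: Iterable[str],
                               shortcuts: Iterable[Mapping]) -> List[str]:
    """Return table names that are not table shortcuts."""
    shortcut_tables = {
        s["name"]
        for s in shortcuts
        # Assumption: table shortcuts have path "Tables"; file shortcuts live under "Files".
        if s.get("path", "").strip("/").split("/")[0] == "Tables"
    }
    return [name for name in table_names if name not in shortcut_tables]


def maintenance_statements(tables: Iterable[str], retain_hours: int = 168) -> List[str]:
    """Build Delta OPTIMIZE and VACUUM statements for each real table."""
    statements = []
    for table in tables:
        statements.append(f"OPTIMIZE `{table}`")
        # 168 hours (7 days) matches Delta's default retention threshold.
        statements.append(f"VACUUM `{table}` RETAIN {retain_hours} HOURS")
    return statements
```

In a Spark notebook, the table names could come from `spark.catalog.listTables()` and each statement would run through `spark.sql(stmt)`. The shortcut list would come from the REST API or whatever method the article uses.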

Trying out fabric-cicd

Kevin Chant tries a Python package:

In this post I want to cover my initial tests of fabric-cicd, in order to provide some tips for those looking to work with this new offering.

Just so that everybody is aware, fabric-cicd is a Python library that allows you to perform CI/CD of various Microsoft Fabric items into Microsoft Fabric workspaces. At this moment in time there is a limited number of supported item types. However, that list is increasing.

Read on for the test. It currently supports a limited amount of functionality, but it looks promising.
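Below is a minimal deployment sketch. It assumes the `FabricWorkspace`, `publish_all_items`, and `unpublish_all_orphan_items` entry points and the `workspace_id` / `repository_directory` / `item_type_in_scope` / `environment` parameters shown in the fabric-cicd README at the time of writing. The package is new, so these may change between versions. The helper names are hypothetical. The library is imported only inside `deploy` so that the configuration step can be checked without fabric-cicd installed.

```python
from typing import Dict, Iterable, Optional


def build_workspace_config(workspace_id: str, repository_directory: str,
                           item_types: Iterable[str],
                           environment: Optional[str] = None) -> Dict[str, object]:
    """Collect keyword arguments for fabric_cicd.FabricWorkspace."""
    item_types = list(item_types)
    if not item_types:
        raise ValueError("item_types must not be empty")
    config: Dict[str, object] = {
        "workspace_id": workspace_id,
        "repository_directory": repository_directory,
        "item_type_in_scope": item_types,  # only these item types are published
    }
    if environment is not None:
        config["environment"] = environment  # used for environment-specific parameter values
    return config


def deploy(config: Dict[str, object], remove_orphans: bool = False) -> None:
    """Publish items from the repository folder into the target workspace."""
    from fabric_cicd import FabricWorkspace, publish_all_items, unpublish_all_orphan_items

    workspace = FabricWorkspace(**config)
    publish_all_items(workspace)
    if remove_orphans:
        # Removes workspace items that no longer exist in the repository.
        unpublish_all_orphan_items(workspace)
```

Keeping orphan removal opt-in is a deliberate choice here, because deleting items from a target workspace is the step most worth testing carefully first.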

Migrating or Copying a Semantic Model across Microsoft Fabric Workspaces

Sandeep Pawar makes a move:

Here is a quick script to copy a semantic model from one workspace to another in the same tenant, assuming you have Contributor (or higher) permissions in both workspaces. I tested this with a Direct Lake model, but it should work for any other semantic model. This copies only the metadata (not the data in the model), so be sure to set up the other configurations (RLS members, refresh schedule, settings, etc.). Those can also be changed programmatically, thanks to Semantic Link Labs, but I will cover that in a future post.

Read on for the script, as well as an update from Sandeep on how you can do this even more easily.
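Sandeep's script is the place to look for his approach. As an alternative illustration of "copy only the metadata", here is a minimal sketch against the Fabric REST item-definition APIs. It assumes the source definition comes from `POST /v1/workspaces/{workspaceId}/semanticModels/{semanticModelId}/getDefinition`, a long-running operation that returns base64 `parts`, and that the copy is created by posting a `displayName` plus `definition` body to `/v1/workspaces/{targetWorkspaceId}/semanticModels`. The function names are hypothetical, and the HTTP calls are left out.

```python
import base64
import json
from typing import Dict, Iterable, List


def build_create_model_body(display_name: str, parts: Iterable[Dict[str, str]]) -> dict:
    """Wrap definition parts from the source model in a create request body."""
    copied: List[Dict[str, str]] = [
        {
            "path": part["path"],
            "payload": part["payload"],
            # Assumption: definition parts are inline base64 payloads.
            "payloadType": part.get("payloadType", "InlineBase64"),
        }
        for part in parts
    ]
    if not copied:
        raise ValueError("definition has no parts to copy")
    return {"displayName": display_name, "definition": {"parts": copied}}


def decode_json_part(part: Dict[str, str]) -> dict:
    """Decode a base64 JSON part (e.g. model.bim) for inspection or edits."""
    return json.loads(base64.b64decode(part["payload"]).decode("utf-8"))
```

Because only the definition moves, a copied Direct Lake model still points at its original source until its connection is changed. Security membership and refresh settings still need to be configured separately, as Sandeep notes.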
