Category: Python

The Power of Virtual Environments in Python

I have a new video:

In this video, I explain why virtual environments are such an important concept in Python and why you should generally be using them. I also talk about virtual environments versus Docker containers and how these are not mutually exclusive.

It took me a while to understand why virtual environments make sense, and I think part of the difficulty in adapting to this mental model was that I was used to the .NET mechanism for package management: per-project library installation. Sure, there was the Global Assembly Cache (GAC) in .NET Framework and that had similar problems to installing packages in base Python installations, but we didn’t use it that often. Or at least, I’ve sublimated however many hours of pain I spent fighting the GAC to the point that I don’t remember them anymore.
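
To make the per-project idea concrete, here is a minimal sketch using the standard library's venv module. The project folder name and the create_project_env helper are my own assumptions for illustration and do not come from the video.

```python
import tempfile
import venv
from pathlib import Path


def create_project_env(project_dir: Path) -> Path:
    """Create a virtual environment in <project_dir>/.venv and return its path."""
    env_dir = project_dir / ".venv"
    # with_pip=False keeps the demo fast and offline; set it to True
    # if you want pip bootstrapped into the new environment.
    venv.create(env_dir, with_pip=False)
    return env_dir


# Hypothetical project folder, created under a temp directory for the demo.
project = Path(tempfile.mkdtemp()) / "my_project"
project.mkdir()
env_path = create_project_env(project)

# Every virtual environment has a pyvenv.cfg file at its root.
config_exists = (env_path / "pyvenv.cfg").exists()
print(env_path, config_exists)
```

Each project gets its own .venv folder, which is roughly the per-project isolation I was used to from .NET. From a shell, python -m venv .venv does the same job.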

Fine-Tuning a DistilBERT Model for Question Answering

Muhammad Asad Iqbal Khan builds upon a simple model:

The transformers library provides a clean and well-documented interface for many popular transformer models. Not only does it make the source code easier to read and understand, it also provides a standardized way to interact with the model. You have seen in the previous post how to use a model such as DistilBERT for natural language processing tasks. In this post, you will learn how to fine-tune the model for your own purpose. This expands the use of the model from inference to training. Specifically, you will learn:

  • How to prepare the dataset for training
  • How to train a model using a helper library

DistilBERT is a major simplification of BERT, but it comes with the advantage that it’s very easy to train on modest hardware and performance is in the same realm of acceptability as the full BERT model. Switching from DistilBERT to BERT isn’t as easy as just swapping out model classes, though it’s pretty close.

Deploying and Using Custom Python Libraries in Microsoft Fabric

Miles Cole picks up from part one:

This is part 2 of a series, continuing where my prior post left off. I previously showed how you can use Resource folders in either the Notebook or Environment in Microsoft Fabric to do some pretty agile development of Python modules/libraries.

Now, how exactly can you package up your code to distribute and leverage it across multiple Workspaces or Environment items? How could we accomplish something like the below?

Read on for the answer.

Building a Simple Microservice with Azure Functions

Temidayo Omoniyi takes us through an example of creating a microservice:

Today’s architecture is serverless intensive, with multiple microservices each performing a particular task. Industries are beginning to move away from traditional monolithic applications, which have a single large codebase handling everything, to an easier microservice approach.

Click through for a primer on serverless architecture, microservices, and how to create a simple Python app that acts as a microservice.

Time-Saving Features in Scikit-Learn

Cornelius Yudha Wijaya describes a half-dozen functions:

For many people studying data science, Scikit-Learn is often the first machine learning library they encounter. It’s because Scikit-Learn offers various APIs that are useful for model development while still being easy for beginners to use.

As helpful as they may be, many features from Scikit-Learn are rarely explored and have untapped potential. This article will explore six lesser-known features that will save you time.

The calibration curve function, in particular, drew my attention, especially as I had written that by hand in the past.
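
For context, this is roughly what a hand-rolled version looks like. It is a minimal sketch in plain Python, assuming uniform-width bins over [0, 1]; the function name and sample data are mine, and scikit-learn's calibration_curve (in sklearn.calibration) may treat bin edges slightly differently.

```python
def calibration_curve_by_hand(y_true, y_prob, n_bins=5):
    """Return (fraction_of_positives, mean_predicted_probability) per non-empty bin."""
    positives = [0] * n_bins
    prob_sums = [0.0] * n_bins
    counts = [0] * n_bins
    for label, prob in zip(y_true, y_prob):
        # Uniform-width bins; a probability of exactly 1.0 lands in the last bin.
        idx = min(int(prob * n_bins), n_bins - 1)
        positives[idx] += label
        prob_sums[idx] += prob
        counts[idx] += 1
    # Skip empty bins so we never divide by zero.
    frac_pos = [positives[i] / counts[i] for i in range(n_bins) if counts[i]]
    mean_pred = [prob_sums[i] / counts[i] for i in range(n_bins) if counts[i]]
    return frac_pos, mean_pred


# Made-up labels and predicted probabilities, purely for illustration.
y_true = [0, 0, 1, 0, 1, 1, 1, 1]
y_prob = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
frac_pos, mean_pred = calibration_curve_by_hand(y_true, y_prob, n_bins=2)
print(frac_pos, mean_pred)
```

A well-calibrated model has the two lists tracking each other closely. Plotting one against the other gives the familiar reliability diagram.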

Writing Data into a Microsoft Fabric Lakehouse via Notebook

Stepan Resl writes some code:

Since Lakehouse is one of the key items within Microsoft Fabric, it is important to know how to write data into it in various formats and using different tools. One of the most common tools is notebooks, as they provide great flexibility and speed for development and testing with graphical outputs. In this article, I want to focus primarily on the following types of notebooks:

  • PySpark
  • Python

Click through to see how it works in both notebook types.

Retrieving Microsoft Fabric Items using a Python-Only Notebook

Gilbert Quevauvilliers doesn’t need Spark for this:

This blog post explains how to use a Python-only notebook to get all the Fabric items using the Fabric REST API.

NOTE: At the time of this blog post (Feb 2025), Dataflow Gen2 is not included in the Fabric items; I am sure it will be there in the future.

NOTE II: This only gets the Fabric Items, which does not include the Power BI Items.

Despite the notes, Gilbert leads off with the main reason why you might want to use this: it takes up approximately 5% of the capacity units that a Spark-based notebook does to perform the same operation.
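
To give a flavor of the approach, here is a minimal sketch of paging through a REST listing with only the standard library. This is not Gilbert's code: the endpoint path and the value and continuationUri response fields are my assumptions about the Fabric REST API and should be checked against the official documentation, and the fake_fetch stub exists only so the sketch runs without credentials.

```python
import json
import urllib.request

# Assumed base URL for the Fabric REST API; verify against the official docs.
BASE_URL = "https://api.fabric.microsoft.com/v1"


def http_get_json(url, token):
    """GET a URL with a bearer token and parse the JSON body."""
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def list_items(workspace_id, token, fetch=http_get_json):
    """Collect every item in a workspace, following continuation links."""
    url = f"{BASE_URL}/workspaces/{workspace_id}/items"
    items = []
    while url:
        page = fetch(url, token)
        items.extend(page.get("value", []))
        # Assumed field name: a missing continuationUri means this was the last page.
        url = page.get("continuationUri")
    return items


def fake_fetch(url, token):
    """Stand-in for the real API so the example runs offline (hypothetical data)."""
    pages = {
        f"{BASE_URL}/workspaces/demo/items": {
            "value": [{"displayName": "Sales", "type": "Lakehouse"}],
            "continuationUri": "page-2",
        },
        "page-2": {"value": [{"displayName": "Load", "type": "Notebook"}]},
    }
    return pages[url]


items = list_items("demo", "not-a-real-token", fetch=fake_fetch)
print([item["type"] for item in items])
```

Passing the fetch function in as a parameter keeps the paging logic separate from authentication, which in a real notebook would come from whatever token mechanism your environment provides.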

Local Text Summarization via DistilBart

Muhammad Asad Iqbal Khan summarizes a document:

Text summarization represents a sophisticated evolution of text generation, requiring a deep understanding of content and context. With encoder-decoder transformer models like DistilBart, you can now create summaries that capture the essence of longer text while maintaining coherence and relevance.

In this tutorial, you’ll discover how to implement text summarization using DistilBart. You’ll learn through practical, executable examples, and by the end of this guide, you’ll understand both the theoretical foundations and hands-on implementation details. After completing this tutorial, you will know:

Click through for the article.

Comparing Pandas to Other Libraries for Data Processing

Vidyasagar Machupalli performs a comparison:

As discussed in my previous article about data architectures emphasizing emerging trends, data processing is one of the key components in the modern data architecture. This article discusses various alternatives to the Pandas library for better performance in your data architecture.

Data processing and data analysis are crucial tasks in the field of data science and data engineering. As datasets grow larger and more complex, traditional tools like pandas can struggle with performance and scalability. This has led to the development of several alternative libraries, each designed to address specific challenges in data manipulation and analysis.

This is by no means a comprehensive test, but it does show off quite a few libraries that perform similar actions to Pandas.
