Category: Python

Sending Messages to Event Hub via Python

Kiril Nikolov has a message for us:

Recently I needed to create an Azure Function app that would connect to an API and send data to an Event Hub as part of a real-time data streaming solution.

Azure Functions are the perfect connectivity option for a task like this, allowing you to focus on the trigger and the resulting output message you want to capture in the event stream, while Azure handles the maintenance of the cloud infrastructure and hosting to run it.

Azure Functions can be written in multiple languages. I needed to write mine in Python, meaning that I had to set up a configuration file to connect to the Event Hub (as I will explain in further detail below).

Click through to see how it all works.

Using Shiny on Python

David Saipe crosses the streams:

As someone who has zero experience using Shiny in R, the recent announcement that the framework had been made available to Python users inspired an opportunity for me to learn a new concept from a different perspective from most of my colleagues. I have been tasked with writing a Python-related blog post. Having spent the past few weeks carrying out an analysis of Jumping Rivers’ Twitter data (@jumping_uk), creating a dashboard to display some of my findings and then writing about it seemed like a nice way to cap off my 6-week summer placement at Jumping Rivers.

This post will take you through some of the source code for the dashboard I created, whilst I provide a bit of context for the Twitter project itself. For a more bare-bones tutorial on using Shiny for Python, you can check out another recent Jumping Rivers blog post here. I suggest reading this first.

Read on to see how you can get started with Shiny on Python and what David thinks about the experience.

Removing Backgrounds from Images

Brendan Tierney focuses on the subject at hand:

There are a number of methods available for preparing images for a variety of purposes, for example as input to deep learning or to other image processing models, applications, and systems. But sometimes you just need a quick tool to perform a certain task. For example, I regularly have to edit images to extract just a certain part of them, or to filter out all the background colors and/or objects. There are a variety of tools available to help you with this kind of task. I’m a Mac user, so I use the Instant Alpha feature available in some of the Mac products. But what if you are not a Mac user? What can you use?

I’ve recently come across a very useful Python library that takes all or most of the hard work out of doing such tasks, and has proved to be extremely useful for some demos and projects I’ve been working on. The Python library I’m using is rembg (Remove Background). It isn’t perfect, but it does a pretty good job, and only for a small number of modified images did I need to do some additional processing.

Click through to see how the tool works, as well as some cases it doesn’t quite get correct.

Installing Third-Party WHL Packages in Synapse with DEP

Sabyasachi Samaddar walks through what I consider a real difficulty:

It is really challenging when you need to install third-party .whl packages into a DEP-enabled Azure Synapse Spark Instance.

According to the documentation (https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-azure-portal-add-librar…), installing packages from PyPI is not supported within DEP-enabled workspaces. Hence we cannot just upload the .whl packages into the workspace. We need to upload all the dependencies along with the .whl package, and it will be an offline installation. Synapse Spark clusters also come with built-in packages, so we may find some conflicts when we try to install some third-party packages.

Read on to see what you need to do.

Appending Rows to a Pandas DataFrame

Matt Eland acquires some rows that fell off a truck:

Recently I was working on comparing the performance of different machine learning models and I wanted to add entries to a Pandas DataFrame as I evaluated each model. What I found was that adding new rows to a Pandas DataFrame was a little harder than I suspected and required some mild searching, so I wanted to preserve the two solutions I found here in case it helps someone else.

Read on for those two solutions, though as Matt points out, only one of them is a good solution.
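For a rough feel of the problem, here is a minimal sketch that is not necessarily either of the solutions in Matt's post. It assumes pandas is installed, collects one dictionary per evaluated model, builds the DataFrame once, and uses pd.concat to add a row to an existing frame. The model names, the accuracy values, and the helper names build_results and add_row are all made up for illustration.

```python
import pandas as pd


def build_results(rows):
    """Build a DataFrame from a list of per-model result dictionaries."""
    return pd.DataFrame(rows, columns=["model", "accuracy"])


def add_row(df, row):
    """Return a new DataFrame with one extra row; the input is left unchanged."""
    new_row = pd.DataFrame([row], columns=df.columns)
    return pd.concat([df, new_row], ignore_index=True)


# Hypothetical evaluation results, for illustration only.
results = build_results([
    {"model": "logistic_regression", "accuracy": 0.81},
    {"model": "random_forest", "accuracy": 0.86},
])
results = add_row(results, {"model": "gradient_boosting", "accuracy": 0.88})
```

Gathering rows in a plain list and constructing the frame once tends to be cheaper than growing a DataFrame one row at a time, since each pd.concat call copies the existing data.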

Bitemporal Modeling and Running Totals

John Mount solves a running total problem in Python:

An example of this is wanting to know how many reservations for a San Francisco Symphony concert scheduled for December 4th 2022 are known to have been made by October 22nd 2022. This could be used as part of an attendance demand model that is evaluated on October 22nd 2022. The “fifty-cent word” for this is “bitemporal” modeling or data.

As I read through the solution, my initial thought was that, if the data is in a relational database, a running total operation such as SUM(reservation_count) OVER (PARTITION BY target_date ORDER BY action_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) would form the basis of a solution. Still, this is an interesting exercise in translating a SQL operation into equivalent Python, and it shows just how much we get to take for granted.
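To make the window function idea concrete, here is a small standard-library sketch of a partitioned running total. It is not John's solution: it assumes the rows are plain (target_date, action_date, reservation_count) tuples with ISO-formatted date strings, and both the sample data and the running_totals helper are hypothetical.

```python
from collections import defaultdict
from itertools import accumulate


def running_totals(rows):
    """Compute a running total of counts per target_date, ordered by action_date.

    rows: iterable of (target_date, action_date, reservation_count) tuples.
    Returns (target_date, action_date, reservation_count, running_total) tuples.
    """
    # PARTITION BY target_date
    partitions = defaultdict(list)
    for target_date, action_date, count in rows:
        partitions[target_date].append((action_date, count))

    result = []
    for target_date in sorted(partitions):
        # ORDER BY action_date (ISO date strings sort chronologically)
        ordered = sorted(partitions[target_date])
        # Cumulative sum from the first row of the partition to the current row
        totals = accumulate(count for _, count in ordered)
        for (action_date, count), total in zip(ordered, totals):
            result.append((target_date, action_date, count, total))
    return result


# Made-up reservations, for illustration only.
reservations = [
    ("2022-12-04", "2022-10-20", 2),
    ("2022-12-04", "2022-10-01", 5),
    ("2022-12-04", "2022-10-22", 1),
    ("2022-12-11", "2022-10-15", 4),
]
totals = running_totals(reservations)
```

With this sample data, the last row for the December 4th concert carries the number of reservations known by October 22nd.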

Fine-Tuning Hugging Face for Named Entity Recognition in Japanese

Tsuyoshi Matsuzaki tries out a named entity recognition project with the Hugging Face library:

Many AI companies (such as OpenAI, NLP Cloud, Google, and NVIDIA) now provide pre-trained large language models, along with methods for fine-tuning them. Among such tools and frameworks, Hugging Face is widely used and provides over 20,000 transformer-based models.

In this post, I’ll show you a brief example of fine-tuning transformer models in Hugging Face to help you get started.
In the last part of this post, I’ll also optimize training with DeepSpeed, which is well integrated with Hugging Face transformers.

Click through for the results of this analysis.

Working with Transformer Models for Machine Translation

Stefania Cristina continues a series on transformer models. First up is plotting loss curves:

We have previously seen how to train the Transformer model for neural machine translation. Before moving on to inferencing the trained model, let us first explore how to modify the training code slightly, in order to be able to plot the training and validation loss curves that can be generated during the learning process. 

The training and validation loss values provide important pieces of information, because they give us better insight into how the learning performance is changing over the number of epochs, and help us diagnose any problems with learning that can lead to an underfit or an overfit model. They will also inform us about the epoch at which to use the trained model weights at the inferencing stage.
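As a rough sketch of how loss curves can inform the choice of epoch, the snippet below picks the epoch with the lowest validation loss and reports the gap between validation and training loss. This is one common heuristic and is not taken from Stefania's tutorial. The loss values are invented, and best_epoch and loss_gaps are hypothetical helper names.

```python
def best_epoch(val_losses):
    """Return the 1-based epoch with the lowest validation loss (first on ties)."""
    index = min(range(len(val_losses)), key=val_losses.__getitem__)
    return index + 1


def loss_gaps(train_losses, val_losses):
    """Return validation minus training loss per epoch; a growing gap hints at overfitting."""
    return [val - train for train, val in zip(train_losses, val_losses)]


# Invented loss values, for illustration only.
train_losses = [3.2, 2.1, 1.5, 1.1, 0.8]
val_losses = [3.3, 2.4, 1.9, 2.0, 2.3]

chosen_epoch = best_epoch(val_losses)
gaps = loss_gaps(train_losses, val_losses)
```

In this made-up run, training loss keeps falling while validation loss bottoms out and then climbs, which is the pattern usually read as overfitting.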

Then we get to try it out:

We have seen how to train the Transformer model on a dataset of English and German sentence pairs, as well as how to plot the training and validation loss curves in order to diagnose the model’s learning performance and decide at which epoch to inference the trained model. We are now ready to inference the trained Transformer model for the purpose of translating an input sentence.

In this tutorial, you will discover how to inference the trained Transformer model for neural machine translation. 

Click through for the results and to see exactly why there’s so much computational effort dumped into high-end trained models.

Training a Language Transformer Model

Stefania Cristina continues a series on building a language transformer:

We have put together the complete Transformer model, and now we are ready to train it for neural machine translation. We shall be making use of a training dataset for this purpose, which contains short English and German sentence pairs. We will also be revisiting the role of masking in computing the accuracy and loss metrics during the training process. 

In this tutorial, you will discover how to train the Transformer model for neural machine translation. 

Read on for the process, including a lot of code.
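To give a flavor of why masking matters for the metrics, here is a plain-Python sketch of a masked accuracy calculation. It is not the tutorial's code, which works on tensors. It assumes padding tokens are encoded as 0, and the token ids and the masked_accuracy helper are hypothetical.

```python
PAD_ID = 0  # assumption: padding positions are encoded as token id 0


def masked_accuracy(targets, predictions, pad_id=PAD_ID):
    """Fraction of correctly predicted tokens, ignoring padded target positions."""
    correct = 0
    counted = 0
    for target_seq, predicted_seq in zip(targets, predictions):
        for target, predicted in zip(target_seq, predicted_seq):
            if target == pad_id:
                continue  # padding should not count for or against the model
            counted += 1
            correct += int(target == predicted)
    return correct / counted if counted else 0.0


# Made-up token ids: two target sentences padded to length 5.
targets = [[5, 7, 9, 0, 0], [4, 2, 0, 0, 0]]
predictions = [[5, 7, 1, 0, 0], [4, 2, 3, 3, 3]]

accuracy = masked_accuracy(targets, predictions)
```

Without the mask, whatever the model emits at padded positions would shift the metric even though those positions carry no meaning.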

Kernel SHAP in R and Python

Michael Mayer and Christian Lorentzen team up:

SHAP is one of the most used model interpretation techniques in Machine Learning. It decomposes predictions into additive contributions of the features in a fair way. For tree-based methods, the fast TreeSHAP algorithm exists. For general models, one has to resort to computationally expensive Monte-Carlo sampling or the faster Kernel SHAP algorithm. Kernel SHAP uses a regression trick to get the SHAP values of an observation with a comparably small number of calls to the predict function of the model. Still, it is much slower than TreeSHAP.
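To illustrate what "additive contributions" means, the sketch below computes exact Shapley values by enumerating every feature subset. This is neither Kernel SHAP nor TreeSHAP: it is the brute-force definition, feasible only for a handful of features. It also assumes, as a simplification, that a "missing" feature is replaced by a single baseline value rather than averaged over a background dataset. The toy model and the shapley_values helper are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one observation x relative to a baseline point."""
    n = len(x)

    def value(subset):
        # Features in the subset take their observed value, the rest the baseline.
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for subsets of this size that exclude feature i
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                phi[i] += weight * (value(members | {i}) - value(members))
    return phi


def toy_model(z):
    """Toy model with one additive term and one interaction term."""
    return 2 * z[0] + z[1] * z[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
```

The contributions sum to the difference between the prediction for x and the prediction for the baseline, and the interaction term is split evenly between the two features involved. The number of model calls grows exponentially with the number of features, which is the cost that sampling and Kernel SHAP aim to avoid.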

Read on to see how to do this in both R and Python. With libraries the way they are, the code is very similar and the results are basically the same.
