
Category: Python

An Introduction to Batch Normalization in Neural Networks

Ivan Palomares Carrascosa shows off one technique for optimizing neural networks:

Deep neural networks have drastically evolved over the years, overcoming common challenges that arise when training these complex models. This evolution has enabled them to solve increasingly difficult problems effectively.

One of the mechanisms that has proven especially influential in the advancement of neural network-based models is batch normalization. This article provides a gentle introduction to this strategy, which has become a standard in many modern architectures, helping to improve model performance by stabilizing training, speeding up convergence, and more.

Read on for a quick description of how it works and a demonstration in Keras.
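To give a flavor of what the layer does, here is a hand-rolled sketch of the batch normalization forward pass in NumPy — an illustration of the math only, not the Keras `BatchNormalization` layer the article demonstrates (which also tracks running statistics for inference):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch, then scale and shift."""
    mu = x.mean(axis=0)                     # per-feature batch mean
    var = x.var(axis=0)                     # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)   # zero mean, unit variance
    return gamma * x_hat + beta             # learnable scale and shift

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(64, 4))   # a badly-centered batch
out = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
```

With `gamma=1` and `beta=0`, each feature of the output has (approximately) zero mean and unit variance over the batch, which is what keeps activations from drifting as training proceeds.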


Making XGBoost Run Faster

Ivan Palomares Carrascosa shares a few tips:

Extreme gradient boosting (XGBoost) is one of the most prominent machine learning techniques used not only for experimentation and analysis but also in deployed predictive solutions in industry. An XGBoost ensemble combines multiple models to address a predictive task like classification, regression, or forecasting. It trains a set of decision trees sequentially, gradually improving the quality of predictions by correcting the errors made by previous trees in the pipeline.

In a recent article, we explored the importance and ways to interpret predictions made by XGBoost models (note we use the term ‘model’ here for simplicity, even though XGBoost is an ensemble of models). This article takes another practical dive into XGBoost, this time by illustrating three strategies to speed up and improve its performance.

Read on for two tips to reduce operational load and one to offload it to faster hardware (when possible).
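One classic way to cut XGBoost's workload — whether or not it is among the article's three — is the histogram tree method (`tree_method="hist"`), which bins each continuous feature into a fixed number of buckets so split-finding scans ~256 bins instead of every sample. A rough NumPy sketch of the binning idea (an illustration of the concept, not XGBoost's actual implementation):

```python
import numpy as np

def bin_feature(x, n_bins=256):
    """Map a continuous feature to integer bin indices via quantile edges,
    roughly what histogram-based split finding precomputes per feature."""
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.searchsorted(edges, x)

x = np.random.default_rng(1).normal(size=100_000)
binned = bin_feature(x)   # 100k floats collapsed into at most 256 buckets
```

The same parameter surface is where GPU offloading lives in recent XGBoost releases (`device="cuda"` alongside `tree_method="hist"`), which fits the "faster hardware" tip mentioned above.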


An Introduction to Bayesian Regression

Ivan Palomares Carrascosa covers the concept of Bayesian regression:

In this article, you will learn:

  • The fundamental difference between traditional regression, which uses single fixed values for its parameters, and Bayesian regression, which models them as probability distributions.
  • How this probabilistic approach allows the model to produce a full distribution of possible outcomes, thereby quantifying the uncertainty in its predictions.
  • How to implement a simple Bayesian regression model in Python with scikit-learn.

My understanding is that both Bayesian and traditional regression techniques get you to (roughly) the same place, but the Bayesian approach makes it harder to forget that the regression line you draw doesn’t actually exist and everything has uncertainty.
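That uncertainty is visible directly in the math: with a Gaussian prior on the weights and known noise, the posterior has a closed form. A minimal NumPy sketch with fixed precisions for clarity (scikit-learn's `BayesianRidge`, a likely candidate for what the article implements, also estimates these precisions from the data):

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.uniform(-1, 1, size=(50, 1))
Phi = np.hstack([np.ones((50, 1)), X])            # design matrix with intercept
y = 1.0 + 2.0 * X[:, 0] + rng.normal(0, 0.3, 50)  # true line: y = 1 + 2x

alpha = 1.0              # prior precision on the weights
beta = 1.0 / 0.3**2      # noise precision (assumed known here)

S_inv = alpha * np.eye(2) + beta * Phi.T @ Phi    # posterior precision
S = np.linalg.inv(S_inv)                          # posterior covariance
m = beta * S @ Phi.T @ y                          # posterior mean of weights

# Predictive variance at a new point = noise + parameter uncertainty
phi_new = np.array([1.0, 0.5])
pred_mean = phi_new @ m
pred_var = 1.0 / beta + phi_new @ S @ phi_new
```

The point above falls out of the last two lines: the prediction is a distribution, not a point, and its variance never drops below the noise floor `1/beta`.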


Time Series Forecasting in Python

Myles Mitchell builds an ARIMA model:

In time series analysis we are interested in sequential data made up of a series of observations taken at regular intervals. Examples include:

  • Weekly hospital occupancy
  • Monthly sales figures
  • Annual global temperature

In many cases we want to use the observations up to the present day to predict (or forecast) the next N time points. For example, a hospital could reduce running costs if an appropriate number of beds is provisioned.

Read on for a primer on the topic, a quick explanation of ARIMA, and a sample implementation using several Python packages.
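To make the AR part of ARIMA concrete, here is the simplest possible case fit by hand: an AR(1) model, y_t = c + phi * y_{t-1} + noise, estimated by least squares on the lagged series. This is a NumPy illustration only — the article uses dedicated packages that also handle differencing (the "I") and moving-average terms (the "MA"):

```python
import numpy as np

# Simulate an AR(1) process with known coefficient 0.8
rng = np.random.default_rng(7)
true_phi, n = 0.8, 2000
y = np.zeros(n)
for t in range(1, n):
    y[t] = true_phi * y[t - 1] + rng.normal()

# Regress y_t on [1, y_{t-1}] to recover the intercept and coefficient
X = np.column_stack([np.ones(n - 1), y[:-1]])
c, phi = np.linalg.lstsq(X, y[1:], rcond=None)[0]

forecast = c + phi * y[-1]   # one-step-ahead forecast
```

With enough data the fitted `phi` lands close to the true 0.8, and iterating the last line produces the multi-step forecasts (with decaying confidence) that the hospital-beds example needs.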


Time Series Helpers in NumPy

Bala Priya C shares some one-liners:

NumPy’s array operations can help simplify most common time series operations. Instead of thinking step-by-step through data transformations, you can apply vectorized operations that process entire datasets at once.

This article covers 10 NumPy one-liners that can be used for time series analysis tasks you’ll come across often. Let’s get started!

Click through to see the ten in action.
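As a taste of the style — these are representative examples of vectorized time series one-liners, not necessarily among the article's ten:

```python
import numpy as np

prices = np.array([10.0, 11.0, 12.5, 12.0, 13.0, 14.5, 14.0])

# Rolling 3-point mean in a single vectorized call
rolling = np.convolve(prices, np.ones(3) / 3, mode="valid")

# Period-over-period percent change
pct_change = np.diff(prices) / prices[:-1]

# Cumulative return from the start of the series
cum_return = prices / prices[0] - 1
```

Each line replaces an explicit Python loop over the series, which is the whole pitch: think in whole-array transformations rather than step-by-step.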


Tips for Working with Pandas

Matthew Mayo has a few tips when working with Pandas for data preparation:

If you’re reading this, it’s likely that you are already aware that the performance of a machine learning model is not just a function of the chosen algorithm. It is also highly influenced by the quality and representation of the data that said model has been trained on.

Data preprocessing and feature engineering are some of the most important steps in your machine learning workflow. In the Python ecosystem, Pandas is the go-to library for these types of data manipulation tasks, something you also likely know. Mastering a few select Pandas data transformation techniques can significantly streamline your workflow, make your code cleaner and more efficient, and ultimately lead to better performing models.

This tutorial will walk you through seven practical Pandas scenarios and the tricks that can enhance your data preparation and feature engineering process, setting you up for success in your next machine learning project.

Click through for those tips and tricks.
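A couple of the kinds of transformations such tips usually cover — derived columns, binning, and one-hot encoding chained into one readable pipeline — sketched on a made-up frame (illustrative only; the article has its own seven scenarios):

```python
import pandas as pd

df = pd.DataFrame({
    "age": [22, 35, 58, 41],
    "income": [28_000, 52_000, 61_000, 45_000],
    "city": ["Leeds", "York", "Leeds", "Hull"],
})

features = (
    df.assign(income_k=df["income"] / 1_000)          # derived numeric feature
      .assign(age_band=pd.cut(df["age"],
                              bins=[0, 30, 50, 120],
                              labels=["young", "mid", "senior"]))
      .pipe(pd.get_dummies, columns=["city"])         # one-hot encode category
)
```

Chaining with `.assign` and `.pipe` keeps every step visible in one expression, which makes the preparation logic easy to review before it feeds a model.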


A Primer on Bayesian Modeling

Hristo Hristov is speaking my language:

Multivariate analysis in data science is a type of analysis that tackles multiple input/predictor and output/predicted variables. This tip explores the problem of predicting air pollution measured in particulate matter (PM) concentration based on ambient temperature, humidity, and pressure using a Bayesian Model.

Click through for a detailed code sample and explanation.


Using the Tabular Object Model via Semantic Link Labs

Gilbert Quevauvilliers does a bit of connecting:

In this blog post I am going to show you how to use the powerful Semantic Link Labs library for Tabular Object Model (TOM) for semantic model manipulation.

The goal of this blog post is to give you an understanding of how to connect using TOM, then based on the documentation use one of the functions.

Don’t get me wrong, the documentation is great, but when implementing it, things work a little differently, and I want others to know how to use it so it can automate and simplify some repetitive tasks.

Read on for the instructions and some of the things you can do with the Semantic Link Labs library in Microsoft Fabric.


Visualizing ML Model Outcomes with Matplotlib

Matthew Mayo shares a few tips:

Visualizing model performance is an essential piece of the machine learning workflow puzzle. While many practitioners can create basic plots, elevating these from simple charts to insightful visualizations that can help easily tell the story of your machine learning model’s interpretations and predictions is a skill that sets great professionals apart. The Matplotlib library, the foundational plotting tool in the scientific and computational Python ecosystem, is packed with features that can help you achieve this.

This tutorial provides 7 practical Matplotlib tricks that will help you better understand, evaluate, and present your machine learning models. We’ll move beyond the default settings to create visualizations that are not only aesthetically pleasing but also rich in information. These techniques are designed to integrate smoothly into your workflow with libraries like NumPy and Scikit-learn.

Click through for those tips.
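One example of the genre — not necessarily among the article's seven — is turning a raw confusion matrix into an annotated heatmap, so the reader sees counts without decoding colors (the counts below are made up for illustration):

```python
import matplotlib
matplotlib.use("Agg")   # render off-screen; drop this in a notebook
import matplotlib.pyplot as plt
import numpy as np

cm = np.array([[42, 3],
               [5, 50]])          # rows = actual, columns = predicted
labels = ["negative", "positive"]

fig, ax = plt.subplots(figsize=(4, 4))
im = ax.imshow(cm, cmap="Blues")
ax.set_xticks([0, 1])
ax.set_xticklabels(labels)
ax.set_yticks([0, 1])
ax.set_yticklabels(labels)
ax.set_xlabel("Predicted")
ax.set_ylabel("Actual")
for i in range(2):                # write each count on its cell
    for j in range(2):
        ax.text(j, i, cm[i, j], ha="center", va="center")
fig.colorbar(im, ax=ax)
```

A few lines of annotation turn a default `imshow` into something a stakeholder can read unaided, which is the spirit of the tips above.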


Text Classification with Decision Trees

Ivan Palomares Carrascosa takes us through a simple natural language processing problem and solution:

It’s no secret that decision tree-based models excel at a wide range of classification and regression tasks, often based on structured, tabular data. However, when combined with the right tools, decision trees also become powerful predictive tools for unstructured data, such as text or images, and even time series data.

This article demonstrates how to build decision trees for text data. Specifically, we will incorporate text representation techniques like TF-IDF and embeddings in decision trees trained for spam email classification, evaluating their performance and comparing the results with another text classification model — all with the aid of Python’s Scikit-learn library.

Read on for the demos and to see how three different approaches work.
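The TF-IDF route described above can be sketched end-to-end in a few lines of scikit-learn, which the article names as its tool (toy made-up corpus here; the article works with a real spam email dataset and also compares embedding-based representations):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Tiny illustrative corpus: 1 = spam, 0 = ham
texts = [
    "win a free prize now", "claim your free money",
    "meeting at noon tomorrow", "project update attached",
    "free cash win now", "lunch tomorrow with the team",
]
labels = [1, 1, 0, 0, 1, 0]

# TF-IDF turns text into a numeric matrix the tree can split on
clf = make_pipeline(TfidfVectorizer(), DecisionTreeClassifier(random_state=0))
clf.fit(texts, labels)
pred = clf.predict(["free prize money now"])
```

The pipeline object keeps vectorization and classification together, so new raw strings go straight into `predict` without a separate transform step.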
