Press "Enter" to skip to content

Category: Python

mssparkutils now notebookutils and Validating DAGs in Fabric

Sandeep Pawar gives us two quick hits:

First, if you haven’t noticed mssparkutils has been officially renamed to notebookutils. Check out the official documentation for details. Be sure to use/update your notebooks to notebookutils.

Read on for a pair of notes around this name change, as well as some capabilities to validate DAGs when using runMultiple to orchestrate multiple notebook executions.

Comments closed

Databricks Notebook Package Installation and Variables

Chen Hirsh diagnoses a problem:

A friend called to ask for my help with a weird issue. In a Databricks notebook using Python, he declares and assigns a variable in the first cell. Something like that:

my_var = 1

He then runs the rest of the notebook, and somewhere along the way, tries to use this variable, and gets this message:

NameError: name 'my_var' is not defined

Going back to cell 1, and checking the value of my_var, he gets the same error.

Read on for the root cause of the issue, as well as a pair of helpful tips from Chen.

Comments closed

Interpreting Linear Regression Model Coefficients

Vinod Chugani looks at a linear regression:

Linear regression models are foundational in machine learning. Merely fitting a straight line and reading the coefficient tells a lot. But how do we extract and interpret the coefficients from these models to understand their impact on predicted outcomes? This post will demonstrate how one can interpret coefficients by exploring various scenarios. We’ll delve into the analysis of a single numerical feature, investigate the role of categorical variables, and unpack the complexities introduced when these features are combined. Through this exploration, we aim to equip you with the skills needed to leverage linear regression models effectively, enhancing your analytical capabilities across different data-driven domains.

Click through for details, with examples in Python.

Comments closed

Speech to Text with Streamlit and Azure AI

I have a new video:

In this video, I show how we can integrate with the Azure AI Services Speech service, using two different methods to capture speech from the microphone via our Streamlit application and submit that to Azure OpenAI.

Check out the video and final set of code. There’s an intermediate set of code for detecting a single utterance. But I think the final product works out pretty well.

Comments closed

A Primer on One-Hot Encoding

Vinod Chugani does a bit of data modeling:

Preparing categorical data correctly is a fundamental step in machine learning, particularly when using linear models. One Hot Encoding stands out as a key technique, enabling the transformation of categorical variables into a machine-understandable format. This post tells you why you cannot use a categorical variable directly and demonstrates the use One Hot Encoding in our search for identifying the most predictive categorical features for linear regression.

Read the whole thing.

Comments closed

Using AI Skills as Cell Magics in Microsoft Fabric Notebooks

Sandeep Pawar takes a look at a new preview capability:

The public preview of AI Skills in Microsoft Fabric was announced yesterday. AI Skills allows Fabric developers to create their own GenAI experience using data in the lakehouse. Unlike Copilot, which is an AI assistant, AI Skills lets users build a validated Q&A application that queries lakehouse data by converting natural language questions into T-SQL queries. It’s only available in paid F64+ SKUs. You can watch the below video for Copilot, AI Skills and Gen AI experiences in Fabric:

Read on for more details on how it works.

Comments closed

Tips for Hyperparameter Tuning

Bala Priya C shares some tips and techniques:

If you’re familiar with machine learning, you know that the training process allows the model to learn the optimal values for the parameters—or model coefficients—that characterize it. But machine learning models also have a set of hyperparameters whose values you should specify when training the model. So how do you find the optimal values for these hyperparameters?

You can use hyperparameter tuning to find the best values for the hyperparameters. By systematically adjusting hyperparameters, you can optimize your models to achieve the best possible results.

This tutorial provides practical tips for effective hyperparameter tuning—starting from building a baseline model to using advanced techniques like Bayesian optimization. Whether you’re new to hyperparameter tuning or looking to refine your approach, these tips will help you build better machine learning models. Let’s get started.

Read on for those techniques. Incidentally, one of my “Old man yells at clouds” takes is that I dislike the existence of hyperparameters and consider them a modeling failure, essentially telling the implementer to do part of the researcher’s work. Knowing that they are necessary to work with for so many algorithms, there’s nothing to do but learn how to work with them effectively, but there’s a feel of outsourcing the hard work to users that I don’t like about the process. For that reason, I have extra respect for algorithms that neither need nor offer hyperparameters.

Comments closed

Chat with Your Own Data in Streamlit and Azure Open AI

I have a new video:

In this video, I show how we can make a GPT-4 deployment aware of our own custom data, without needing to fine-tune the model. I talk about meta prompts and the Retrieval Augmented Generation (RAG) pattern, and then show how you can set this up using Azure AI Search and Azure OpenAI. Then, I bring it back to Streamlit and give users the option between chatting with a generic GPT-4 deployment and chatting over custom data.

I try to make my videos 10 minutes in length. They usually end up at 15-18 minutes. This one clocks in at more than 30 minutes and there’s very little fluff.

Comments closed

Defining a OneLake Filesystem using fsspec

Sandeep Pawar looks at fsspec:

I mentioned on X the other day that, like other filesystem backends such as S3 and GCS, you can use fsspec to define the OneLake filesystem too. In this blog, I will explain how to define it and why it’s important to know about it.

Click through for the details on what fsspec is, why it’s important, and what benefits you can get in Microsoft Fabric as a result of its support of fsspec.

Comments closed

Chat with Azure OpenAI in Streamlit

I have a new video:

In this video, I show how we can integrate an Azure OpenAI GPT-4 model into our Streamlit dashboard. Along the way, I also show off how easy it is to create multiple pages and talk a bit about session state and secrets management as well.

The fun part about this is, there’s not even that much code involved. Streamlit handles most of the conversational aspects and you’re primarily responsible for saving history.

Comments closed