Press "Enter" to skip to content

Category: Python

Finding Near-Duplicates in a Corpus

Estelle Wang de-dupes text data:

Building a large high-quality corpus for Natural Language Processing (NLP) is not for the faint of heart. Text data can be large, cumbersome, and unwieldy, and unlike clean numbers or categorical data in rows and columns, discerning differences between documents can be challenging. In organizations where documents are shared, modified, and shared again before being saved in an archive, the problem of duplication can become overwhelming.

To find exact duplicates, matching all string pairs is the simplest approach, but it is neither efficient nor sufficient. Using the MD5 or SHA-1 hash algorithms gets us a correct outcome much faster, yet near-duplicates would still not be on the radar. Text similarity is useful for finding files that look alike. There are various approaches, and each has its own way of defining which documents are considered duplicates. That definition, in turn, has implications for the type of processing and the results produced. Below are some of the options.

Click through for solutions in SAS.
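The linked solutions are in SAS, but the ideas translate readily to Python. As a rough sketch (the sample documents and shingle size below are invented for illustration), exact duplicates can be caught by hashing each document, while near-duplicates need a similarity measure such as Jaccard similarity over character shingles:

    import hashlib

    def md5_fingerprint(text):
        # Exact-duplicate check: identical byte content gives identical hashes.
        return hashlib.md5(text.encode("utf-8")).hexdigest()

    def shingles(text, k=5):
        # Character k-shingles: overlapping substrings of length k, whitespace-normalized.
        text = " ".join(text.lower().split())
        return {text[i:i + k] for i in range(max(len(text) - k + 1, 1))}

    def jaccard(a, b):
        # Near-duplicate check: overlap of the shingle sets, between 0.0 and 1.0.
        sa, sb = shingles(a), shingles(b)
        return len(sa & sb) / len(sa | sb)

    docs = ["The quick brown fox", "The quick brown fox.", "Something else entirely"]
    print(md5_fingerprint(docs[0]) == md5_fingerprint(docs[1]))  # False: hashing misses near-duplicates
    print(round(jaccard(docs[0], docs[1]), 2))                   # ~0.94: flagged as a near-duplicate

At corpus scale you would not compare every pair directly; techniques such as MinHash with locality-sensitive hashing approximate the same Jaccard comparison without the quadratic blow-up.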


Deploying a Streamlit App to RStudio Connect

Parisa Gregg wraps up a series:

RStudio Connect is a platform well known for providing the ability to deploy and share R applications such as Shiny apps and Plumber APIs, as well as plots, models, and R Markdown reports. However, despite the name, it is not just for R developers (hence their recent announcement). RStudio Connect also supports a growing number of Python applications, API services including Flask and FastAPI, and interactive web-based apps such as Bokeh and Streamlit.

In this post we will look at how to deploy a Streamlit application to RStudio Connect. Streamlit is a framework for creating interactive web apps for data visualisation in Python. Its API makes it very easy and quick to display data and create interactive widgets from just a regular Python script.
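For a sense of how little code that takes, here is a minimal, hypothetical Streamlit script (the data and widgets are placeholders, not the app from the linked post); you would run it with streamlit run app.py:

    # app.py -- a minimal Streamlit sketch with made-up data
    import pandas as pd
    import streamlit as st

    st.title("Monthly sales")

    # Stand-in for whatever you would normally load from a file or database
    sales = pd.DataFrame({
        "product": ["A", "B", "C", "A", "B"],
        "units": [10, 4, 7, 3, 8],
    })

    product = st.selectbox("Product", sorted(sales["product"].unique()))
    st.dataframe(sales[sales["product"] == product])
    st.bar_chart(sales.groupby("product")["units"].sum())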

Click through for the step-by-step process.


RStudio Connect and Python’s FastAPI

Parisa Gregg continues a series on RStudio Connect and Python:

FastAPI is a light web framework and, as you can probably tell by the name, it’s fast. It provides similar functionality to Flask in that it allows the building of web applications and APIs; however, it is newer and uses the ASGI (Asynchronous Server Gateway Interface) standard. One of the nice features of FastAPI is that it is built on the OpenAPI and JSON Schema standards, which means it can provide automatic interactive API documentation with Swagger UI. You also get validation for most Python data types with Pydantic. FastAPI is therefore another popular choice for data scientists when creating APIs to interact with and visualize data.

In this blog post we will go through how to deploy a simple machine learning API to RStudio Connect.
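To give a rough idea of what such an API looks like, here is a minimal FastAPI sketch with a placeholder "model" (the endpoint names and scoring logic are invented; the linked post deploys a real trained model). Running it with uvicorn main:app serves the API and exposes the Swagger UI documentation at /docs:

    # main.py -- a minimal FastAPI sketch; the "model" is a stand-in
    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI()

    class Features(BaseModel):
        # Pydantic validates incoming JSON against these types
        sepal_length: float
        sepal_width: float

    @app.get("/")
    def health():
        return {"status": "ok"}

    @app.post("/predict")
    def predict(features: Features):
        # A real app would load a trained model and call its predict method here
        score = 0.5 * features.sepal_length + 0.25 * features.sepal_width
        return {"prediction": score}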

I’ve taken pretty well to FastAPI for rapid API development. I haven’t had to worry about scaling it out too much, so I’m not sure how well that works in practice. Still, for single-user or few-user apps, FastAPI definitely works well.


Developing a Flask App with RStudio Connect

Parisa Gregg crosses the language barrier:

One of the Python applications you can deploy to RStudio Connect is Flask. Flask is a WSGI (Web Server Gateway Interface) web application framework that provides a Python interface for building web APIs. It is useful to data scientists for building, for example, interactive web dashboards and visualisations of data, as well as APIs for machine learning models. Deploying a Flask app to a publishing platform such as RStudio Connect means it can then be used from anywhere and easily shared with clients.

This blog post focuses on how to deploy a Flask app to RStudio Connect. We will use a simple example but won’t go into detail on how to create Flask apps. If you are getting started in Flask you may find this tutorial useful.
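For context, a minimal Flask API looks something like the sketch below (the routes are hypothetical, not the example from the post); running the script serves the endpoints locally:

    # app.py -- a minimal Flask API sketch
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/health")
    def health():
        return jsonify(status="ok")

    @app.route("/square", methods=["POST"])
    def square():
        # Expects JSON like {"x": 3} and returns the squared value
        payload = request.get_json(force=True)
        return jsonify(result=payload["x"] ** 2)

    if __name__ == "__main__":
        app.run(debug=True)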

Read on for a demo.


Trying out Shiny Python

Jamie Owen kicks the tires on Py-shiny:

We would posit (see what we did there) that R-{shiny} has been a boon for data science practitioners using the R language over the last decade. We know that in our Python work, we have certainly been clamouring for something of the same ilk. And whilst there are other frameworks that we also like, streamlit and dash to name a couple, neither of them has filled us with the same excitement and confidence that shiny did in R for building both simple and complex bespoke web applications. With RStudio Posit conf in action, the big news from July 27th was the alpha release of Py-{shiny}, which was a source of great interest for us, so we couldn’t resist installing it and starting to build.

If you are familiar with R-shiny already, then much of the py-shiny package will feel familiar to you (albeit with a couple of things having been renamed). However, we will approach the rest of this post assuming that the reader does not have that prior experience, and take you through building a simple shiny application to display plots on subsets of a dataset.
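To give a flavour of the syntax, here is a minimal py-shiny sketch (not the plotting app the post builds, and the decorator details have shifted between releases, so treat it as illustrative); it runs with shiny run app.py:

    # app.py -- a minimal py-shiny sketch
    from shiny import App, render, ui

    app_ui = ui.page_fluid(
        ui.input_slider("n", "Number of points", min=10, max=200, value=50),
        ui.output_text_verbatim("summary"),
    )

    def server(input, output, session):
        @output
        @render.text
        def summary():
            # Reactive: re-runs whenever the slider value changes
            return f"You asked for {input.n()} points."

    app = App(app_ui, server)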

I’m curious how much take-up there will be for the library, given that there are several good competitors in the Python space.


Case-Sensitive String Comparisons and Case-Insensitive Tables

Meagan Longoria reminds us that case sensitivity was a huge mistake:

Here’s the scenario: You are using Python, perhaps in Azure Databricks, to manipulate data before inserting it into a SQL Database. Your source data is a flattened data extract and you need to create a unique list of values for an entity found in the data. For example, you have a dataset containing sales for the last month and you want a list of the unique products that have been sold. Then you insert the unique product values into a SQL table with a unique constraint, but you encounter issues on the insert related to unique values.

Click through for an example and how to extricate yourself from this scenario. Python certainly is not the only language to do this, so it’s good to know even if you don’t plan on using or supporting Python.
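Not the fix from the post, but the shape of the problem (and one common way out) fits in a few lines: Python’s set treats differently-cased strings as distinct values, while a case-insensitive unique constraint does not, so normalizing case before building the unique list avoids the collision.

    products = ["Widget", "widget", "WIDGET", "Gadget"]

    python_unique = set(products)                         # 4 distinct values to Python
    case_insensitive = {p.casefold() for p in products}   # 2 distinct values to a CI constraint
    print(len(python_unique), len(case_insensitive))      # 4 2 -- the insert would hit duplicates

    # One workaround: pick a canonical casing before inserting into the case-insensitive table
    canonical = sorted({p.casefold() for p in products})
    print(canonical)                                      # ['gadget', 'widget']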


Bulk Insert into Azure SQL DB using Python

Jose Manuel Jurado Diaz shares some customer notes:

Today, I’ve been working on a service request in which our customer wants to improve the performance of a bulk insert process. In what follows, I would like to share my experience working on that.

Our customer mentioned that inserting data (100,000 rows) takes 14 seconds in a database in the Business Critical tier. I was able to reproduce this timing using a single thread against a table with 20 columns.

A lot of this advice also applies to on-premises SQL Server and relates to using bulk inserts and picking good batch sizes. Similar advice to what we’d be doing with SQL Server Integration Services or any other ETL/ELT process, tailored to Python.
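For a rough idea of what that looks like in Python, here is a hedged sketch using pyodbc with fast_executemany and explicit batches (the connection string, table name, and batch size are placeholders, and the post’s exact approach may differ):

    # Batched inserts with pyodbc; tune batch_size for your row width and service tier
    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 18 for SQL Server};SERVER=your-server.database.windows.net;"
        "DATABASE=your_db;UID=your_user;PWD=your_password"
    )
    cursor = conn.cursor()
    cursor.fast_executemany = True  # send parameter arrays instead of row-by-row round trips

    rows = [(i, f"product {i}") for i in range(100_000)]  # stand-in data
    batch_size = 10_000

    for start in range(0, len(rows), batch_size):
        cursor.executemany(
            "INSERT INTO dbo.Sales (Id, ProductName) VALUES (?, ?)",
            rows[start:start + batch_size],
        )
        conn.commit()

    conn.close()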


Anomaly Detection over Delta Live Tables

Avinash Sooriyarachchi and Sathish Gangichetty show off an interesting scenario:

Anomaly detection poses several challenges. The first is the data science question of what an ‘anomaly’ looks like. Fortunately, machine learning has powerful tools to learn how to distinguish usual from anomalous patterns in data. In the case of anomaly detection, it is impossible to know what all anomalies look like, so it’s impossible to label a data set for training a machine learning model, even if resources for doing so are available. Thus, unsupervised learning has to be used to detect anomalies, where patterns are learned from unlabelled data.

Even with the perfect unsupervised machine learning model for anomaly detection figured out, in many ways, the real problems have only begun. What is the best way to put this model into production such that each observation is ingested, transformed and finally scored with the model, as soon as the data arrives from the source system? That too, in a near real-time manner or at short intervals, e.g. every 5-10 minutes? This involves building a sophisticated extract, load, and transform (ELT) pipeline and integrating it with an unsupervised machine learning model that can correctly identify anomalous records. Also, this end-to-end pipeline has to be production-grade, always running while ensuring data quality from ingestion to model inference, and the underlying infrastructure has to be maintained.

Click through to see their solution using Databricks and Delta Lake.
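Their pipeline runs on Databricks, but the unsupervised scoring step itself can be sketched in a few lines; the example below uses scikit-learn’s IsolationForest on made-up data (an assumed library choice, not necessarily what the post uses). In production, the same fitted model would then score each micro-batch as it arrives.

    # Unsupervised anomaly detection on toy data with an isolation forest
    import numpy as np
    from sklearn.ensemble import IsolationForest

    rng = np.random.default_rng(42)
    normal = rng.normal(loc=0.0, scale=1.0, size=(1000, 2))   # "usual" observations
    outliers = rng.uniform(low=6.0, high=8.0, size=(10, 2))   # a handful of anomalous points
    X = np.vstack([normal, outliers])

    model = IsolationForest(contamination=0.01, random_state=0).fit(X)
    labels = model.predict(X)  # 1 = inlier, -1 = anomaly

    print("flagged as anomalies:", int((labels == -1).sum()))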


Understanding Decision Trees

Durgesh Gupta provides a primer on the humble decision tree:

A decision tree is a graphical representation of all possible solutions to a decision.

The objective of using a decision tree is to create a training model that can be used to predict the class or value of the target variable by learning simple decision rules inferred from training data.

It is a tree-structured classifier, where internal nodes represent the features of a dataset, branches represent the decision rules and each leaf node represents the outcome.

The way I like to describe decision trees, especially to developers, is that a tree is a set of if-else statements that leads to a conclusion. The nice part about decision trees is that once you understand how they work, you’re halfway to gradient boosting (e.g., XGBoost) and random forests.
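That if-else framing is easy to see by printing a fitted tree’s rules; here is a small scikit-learn sketch on the iris data set (my own example, not one from the post):

    # Fit a shallow decision tree and print its rules as nested if-else conditions
    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier, export_text

    iris = load_iris()
    tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

    print(export_text(tree, feature_names=list(iris.feature_names)))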


Apache eCharts for Python

Mark Litwintschik looks at another charting library:

The Apache eCharts project is a web-based charting library. It was started in 2013 and built using 77.5K lines of TypeScript. It is well documented and has over 200 examples of its API’s usage. The examples allow you to toggle between light and dark mode, and there is a cheat sheet and a theme builder with several tasteful presets to choose from.

This is a library I hadn’t heard of before, but Mark shows it off a bit.
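One way to drive eCharts from Python is the pyecharts package (an assumption on my part; the linked post may take a different route). A minimal sketch that writes an interactive chart to a standalone HTML file:

    # Build a simple bar chart with pyecharts and render it to HTML
    from pyecharts import options as opts
    from pyecharts.charts import Bar

    chart = (
        Bar()
        .add_xaxis(["Mon", "Tue", "Wed", "Thu", "Fri"])
        .add_yaxis("Sessions", [120, 200, 150, 80, 170])
        .set_global_opts(title_opts=opts.TitleOpts(title="Sessions per day"))
    )

    chart.render("sessions.html")  # open in a browser to view the interactive chart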
