Press "Enter" to skip to content

Category: Python

Chat with Your Own Data in Streamlit and Azure Open AI

I have a new video:

In this video, I show how we can make a GPT-4 deployment aware of our own custom data, without needing to fine-tune the model. I talk about meta prompts and the Retrieval Augmented Generation (RAG) pattern, and then show how you can set this up using Azure AI Search and Azure OpenAI. Then, I bring it back to Streamlit and give users the option between chatting with a generic GPT-4 deployment and chatting over custom data.

I try to make my videos 10 minutes in length. They usually end up at 15-18 minutes. This one clocks in at more than 30 minutes and there’s very little fluff.

Comments closed

Defining a OneLake Filesystem using fsspec

Sandeep Pawar looks at fsspec:

I mentioned on X the other day that, like other filesystem backends such as S3 and GCS, you can use fsspec to define the OneLake filesystem too. In this blog, I will explain how to define it and why it’s important to know about it.

Click through for the details on what fsspec is, why it’s important, and what benefits you can get in Microsoft Fabric as a result of its support of fsspec.

Comments closed

Chat with Azure OpenAI in Streamlit

I have a new video:

In this video, I show how we can integrate an Azure OpenAI GPT-4 model into our Streamlit dashboard. Along the way, I also show off how easy it is to create multiple pages and talk a bit about session state and secrets management as well.

The fun part about this is, there’s not even that much code involved. Streamlit handles most of the conversational aspects and you’re primarily responsible for saving history.

Comments closed

AutoML in Python with TPOT

Abid Ali Awan gives us a primer on TPOT:

AutoML is a tool designed for both technical and non-technical experts. It simplifies the process of training machine learning models. All you have to do is provide it with the dataset, and in return, it will provide you with the best-performing model for your use case. You don’t have to code for long hours or experiment with various techniques; it will do everything on its own for you.

In this tutorial, we will learn about AutoML and TPOT, a Python AutoML tool for building machine learning pipelines. We will also learn to build a machine learning classifier, save the model, and use it for model inference.

Click through to see an example of how to use the library.

Comments closed

FabricRestClient and Long-Running Operations

Sandeep Pawar has a public service announcement:

I want to thank Michael Kovalsky for pointing out that FabricRestClient in Semantic Link supports (since v 0.7.5) Long Running Operation (LRO).

LRO support allows the client to wait for the request to process without being blocked. Without LRO support, you will get a 202 response code saying the request is being processed. You need to submit another request based on the url returned to get the result. With LRO support, FabricRestClient will wait 20s and give you the result back.

Click through to see what you’d need to do to enable it, as well as the benefit you can receive.

Comments closed

Defining the Default Lakehouse for a Fabric Notebook

Sandeep Pawar sets up a default lakehouse:

I wrote a blog post a while ago on mounting a lakehouse (or generally speaking a storage location) to all nodes in a Fabric spark notebook. This allows you to use the File API file path from the mounted lakehouse.

Mounting a lakehouse using mssparkutils.fs.mount() doesn’t define the default lakehouse of a notebook. To do so, you can use the configure magic as below:

Read on for that command, as well as some notes around using it.

Comments closed

Forms and Filters in Streamlit

I have a new video:

In this video, I extend the Streamlit app that we’ve been working on even more. We’ll convert a set of drop-down lists into a form, change the behavior of these drop-down lists, and add date picker logic.

Click through for the video, the code to date, and links to additional resources. I’m pretty happy so far with this series, and we’re about to kick it up to another level with the next video.

Comments closed

Calculating the Size of Dataflow Gen2 Staging Lakehouses

Sandeep Pawar busts out the calculator:

My friend Alex Powers (PM, Fabric CAT) wrote a blog post about cleaning the staging lakehouses generated by Dataflow Gen2. Before reading this blog, go ahead and read his blog first on the mechanics of it and the whys. Note that these are system generated lakehouses so at some time in the future, they will be automatically purged but until then the users will be paying the storage cost of these lakehouses. If you want to read more about how dataflow gen2 works and whether you should stage or not , read this and this blog.

Read on for a Python script using the SemPy library.

Comments closed

Polymorphism in Python

Rajendrra Gupta talks object-orientation:

Polymorphism is a popular term in object-oriented programming (OOP) languages. An object can take multiple forms in different ways in polymorphism. For example, a woman takes different roles in her daily life, such as wife, professional, athlete, mother, and daughter, as the diagram below depicts:

Polymorphism isn’t a particularly difficult topic to understand, though because of the way that different languages implement the idea in subtly different ways, it’s good to know what you’re able to do in your language of choice.

Comments closed