Press "Enter" to skip to content

Category: Python

AutoML in Python with TPOT

Abid Ali Awan gives us a primer on TPOT:

AutoML is a tool designed for both technical and non-technical experts. It simplifies the process of training machine learning models. All you have to do is provide it with the dataset, and in return, it will provide you with the best-performing model for your use case. You don’t have to code for long hours or experiment with various techniques; it will do everything on its own for you.

In this tutorial, we will learn about AutoML and TPOT, a Python AutoML tool for building machine learning pipelines. We will also learn to build a machine learning classifier, save the model, and use it for model inference.

Click through to see an example of how to use the library.
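As a rough idea of the workflow (not the article's exact example), a minimal TPOT classification run looks something like this; the dataset, generation count, and population size below are placeholders chosen to keep a demo quick:

```python
# A minimal sketch of a TPOT run, assuming the classic TPOT API:
# search for a pipeline, score it, and persist the best fitted pipeline.
import joblib
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from tpot import TPOTClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Small generations/population just to keep the search short for a demo.
tpot = TPOTClassifier(generations=5, population_size=20, verbosity=2, random_state=42)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))

# Save the winning scikit-learn pipeline and reload it for inference.
joblib.dump(tpot.fitted_pipeline_, "best_pipeline.joblib")
model = joblib.load("best_pipeline.joblib")
print(model.predict(X_test[:5]))
```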


FabricRestClient and Long-Running Operations

Sandeep Pawar has a public service announcement:

I want to thank Michael Kovalsky for pointing out that FabricRestClient in Semantic Link supports (since v 0.7.5) Long Running Operation (LRO).

LRO support allows the client to wait for the request to process without being blocked. Without LRO support, you will get a 202 response code saying the request is being processed. You need to submit another request based on the URL returned to get the result. With LRO support, FabricRestClient will wait 20s and give you the result back.

Click through to see what you’d need to do to enable it, as well as the benefit you can receive.
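To make the benefit concrete, here is a sketch of the manual polling dance that LRO support takes off your plate. This is not from Sandeep's post: the endpoint, payload, and header handling are illustrative of the Fabric 202/long-running-operation pattern he describes, and the exact details may differ.

```python
# Manual handling of a 202 response, i.e. what LRO support saves you from.
import time
import sempy.fabric as fabric

client = fabric.FabricRestClient()

workspace_id = "<workspace-id>"  # placeholder
payload = {"displayName": "Demo item", "type": "Lakehouse"}  # placeholder payload

response = client.post(f"v1/workspaces/{workspace_id}/items", json=payload)

while response.status_code == 202:
    # 202 means "accepted, still processing"; poll the operation until it finishes.
    # Header and path names here follow the documented Fabric LRO pattern.
    operation_id = response.headers["x-ms-operation-id"]
    time.sleep(int(response.headers.get("Retry-After", 20)))
    response = client.get(f"v1/operations/{operation_id}")

print(response.status_code)
```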


Defining the Default Lakehouse for a Fabric Notebook

Sandeep Pawar sets up a default lakehouse:

I wrote a blog post a while ago on mounting a lakehouse (or generally speaking a storage location) to all nodes in a Fabric spark notebook. This allows you to use the File API file path from the mounted lakehouse.

Mounting a lakehouse using mssparkutils.fs.mount() doesn’t define the default lakehouse of a notebook. To do so, you can use the configure magic as below:

Read on for that command, as well as some notes around using it.
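For reference, the cell magic takes roughly this shape; the JSON keys follow the Fabric notebook documentation, the names and IDs are placeholders, and Sandeep's post has the specifics and caveats:

```
%%configure
{
    "defaultLakehouse": {
        "name": "MyLakehouse",
        "id": "<lakehouse-id>",
        "workspaceId": "<workspace-id>"
    }
}
```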


Forms and Filters in Streamlit

I have a new video:

In this video, I extend the Streamlit app that we’ve been working on even more. We’ll convert a set of drop-down lists into a form, change the behavior of these drop-down lists, and add date picker logic.

Click through for the video, the code to date, and links to additional resources. I’m pretty happy so far with this series, and we’re about to kick it up to another level with the next video.
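If you want a feel for the pattern before watching, here is a small sketch of drop-down lists and date pickers wrapped in a Streamlit form, so the app only reruns when the user submits. The widget names and options are placeholders, not the ones from the video.

```python
# Drop-downs plus date pickers inside a form: nothing reruns until "Apply filters".
import datetime
import streamlit as st

with st.form("filters"):
    category = st.selectbox("Category", ["All", "Hardware", "Software"])
    region = st.selectbox("Region", ["All", "North", "South"])
    start_date = st.date_input("Start date", value=datetime.date.today() - datetime.timedelta(days=30))
    end_date = st.date_input("End date", value=datetime.date.today())
    submitted = st.form_submit_button("Apply filters")

if submitted:
    st.write(f"Showing {category} / {region} from {start_date} to {end_date}")
```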


Calculating the Size of Dataflow Gen2 Staging Lakehouses

Sandeep Pawar busts out the calculator:

My friend Alex Powers (PM, Fabric CAT) wrote a blog post about cleaning the staging lakehouses generated by Dataflow Gen2. Before reading this blog, go ahead and read his blog first on the mechanics of it and the whys. Note that these are system-generated lakehouses, so at some time in the future they will be automatically purged, but until then the users will be paying the storage cost of these lakehouses. If you want to read more about how Dataflow Gen2 works and whether you should stage or not, read this and this blog.

Read on for a Python script using the SemPy library.
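This is not Sandeep's script, but as a rough sketch of the approach: use SemPy to enumerate workspaces and find the system-generated staging lakehouses, then sum file sizes under each one's OneLake path. The "Staging" name filter, the DataFrame column names, and the use of mssparkutils are all assumptions on my part and may vary by SemPy version.

```python
# Rough sketch only: find staging lakehouses and total up their OneLake storage.
import sempy.fabric as fabric
from notebookutils import mssparkutils

def folder_size(path: str) -> int:
    """Recursively sum file sizes (bytes) under an ABFS/OneLake path."""
    total = 0
    for f in mssparkutils.fs.ls(path):
        total += folder_size(f.path) if f.isDir else f.size
    return total

for _, ws in fabric.list_workspaces().iterrows():
    items = fabric.list_items(workspace=ws["Id"])
    # Staging lakehouses are system-generated; filter on name and item type.
    staging = items[(items["Type"] == "Lakehouse") &
                    (items["Display Name"].str.contains("Staging", case=False))]
    for _, lh in staging.iterrows():
        path = f"abfss://{ws['Id']}@onelake.dfs.fabric.microsoft.com/{lh['Id']}"
        print(ws["Name"], lh["Display Name"], round(folder_size(path) / 1024**3, 2), "GB")
```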


Polymorphism in Python

Rajendra Gupta talks object-orientation:

Polymorphism is a popular term in object-oriented programming (OOP) languages. An object can take multiple forms in different ways in polymorphism. For example, a woman takes different roles in her daily life, such as wife, professional, athlete, mother, and daughter, as the diagram below depicts:

Polymorphism isn’t a particularly difficult topic to understand, though because different languages implement the idea in subtly different ways, it’s good to know what you’re able to do in your language of choice.
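In Python, for example, duck typing means the caller doesn't need to care which class it gets, as long as the method it calls is there. A quick illustration (my example, not Rajendra's):

```python
# The same method name behaves differently depending on the object's type.
class Athlete:
    def role(self) -> str:
        return "trains and competes"

class Professional:
    def role(self) -> str:
        return "attends meetings and ships work"

for person in (Athlete(), Professional()):
    # Same call, different behavior per class.
    print(type(person).__name__, "->", person.role())
```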


Transferring Linear Model Coefficients

Nina Zumel performs a swap:

A quick glance through the scikit-learn documentation on linear models, or the CRAN task view on Mixed, Multilevel, and Hierarchical Models in R, reveals a number of different procedures for fitting models with linear structure. Each of these procedures meets different needs and constraints, and some of them can be computationally intensive. But in the end, they all have the same underlying structure: outcome is modelled as a linear combination of input features.

But the existence of so many different algorithms, and their associated software, can obscure the fact that just because two models were fit differently, they don’t have to be run differently. The fitting implementation and the deployment implementation can be distinct. In this note, we’ll talk about transferring the coefficients of a linear model to a fresh model, without a full retraining.

I had a similar problem about 18 months ago, though much easier than the one Nina describes, as I did have access to the original data and simply needed to build a linear regression in Python that matched exactly the one they developed in R. Turns out that’s not as easy to do as you might think: the different languages have different default assumptions that make the results similar but not the same, and piecing all of this together took a bit of sleuthing.
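For a flavor of the technique (a minimal sketch, not Nina's code): take coefficients fit by one procedure and drop them into a fresh scikit-learn model without retraining. Here plain least squares via NumPy stands in for "some other tool or language."

```python
# Transfer externally-fit coefficients into an unfitted sklearn LinearRegression.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 3.0 + rng.normal(scale=0.1, size=100)

# "Original" fit: ordinary least squares with an explicit intercept column.
coefs, *_ = np.linalg.lstsq(np.column_stack([np.ones(len(X)), X]), y, rcond=None)
intercept, betas = coefs[0], coefs[1:]

# Set the attributes predict() relies on; no call to fit() needed.
# (Depending on your scikit-learn version you may also want to set n_features_in_.)
model = LinearRegression()
model.coef_ = betas
model.intercept_ = intercept
print(model.predict(X[:5]))
```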


An Introduction to Streamlit

I have started a new video series:

In this video, I talk about Streamlit, a great Python library for building data applications quickly. We discuss what data applications are, get an idea of how Streamlit compares to other code-first data visualization techniques, and start building a demo application. I also toss in a lengthy sidebar on Python virtual environments because of how important they are.

Streamlit certainly has its foibles—many of which I’ll cover in the series—but I like it a lot as a simple way of building data applications.
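If you have never seen Streamlit before, the smallest possible app is only a few lines; this is not the demo app from the video, just the general shape of things:

```python
# Save as app.py and run: streamlit run app.py
# (ideally from inside a virtual environment with streamlit installed).
import pandas as pd
import streamlit as st

st.title("Hello, Streamlit")
df = pd.DataFrame({"x": range(10), "y": [v * v for v in range(10)]})
st.line_chart(df, x="x", y="y")
```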


Automate the Power BI Incremental Refresh Policy via Semantic Link Labs

Gilbert Quevauvilliers needs to get rid of some data fast:

The scenario here is that quite often there is a requirement to only keep data from a specific start date, or to keep data for the last N years (counting from the first day of January).

Currently, this is not possible in Power BI using the default incremental refresh settings. Typically, you must keep more data than is required.

It is best illustrated by using a working example.

Check out that scenario and how you can use the Semantic Link Labs Python library to resolve it.
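This is not Gilbert's exact approach, but to give a sense of the shape of it: Semantic Link Labs exposes the Tabular Object Model, so you can open a semantic model in write mode and adjust a table's refresh policy programmatically. The dataset, workspace, and table names below are placeholders, and the specific properties to change are covered in the post.

```python
# Rough sketch, assuming the sempy_labs TOM wrapper and standard TOM
# RefreshPolicy properties; see Gilbert's post for the full solution.
from sempy_labs.tom import connect_semantic_model

with connect_semantic_model(dataset="Sales Model", workspace="My Workspace",
                            readonly=False) as tom:
    table = next(t for t in tom.model.Tables if t.Name == "Sales")
    policy = table.RefreshPolicy

    # Tighten the rolling window so only the last N whole years are kept,
    # e.g. three calendar years of archived data.
    policy.RollingWindowPeriods = 3
    # Changes are written back to the service when the context manager exits.
```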
