Python – Page 3 – Curated SQL

When to Use a Python Notebook vs Spark Notebook in Microsoft Fabric

Published 2025-04-23 by Kevin Feasel

Gilbert Quevauvilliers lays out the plan:

This is the first blog post in a series where I dive into how to use Python notebooks instead of Spark notebooks. For example, I will show you how to run a SQL query against a Lakehouse table and get the results into a data frame, how to read and write to a Lakehouse table, and more.

NOTE: This is still in preview, but I personally think that this is worth investing time in learning.

The reason I am using the term Python is because the notebook can ONLY use Python and not any of the other languages available in a Spark

Also, in fairness, I’ve heard people working on Microsoft Fabric within the company reference these as ‘Python notebooks,’ so Gilbert is in good company.

Comments closed

The Monty Hall Problem

Published 2025-04-16 by Kevin Feasel

I have a new video:

In this video, I explain the classic Monty Hall problem, based on the concept of the show Let’s Make a Deal. I explain the paradox behind the problem and demonstrate that it’s better to switch doors.

I’m not joking at all when I say it took me years of listening to explanations before it actually clicked. Some of it is my innate stubbornness, but I think this is a great example of a true paradox, where the intuitive answer is wrong and first-level reasoning also leads you astray.
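For anyone who, like me, needs to see it to believe it, a quick simulation makes the result hard to argue with. This is a minimal sketch and is not taken from the video; the function names `play_round` and `win_rate` are made up for illustration, and it assumes the standard rules in which the host always opens a goat door that the contestant did not pick.

```python
import random


def play_round(rng, switch):
    """Play one game and return True if the contestant wins the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the contestant's pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the one remaining door that is neither picked nor opened.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car


def win_rate(switch, trials=20_000, seed=0):
    """Estimate the probability of winning for a fixed strategy."""
    rng = random.Random(seed)
    wins = sum(play_round(rng, switch) for _ in range(trials))
    return wins / trials


if __name__ == "__main__":
    print(f"stay:   {win_rate(switch=False):.3f}")
    print(f"switch: {win_rate(switch=True):.3f}")
```

Staying should land near one in three and switching near two in three, because switching wins exactly when the first pick was wrong.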


Data Conversion via Generative AI

Published 2025-04-15 by Kevin Feasel

Grant Fritchey rearranges some data:

The DM-32 is a Digital Mobile Radio (DMR) as well as an analog radio. You can follow the link to understand all that DMR represents when talking radios. I want to focus on the fact that you have to program the behaviors into a DMR radio. While the end result is identical for every DMR radio, how you get there, the programming software, is radically different for every single radio (unless you get a radio that supports open source OpenGD77, yeah, playing radio involves open source as well). Which means, if I have more than one DMR radio (I’m currently at 7, and no, I don’t have a problem, shut up) I have more than one Customer Programming Software (CPS) that is completely different from other CPS formats. Now, I like to set up my radios similarly. After all, the local repeaters, my hotspot, and the Talkgroups I want to use are all common. Since every CPS is different, you can’t just export from one and import to the next. However, I had the idea of using AI for data conversion. Let’s see how that works.

Click through for the scenario as well as Grant’s results. Grant’s results were pretty successful for a data mapping operation, though the choice of model and the simplicity of the input and output examples matter a lot for generating usable Python code.


Loading Data from Pandas into Snowflake

Published 2025-04-04 by Kevin Feasel

Anil Kumar Moka loads some data:

Loading data into Snowflake is a common need. Using Python and pandas is a common go-to solution for data professionals. Whether you’re pulling data from a relational database, wrangling a CSV file, or prototyping a new pipeline, this combination leverages pandas’ intuitive data manipulation and Snowflake’s cloud-native scalability. But let’s be real—data loading isn’t always a simple task.

Files go missing, connections drop, and type mismatches pop up when you least expect them. That’s why robust error handling isn’t just nice-to-have; it’s essential for anything you’d trust in production. In this guide, we’ll walk through the fundamentals of getting data into Snowflake, explore practical examples with pandas and SQLAlchemy, and equip you with the tools to build a dependable, real-world-ready pipeline. Let’s dive in and make your data loading process as smooth as possible!

Read on for a quick primer around data loading and some of the sanity checking we should be doing along the way.
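To give a flavor of the kind of sanity checking involved, here is a rough sketch using only the standard library. It is not Anil's code: `validate_rows` and `load_with_retry` are hypothetical helpers, and `write_batch` is a stand-in for whatever actually writes to Snowflake in your pipeline. The idea is simply to catch type mismatches before loading and to retry when a connection drops.

```python
import time


def validate_rows(rows, schema):
    """Raise on the first missing column or value of an unexpected type."""
    for i, row in enumerate(rows):
        for column, expected in schema.items():
            if column not in row:
                raise KeyError(f"row {i} is missing column {column!r}")
            if not isinstance(row[column], expected):
                raise TypeError(
                    f"row {i}, column {column!r}: expected {expected.__name__}, "
                    f"got {type(row[column]).__name__}"
                )


def load_with_retry(write_batch, rows, attempts=3, delay_seconds=0.0):
    """Call write_batch(rows), retrying only on ConnectionError."""
    for attempt in range(1, attempts + 1):
        try:
            return write_batch(rows)
        except ConnectionError:
            if attempt == attempts:
                raise  # out of attempts: surface the original error
            time.sleep(delay_seconds * attempt)  # simple linear backoff
```

Validating before the write keeps bad rows out of the retry loop, so a type problem fails fast instead of being retried as though it were a network blip.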


The Power of Virtual Environments in Python

Published 2025-04-02 by Kevin Feasel

I have a new video:

In this video, I explain why virtual environments are such an important concept in Python and why you should generally be using them. I also talk about virtual environments versus Docker containers and how these are not mutually exclusive.

It took me a while to understand why virtual environments make sense, and I think part of the difficulty in adapting to this mental model was that I was used to the .NET mechanism for package management: per-project library installation. Sure, there was the Global Assembly Cache (GAC) in .NET Framework and that had similar problems to installing packages in base Python installations, but we didn’t use it that often. Or at least, I’ve sublimated however many hours of pain I spent fighting the GAC to the point that I don’t remember them anymore.
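If you want to poke at the mechanics, the standard library's `venv` module can build an environment from Python itself. This is a small sketch rather than anything from the video: the helper names are invented, and it skips installing pip (`with_pip=False`) to keep the example quick.

```python
import sys
import tempfile
import venv
from pathlib import Path


def in_virtual_env():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix


def create_env(target):
    """Create a virtual environment at target and return its config file path."""
    venv.create(target, with_pip=False)
    return Path(target) / "pyvenv.cfg"


if __name__ == "__main__":
    print("running inside a virtual environment:", in_virtual_env())
    with tempfile.TemporaryDirectory() as tmp:
        config = create_env(Path(tmp) / ".venv")
        print(config.read_text())
```

The `pyvenv.cfg` file is what marks the directory as an environment, and packages installed into it stay out of the base installation, which is roughly the per-project isolation I was used to from .NET.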


Fine-Tuning a DistilBERT Model for Question Answering

Published 2025-04-02 by Kevin Feasel

Muhammad Asad Iqbal Khan builds upon a simple model:

The transformers library provides a clean and well-documented interface for many popular transformer models. Not only does it make the source code easier to read and understand, it also provides a standardized way to interact with the model. You have seen in the previous post how to use a model such as DistilBERT for natural language processing tasks. In this post, you will learn how to fine-tune the model for your own purpose. This expands the use of the model from inference to training. Specifically, you will learn:

How to prepare the dataset for training

How to train a model using a helper library

DistilBERT is a major simplification of BERT, but it comes with the advantage that it’s very easy to train on modest hardware and performance is in the same realm of acceptability as the full BERT model. Switching from DistilBERT to BERT isn’t as easy as just swapping out model classes, though it’s pretty close.


Deploying and Using Custom Python Libraries in Microsoft Fabric

Published 2025-04-02 by Kevin Feasel

Miles Cole picks up from part one:

This is part 2 of my prior post that continues where I left off. I previously showed how you can use Resource folders in either the Notebook or Environment in Microsoft Fabric to do some pretty agile development of Python modules/libraries.

Now, how exactly can you package up your code to distribute and leverage it across multiple Workspaces or Environment items? How could we accomplish something like the below?

Read on for the answer.


Building a Simple Microservice with Azure Functions

Published 2025-03-31 by Kevin Feasel

Temidayo Omoniyi takes us through an example of creating a microservice:

Today’s architecture is serverless-intensive, with multiple microservices each performing a particular task. Industries are beginning to move away from traditional monolithic applications, in which a single large codebase handles everything, toward a simpler microservice approach.

Click through for a primer on serverless architecture, microservices, and how to create a simple Python app that acts as a microservice.


Converting CSV Files to Parquet Format

Published 2025-03-24 by Kevin Feasel

Michael Mayer does a bit of file conversion:

Conversion from CSV to Parquet in streaming mode? No problem for the two powerhouses Polars and DuckDB. We can even throw in some data preprocessing steps in between, like column selection, data filters, or sorts.

This is certainly not the only way to perform the task, though it’s fast and effective.
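As a rough idea of what the Polars side can look like, here is a sketch built on the lazy API. It is not Michael's code: the function name and its parameters are hypothetical, it covers only Polars and not DuckDB, and it assumes a Polars version that provides `LazyFrame.sink_parquet`.

```python
from pathlib import Path


def csv_to_parquet(src, dst, columns, filter_column, min_value):
    """Convert a CSV file to Parquet, keeping chosen columns and filtered rows."""
    # Imported here so the function can be defined where Polars is not installed.
    import polars as pl

    (
        pl.scan_csv(src)  # lazy scan: nothing is read yet
        .filter(pl.col(filter_column) >= min_value)  # row filter
        .select(columns)  # column selection
        .sink_parquet(dst)  # run the query and write the result
    )
    return Path(dst)
```

Because the query stays lazy until `sink_parquet`, Polars gets the chance to process the file in chunks instead of loading the whole CSV into memory first.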


Time-Saving Features in Scikit-Learn

Published 2025-03-20 by Kevin Feasel

Cornelius Yudha Wijaya describes a half-dozen functions:

For many people studying data science, Scikit-Learn is often the first machine learning library they encounter. It’s because Scikit-Learn offers various APIs that are useful for model development while still being easy for beginners to use.

As helpful as they may be, many features from Scikit-Learn are rarely explored and have untapped potential. This article will explore six lesser-known features that will save you time.

The calibration curve function, in particular, drew my attention, especially as I had written that by hand in the past.
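For a sense of what "by hand" means here, this is roughly the calculation: bucket the predicted probabilities, then compare the observed fraction of positives in each bucket with the mean prediction. It is a simplified sketch with equal-width bins, not scikit-learn's implementation, and the handling of values that fall exactly on a bin edge may differ from what `sklearn.calibration.calibration_curve` does.

```python
def calibration_curve_by_hand(y_true, y_prob, n_bins=5):
    """Return (fraction_of_positives, mean_predicted_probability) per non-empty bin."""
    if len(y_true) != len(y_prob):
        raise ValueError("y_true and y_prob must have the same length")

    counts = [0] * n_bins
    positives = [0] * n_bins
    prob_sums = [0.0] * n_bins

    for label, prob in zip(y_true, y_prob):
        # Equal-width bins over [0, 1]; a probability of exactly 1.0
        # is placed in the last bin.
        index = min(int(prob * n_bins), n_bins - 1)
        counts[index] += 1
        positives[index] += label
        prob_sums[index] += prob

    # Empty bins are skipped so the two lists stay aligned.
    kept = [i for i in range(n_bins) if counts[i]]
    fraction_of_positives = [positives[i] / counts[i] for i in kept]
    mean_predicted = [prob_sums[i] / counts[i] for i in kept]
    return fraction_of_positives, mean_predicted
```

A well-calibrated classifier produces points close to the diagonal, where the fraction of positives in a bin matches the mean predicted probability.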



Category: Python