Press "Enter" to skip to content

Category: Python

Caching Database Calls in Python with Redis

Levi Masonde stands up a Redis instance:

Databases play a vital role in software applications—they need to keep updated data or state, which is served by the database acting as the source of truth for the application. How the database performs affects how the entire application performs. Besides obvious factors that affect the performance of a database (hardware, database type, networking infrastructure), there are techniques designed to help you improve the performance of your database and, ultimately, your applications. One way to do this is to add caching to your database. But which cache technique works best for which application requirements? This article sheds light on one strategy to implement a cache system.

Click through for one pattern of interaction between cache and database. My preference with the cache-aside pattern is to hide the two data platforms from the calling application as much as possible. In a classic object-oriented language like C#, the actual database + cache calls would be in a separate project and would expose methods on classes that were database-agnostic. With Python, I’d use different .py files in the same project unless I wanted to build a wheel file and deploy it to multiple projects, but the concept would still be the same. I’m not the biggest fan of the way that Levi did it, forcing API developers to have knowledge of both the cache and the database, as that increases the risk of a future developer messing something up.
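To make that concrete, here’s a minimal cache-aside sketch (mine, not Levi’s): callers only ever see get_product(), while the Redis-versus-database decision stays hidden behind it. The get_product_from_db() helper is a hypothetical stand-in for the real database query.

import json

import redis

cache = redis.Redis(host="localhost", port=6379, db=0)

def get_product_from_db(product_id: int) -> dict:
    # Hypothetical stand-in for the real database call.
    return {"id": product_id, "name": "example product"}

def get_product(product_id: int) -> dict:
    """Cache-aside: check Redis first, fall back to the database on a miss."""
    key = f"product:{product_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)  # cache hit
    product = get_product_from_db(product_id)  # cache miss: query the database
    cache.set(key, json.dumps(product), ex=300)  # populate the cache, expire after 5 minutes
    return product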


Writing to Microsoft Fabric Delta Tables in Python via DuckDB

Gilbert Quevauvilliers does a bit of writing:

When I was exploring how to easily write to Delta Tables with a Python notebook, it took me a considerable amount of time to find out how to do this.

These are my learnings below, and from my point of view this makes it easy to write to a Lakehouse table, much like what is done with a PySpark notebook.

Click through for one very important note, as well as the process.
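For a sense of the shape of the technique, here’s a rough sketch (not Gilbert’s exact code), assuming the duckdb and deltalake packages are available; the table path is a hypothetical placeholder for a real Lakehouse location.

import duckdb
from deltalake import write_deltalake

# Any DuckDB query can produce an Arrow table...
result = duckdb.sql("SELECT 42 AS answer, 'hello' AS greeting").arrow()

# ...which the deltalake package can write out as a Delta table.
write_deltalake("/lakehouse/default/Tables/my_table", result, mode="overwrite")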


Model Diagnostics for Statistics vs Machine Learning

Christian Lorentzen talks diagnostics:

In this post, we show how different use cases require different model diagnostics. In short, we compare (statistical) inference and prediction.

As an example, we use a simple linear model for the Munich rent index dataset, which was kindly provided by the authors of Regression – Models, Methods and Applications 2nd ed. (2021). This dataset contains monthly rents in EUR (rent) for about 3000 apartments in Munich, Germany, from 1999.

Read on to learn more about this dataset and how the mindset differs if you’re thinking about inference versus prediction.
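As a rough illustration of the two mindsets, using synthetic data as a stand-in for the rent index (the coefficients below are invented): the statsmodels route emphasizes parameter estimates, standard errors, and p-values, while the scikit-learn route emphasizes out-of-sample prediction error.

import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
area = rng.uniform(20, 160, 3000)                  # apartment size in square meters
rent = 100 + 6.5 * area + rng.normal(0, 80, 3000)  # monthly rent in EUR

# Inference mindset: coefficient estimates, standard errors, p-values.
ols = sm.OLS(rent, sm.add_constant(area)).fit()
print(ols.summary())

# Prediction mindset: held-out error is what counts.
X_train, X_test, y_train, y_test = train_test_split(
    area.reshape(-1, 1), rent, random_state=0
)
model = LinearRegression().fit(X_train, y_train)
print(mean_absolute_error(y_test, model.predict(X_test)))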


Reading Delta Tables via SQL Code in a Microsoft Fabric Python Notebook

Gilbert Quevauvilliers writes a SQL statement:

I come from a T-SQL background, so using SQL makes it easy for me to work with data.

There are multiple ways to use SQL in a PySpark notebook, and when I started using a Python notebook it was not so straightforward.

In this blog post, I will show you how I use SQL code.

As mentioned previously, I am by no means an expert; I typically find a way that works, is fast, and doesn’t consume a lot of capacity. If that works consistently for me, then that is how I go about it.

Click through for the solution, which uses DuckDB. As such, the SQL syntax isn’t T-SQL; it’s much closer to PostgreSQL’s dialect. But it does do a great job of interacting with Parquet files and Delta tables.
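A rough sketch of the idea, assuming DuckDB’s delta extension is available and using a hypothetical Lakehouse table path:

import duckdb

duckdb.sql("INSTALL delta")
duckdb.sql("LOAD delta")

df = duckdb.sql("""
    SELECT *
    FROM delta_scan('/lakehouse/default/Tables/my_table')
    LIMIT 10
""").df()  # .df() returns the result as a pandas DataFrame
print(df)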


Comprehensions in Python

I have a new video:

In this video, I show how to use comprehensions in Python to generate lists, dictionaries, and sets. I also run a quick performance test, comparing a list comprehension to an equivalent for loop.

It can take a little bit of time to get used to the syntax, but once you do, comprehensions are quite powerful.
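For reference, the three comprehension forms look like this, along with a quick timing comparison along the lines of the one in the video:

import timeit

squares_list = [n * n for n in range(10)]                # list comprehension
squares_dict = {n: n * n for n in range(10)}             # dict comprehension
even_squares = {n * n for n in range(10) if n % 2 == 0}  # set comprehension

def with_loop():
    out = []
    for n in range(1000):
        out.append(n * n)
    return out

def with_comprehension():
    return [n * n for n in range(1000)]

print(timeit.timeit(with_loop, number=10_000))
print(timeit.timeit(with_comprehension, number=10_000))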


When to Use a Python Notebook vs Spark Notebook in Microsoft Fabric

Gilbert Quevauvilliers lays out the plan:

This is the first blog post in a series of blog posts where I dive into how to use Python notebooks instead of Spark notebooks. For example, I will show you how to run a SQL query against a Lakehouse table and get the results into a data frame, read from and write to a Lakehouse table, and more.

NOTE: This is still in preview, but I personally think that this is worth investing time in learning.

The reason I am using the term Python is because the notebook can ONLY use Python and not any of the other languages available in a Spark notebook.

Also, in fairness, I’ve heard people working on Microsoft Fabric within the company reference these as ‘Python notebooks,’ so Gilbert is in good company.


The Monty Hall Problem

I have a new video:

In this video, I explain the classic Monty Hall problem, based on the concept of the show Let’s Make a Deal. I explain the paradox behind the problem and demonstrate that it’s better to switch doors.

I’m not joking at all when I say it took me years of listening to explanations before it actually clicked. Some of it is my innate stubbornness, but I think this is a great example of a true paradox, where the intuitive answer is wrong and first-level reasoning also leads you astray.
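If the reasoning still doesn’t click, a short simulation settles the question empirically: staying wins about one-third of the time and switching about two-thirds.

import random

def play(switch: bool) -> bool:
    doors = [0, 1, 2]
    prize = random.choice(doors)
    pick = random.choice(doors)
    # Monty opens a door that is neither your pick nor the prize.
    opened = random.choice([d for d in doors if d != pick and d != prize])
    if switch:
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == prize

trials = 100_000
print("stay:  ", sum(play(False) for _ in range(trials)) / trials)  # ~0.333
print("switch:", sum(play(True) for _ in range(trials)) / trials)   # ~0.667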


Data Conversion via Generative AI

Grant Fritchey rearranges some data:

The DM-32 is a Digital Mobile Radio (DMR) as well as an analog radio. You can follow the link to understand all that DMR represents when talking radios. I want to focus on the fact that you have to program the behaviors into a DMR radio. While the end result is identical for every DMR radio, how you get there, the programming software, is radically different for every single radio (unless you get a radio that supports open source OpenGD77, yeah, playing radio involves open source as well). Which means, if I have more than one DMR radio (I’m currently at 7, and no, I don’t have a problem, shut up) I have more than one Customer Programming Software (CPS) that is completely different from other CPS formats. Now, I like to set up my radios similarly. After all, the local repeaters, my hotspot, and the Talkgroups I want to use are all common. Since every CPS is different, you can’t just export from one and import to the next. However, I had the idea of using AI for data conversion. Let’s see how that works.

Click through for the scenario as well as Grant’s results, which were pretty successful for a data mapping operation, though the choice of model and the simplicity of the input and output examples matter when it comes to generating the Python code.
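The kind of code you’d hope the model generates is a straightforward column mapping between export formats. Here is a hypothetical example; the CPS column names are invented for illustration, not taken from Grant’s radios.

import csv

# Hypothetical mapping from one CPS export schema to another.
COLUMN_MAP = {
    "Channel Name": "Name",
    "RX Freq": "Receive Frequency",
    "TX Freq": "Transmit Frequency",
}

with open("radio_a_export.csv", newline="") as src, \
     open("radio_b_import.csv", "w", newline="") as dst:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=list(COLUMN_MAP.values()))
    writer.writeheader()
    for row in reader:
        writer.writerow({new: row[old] for old, new in COLUMN_MAP.items()})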


Loading Data from Pandas into Snowflake

Anil Kumar Moka loads some data:

Loading data into Snowflake is a common need. Using Python and pandas is a common go-to solution for data professionals. Whether you’re pulling data from a relational database, wrangling a CSV file, or prototyping a new pipeline, this combination leverages pandas’ intuitive data manipulation and Snowflake’s cloud-native scalability. But let’s be real—data loading isn’t always a simple task.

Files go missing, connections drop, and type mismatches pop up when you least expect them. That’s why robust error handling isn’t just nice-to-have; it’s essential for anything you’d trust in production. In this guide, we’ll walk through the fundamentals of getting data into Snowflake, explore practical examples with pandas and SQLAlchemy, and equip you with the tools to build a dependable, real-world-ready pipeline. Let’s dive in and make your data loading process as smooth as possible!

Read on for a quick primer on data loading and some of the sanity checking we should be doing along the way.
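As a minimal sketch of the pandas plus SQLAlchemy route with basic error handling (the connection parameters below are placeholders, not taken from the article):

import pandas as pd
from snowflake.sqlalchemy import URL
from sqlalchemy import create_engine
from sqlalchemy.exc import SQLAlchemyError

engine = create_engine(URL(
    account="my_account",      # placeholder connection details
    user="my_user",
    password="my_password",
    database="my_db",
    schema="public",
    warehouse="my_wh",
))

df = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})

try:
    # Append the DataFrame to a Snowflake table, creating it if needed.
    df.to_sql("my_table", engine, if_exists="append", index=False)
except SQLAlchemyError as err:
    print(f"Load failed: {err}")  # in production, log and retry or alert
finally:
    engine.dispose()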


The Power of Virtual Environments in Python

I have a new video:

In this video, I explain why virtual environments are such an important concept in Python and why you should generally be using them. I also talk about virtual environments versus Docker containers and how these are not mutually exclusive.

It took me a while to understand why virtual environments make sense, and I think part of the difficulty in adapting to this mental model was that I was used to the .NET mechanism for package management: per-project library installation. Sure, there was the Global Assembly Cache (GAC) in .NET Framework, and that had similar problems to installing packages in base Python installations, but we didn’t use it that often. Or at least, I’ve sublimated however many hours of pain I spent fighting the GAC to the point that I don’t remember them anymore.
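For completeness, the usual route is python -m venv .venv from a shell, but the standard library can also do it programmatically:

import venv

# Equivalent to `python -m venv .venv`: creates an isolated environment
# with its own interpreter and pip in the .venv directory.
venv.create(".venv", with_pip=True)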
