Press "Enter" to skip to content

Category: Python

Parameterization and Mocking in Python Tests

Aida Gjoka and Russ Hyde show off some capabilities in the pytest library:

Writing tests is one of the best ways to keep your code reliable and reproducible. This post builds on our previous blog about Python testing with pytest Part 1, and explores some of the more advanced features it offers. From parametrised fixtures to mocking and other useful pytest plugins, we will show how to make your tests more reproducible, easier to manage and demonstrate how writing simple tests can save you time in the long run.

Click through to learn more. I’m a huge fan of parameterization in pytest—it’s really easy to do. Mocks are a bit harder to pull off in practice, though quite useful.

Leave a Comment

Checking Key Vault Access in Microsoft Fabric Spark Notebooks

Marc Lelijveld has clearance:

Working with sensitive data in Microsoft Fabric requires careful handling of secrets, especially when collaborating externally. In a recent customer engagement, I needed to validate access to Azure Key Vault from within a Fabric Notebook, without ever exposing the actual secret values. With only read access granted and no need to manage or update secrets, I focused on confirming that the connection was working as expected.

In this blog, I’ll walk you through the approach, including the setup, code snippets, and logic behind this quick but crucial verification step.

Click through for the full story.

Leave a Comment

Caching Database Calls in Python with Redis

Levi Masonde stands up a Redis instance:

Databases play a vital role in software applications—they need to keep updated data or state, which is served by the database acting as the source of truth for the application. How the database performs affects how the entire application performs. Besides obvious factors that affect the performance of a database (hardware, database type, networking infrastructure), there are techniques designed to help you improve the performance of your database and ultimately, your applications. One way to do this is to add caching to your database. But which cache technique works best for which application requirements? This article sheds light on one strategy to implement a cache system.

Click through for one pattern of interaction between cache and database. My preference with the cache-aside pattern is to hide the two data platforms from the calling application as much as possible. In a classic object-oriented language like C#, the actual database + cache calls would be in a separate project and would expose methods on classes that were database-agnostic. With Python, I’d use different .py files in the same project unless I wanted to build a wheel file and deploy it to multiple projects, but the concept would still be the same. I’m not the biggest fan of the way that Levi did it, forcing API developers to have knowledge of both the cache and the database, as that increases the risk of a future developer messing something up.

Leave a Comment

Writing to Microsoft FabricDelta Tables in Python via DuckDB

Gilbert Quevauvilliers does a bit of writing:

When I was exploring how to easily write to Delta Tables with a Python notebook, it took me a considerable amount of time to find out how to do this.

This is my learnings below, and from my point of view it makes it easy to write to a Lakehouse table, like what is done with a PySpark notebook.

Click through for one very important note, as well as the process.

Comments closed

Model Diagnostics for Statistics vs Machine Learning

Christian Lorentzen talks diagnostics:

In this post, we show how different use cases require different model diagnostics. In short, we compare (statistical) inference and prediction.

As an example, we use a simple linear model for the Munich rent index dataset, which was kindly provided by the authors of Regression – Models, Methods and Applications 2nd ed. (2021). This dataset contains monthy rents in EUR (rent) for about 3000 apartments in Munich, Germany, from 1999.

Read on to learn more about this dataset and how the mindset differs if you’re thinking about inference versus prediction.

Comments closed

Reading Delta Tables via SQL Code in a Microsoft Fabric Python Notebook

Gilbert Quevauvilliers writes a SQL statement:

I come from a TSQL background, so using SQL makes it easy for me to work with data.

There are multiple ways to use SQL in a PySpark notebook, and when I started using a Python notebook it was not so straightforward.

In this blog post I will show you how I use SQL Code.

As mentioned previously I am by no means an expert, I typically find a way that works, is fast and doesn’t consume a lot of capacity. If that works consistently for me then that is how I go about it.

Click through for the solution, which uses DuckDB. As such, the SQL syntax isn’t T-SQL—it’s more like psql. But it does do a great job of interacting with Parquet files and Delta tables.

Comments closed

Comprehensions in Python

I have a new video:

In this video, I show how to use comprehensions in Python to generate lists, dictionaries, and sets. I also run a quick performance test, comparing a list comprehension to an equivalent for loop.

It can take a little bit of time to get used to the syntax, but once you do, comprehensions are quite powerful.

Comments closed

When to Use a Python Notebook vs Spark Notebook in Microsoft Fabric

Gilbert Quevauvilliers lays out the plan:

This is the first blog post in a series of blog posts where I dive into how to use Python notebooks instead of Spark notebooks. For example, I will show you how to run a SQL query from a Lakehouse table and get it into a data frame. Read and write to a Lakehouse table and more.

NOTE: This is still in preview, but I personally think that this is worth investing time in learning.

The reason I am using the term Python is because the notebook can ONLY use Python and not any of the other languages available in a Spark

Also, in fairness, I’ve heard people working on Microsoft Fabric within the company reference these as ‘Python notebooks,’ so Gilbert is in good company.

Comments closed

The Monty Hall Problem

I have a new video:

In this video, I explain the classic Monty Hall problem, based on the concept of the show Let’s Make a Deal. I explain the paradox behind the problem and demonstrate that it’s better to switch doors.

I’m not joking at all when I say it took me years of listening to explanations before it actually clicked. Some of it is my innate stubbornness, but I think this is a great example of a true paradox, where the intuitive answer is wrong and first-level reasoning also leads you astray.

Comments closed