
Category: Python

Generators in Python

Jack Wallen generates some values:

Within the realm of programming, a generator is a routine that is used to control iteration within a loop. Generators are useful when you want to produce a large sequence of values without storing them in memory at once. The ability to create a large sequence without using up memory is important, especially when dealing with Python programs that generate a large amount of information, such as a long sequence of numbers.

Read on to see how generators work and how to build a custom generator.
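As a minimal sketch of the idea (my own illustration, not code from Jack's article), here is a custom generator next to a generator expression. The function name `squares` and the limits are assumptions chosen for the example.

```python
import itertools


def squares(limit):
    """Yield the squares of 0..limit-1 one at a time instead of building a list."""
    for n in range(limit):
        yield n * n


# Creating the generator does no work yet; values are produced on demand.
gen = squares(1_000_000)

# Pull only the first five values; the remaining ones are never computed here.
first_five = list(itertools.islice(gen, 5))
print(first_five)  # [0, 1, 4, 9, 16]

# A generator expression feeds sum() one value at a time, with no intermediate list.
total = sum(n * n for n in range(10))
print(total)  # 285
```

Because the generator remembers where it stopped, calling `next(gen)` after the slice above would pick up with the sixth value.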

Comments closed

Classical Methods for Outlier Detection

Vinod Chugani is speaking my language:

Outliers are unique in that they often don’t play by the rules. These data points, which significantly differ from the rest, can skew your analyses and make your predictive models less accurate. Although detecting outliers is critical, there is no universally agreed-upon method for doing so. While some advanced techniques like machine learning offer solutions, in this post, we will focus on the foundational Data Science methods that have been in use for decades.

Vinod looks at a few techniques, including inter-quartile range and comparing results to an expected distribution. If you’re really excited about this topic, I know a guy who’s written a bit about it.
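For a rough feel of the inter-quartile range approach, here is a small sketch using only the standard library. It is not Vinod's code: the function name `iqr_outliers`, the sample data, and the 1.5 multiplier (the common Tukey fence convention) are my assumptions.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


data = [10, 12, 11, 13, 12, 11, 10, 95]  # made-up sample with one obvious outlier
print(iqr_outliers(data))  # [95]
```

Different quartile conventions (for example `method="inclusive"`) move the fences slightly, so borderline points can flip between outlier and non-outlier.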


Looping in Python

Jack Wallen has us spinning in circles:

What is a loop? Other than a fruity breakfast cereal, a loop is a portion of code that repeats a set number of times or until a desired process is complete.

Here’s an example of a loop that many can relate to. You go to the ATM, insert your card, and start with your transaction. First, you check your balance, then you remove money from your account. The loop keeps running with each transaction until you tell the ATM you are finished.

Read on for examples of both for and while loops, though for is definitely much more common and popular than while. I’d also recommend learning a bit about list comprehensions, which are a way of turning a for loop into a one-line operation. [f(x) for x in my_list] is a simple example: for each element in my_list, call function f() with that element as the input. Then, take all of the results of f(x) and stick them together in a new list.
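To make the comparison concrete, here is a short sketch of the same operation written three ways. The body of `f` and the contents of `my_list` are placeholders I picked for illustration.

```python
def f(x):
    """Placeholder function: double the input."""
    return x * 2


my_list = [1, 2, 3, 4]

# for loop: visit each element and collect the results.
results_for = []
for x in my_list:
    results_for.append(f(x))

# while loop: same work, but the index is managed by hand.
results_while = []
i = 0
while i < len(my_list):
    results_while.append(f(my_list[i]))
    i += 1

# List comprehension: the for loop collapsed into one expression.
results_comp = [f(x) for x in my_list]

print(results_comp)  # [2, 4, 6, 8]
```

All three produce the same list; the comprehension just removes the bookkeeping.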


Renaming a Column in Microsoft Fabric via Python Notebook

Gilbert Quevauvilliers performs a rename:

I thought it would be good to help others in terms of my learning journey when working with Python notebooks and Microsoft Fabric.

In today’s blog post, I am going to show you how to rename a column. In my experience this came up because I had a column name containing a forward slash “/”, which caused the data load for the table to fail because this is a reserved character.

Read on for the code and an example of how it works in action.
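I won't reproduce Gilbert's code here, but the general shape of the fix can be sketched in plain Python. The helper `sanitize_column_name` and the column names are hypothetical, and the commented-out lines assume `df` is an existing PySpark DataFrame, where `withColumnRenamed` is the standard way to rename a column.

```python
def sanitize_column_name(name, bad_chars="/", replacement="_"):
    """Replace each character listed in bad_chars with the replacement string."""
    for ch in bad_chars:
        name = name.replace(ch, replacement)
    return name


# Hypothetical column names; only one of them contains a forward slash.
columns = ["OrderID", "Price/Unit", "Region"]

# Map old name -> new name, keeping only the columns that actually change.
renames = {
    col: sanitize_column_name(col)
    for col in columns
    if sanitize_column_name(col) != col
}
print(renames)  # {'Price/Unit': 'Price_Unit'}

# In a Spark notebook (assumption: df is a PySpark DataFrame):
# for old_name, new_name in renames.items():
#     df = df.withColumnRenamed(old_name, new_name)
```

Building the rename map first makes it easy to review what will change before touching the table.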


Viewing DAX in Microsoft Fabric with SemPy

Kevin Chant talks about a recent issue:

Recently I have been helping others get up to speed with Microsoft Fabric, which includes going through some Power BI topics.

One issue that came up was how to show them the DAX used for a measure within a Power BI report that had been published to Microsoft Fabric, so as to link working with measures in Power BI Desktop with working in Microsoft Fabric.

Kevin shows the normal way of doing this, as well as an alternative using the SemPy library.


Exploring the Area under the ROC Curve

Aayush Srivastava takes us through one of the classics of classification:

In the realm of machine learning classification, model evaluation is an essential step to assess the performance and effectiveness of various algorithms. One widely-used tool for this purpose is the Area Under the Receiver Operating Characteristic Curve (AUC-ROC curve). In this blog, we will delve into the significance of the AUC-ROC curve, how it is calculated, and why it is an invaluable metric for evaluating classification models.

In this article, we will discuss the performance metrics used in classification and also explore the implications of using two, namely AUC and ROC. Here is an overview of the important points that we will discuss in the article.

The fun anecdote around ROC curves is that their name actually makes sense if you know the origin: it came out of the British army in World War II, where they tracked how their radar operators classified blips as German aircraft or noise (e.g., flocks of birds). The radar receiver operators had certain characteristics, where some were more effective at separating actual threats from noise, hence the Receiver Operating Characteristic curve.
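If you want to see the mechanics without a library, here is a rough sketch: sweep a threshold across the scores, record the false positive and true positive rates at each step, and integrate with the trapezoid rule. The function names and the toy labels and scores are mine, and the code assumes both classes appear in the labels.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    # Lower the threshold step by step; everything scoring >= threshold is flagged.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve via the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


labels = [1, 1, 0, 1, 0, 0]              # 1 = actual threat, 0 = noise (toy data)
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]  # classifier confidence for each blip
print(round(auc(roc_points(labels, scores)), 3))  # 0.889
```

The result matches the pairwise reading of AUC: of the nine positive/negative pairs in the toy data, the positive outscores the negative in eight.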


Parallelizing Notebook Runs in Microsoft Fabric via Python

Sandeep Pawar kicks off multiple notebooks at once:

The notebook class in mssparkutils has two methods to run notebooks – run and runMultiple. run allows you to trigger a notebook run for one single notebook. Mim wrote a nice blog to show how to use it and its usefulness.

runMultiple, on the other hand, allows you to create a Directed Acyclic Graph (DAG) of notebooks to execute notebooks in parallel and in specified order, similar to a pipeline run except in a notebook.

Read on to learn more about the advantages of this latter approach as well as how you can do it.
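To get a feel for what a DAG of notebooks implies, here is a sketch that runs anywhere, not Sandeep's code. The notebook names are hypothetical, and the dictionary layout ("activities" with "name", "path", and "dependencies") follows my recollection of the Microsoft documentation, so check it against the current docs before relying on it. The standard library's `graphlib` works out which notebooks could run side by side.

```python
from graphlib import TopologicalSorter

# Hypothetical DAG: two independent load notebooks, then one that needs both.
dag = {
    "activities": [
        {"name": "load_sales", "path": "load_sales", "dependencies": []},
        {"name": "load_customers", "path": "load_customers", "dependencies": []},
        {
            "name": "build_model",
            "path": "build_model",
            "dependencies": ["load_sales", "load_customers"],
        },
    ],
}


def execution_waves(dag):
    """Group activities into waves; everything in one wave can run in parallel."""
    graph = {a["name"]: set(a.get("dependencies", [])) for a in dag["activities"]}
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises CycleError if the dependencies are not acyclic
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


print(execution_waves(dag))  # [['load_customers', 'load_sales'], ['build_model']]

# Inside a Fabric notebook (assumption: mssparkutils is available there):
# mssparkutils.notebook.runMultiple(dag)
```

The wave view is only a planning aid; the actual scheduling, timeouts, and concurrency are handled by runMultiple itself.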


Trying out Data Wrangler

Ginger Grant tries out a feature in Microsoft Fabric:

The second element in my series on new Fabric Features is Data Wrangler. Data Wrangler is an entirely new feature found inside of the Data Engineering and Machine Learning Experience of Fabric. It was created to help analyze data in a lakehouse using Spark and generated code. You may find that there’s a lot of data in the data lake that you need to evaluate to determine how you might incorporate the data into a data model. It’s important to examine the data to evaluate what the data contains. Is there anything missing? Incorrectly data typed? Bad Data? There is an easy method to discover what is missing with your data which uses some techniques commonly used by data scientists. Data Wrangler is used inside of notebooks in the Data Engineering or Machine Learning Environments, as the functionality does not exist within the Power BI experience.

Click through to see how it works. I liken it to Power Query for people who don’t like Python.
