
Category: Python

Running SemPy from Microsoft Fabric Notebooks

Gilbert Quevauvilliers sets up an environment:

Below is where I hit an error: I tried to run a notebook via a data pipeline, and it failed.

Below are the steps to get this working.

This was the error message I got:

Notebook execution failed at Notebook service with http status code – ‘200’, please check the Run logs on Notebook, additional details – ‘Error name – MagicUsageError, Error value – %pip magic command is disabled.’ :

Read on to see how you can fix this error and get SemPy running.


A Primer on Vector Similarity Search

Pavan Belagatti talks vectors:

In the realm of generative AI, vectors play a crucial role as a means of representing and manipulating complex data. Within this context, vectors are often high-dimensional arrays of numbers that encode significant amounts of information. For instance, in the case of image generation, each image can be converted into a vector representing its pixel values or more abstract features extracted through deep learning models.

These vectors become the language through which AI algorithms understand and generate new content. By navigating and modifying these vectors in a multidimensional space, generative AI produces new, synthetic instances of data — whether images, sounds or text — that mimic the characteristics of the original dataset. This vector manipulation is at the heart of AI’s ability to learn from data and generate realistic outputs based on that learning.

Read on for a high-level overview of the topic.
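To make the idea of similarity search concrete, here is a minimal sketch in plain Python. It assumes cosine similarity as the metric and a brute-force scan over every stored vector. The three-dimensional "embeddings" are made up for illustration: real embeddings come from a model and have hundreds or thousands of dimensions, and vector databases usually replace the linear scan with an approximate nearest-neighbor index.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between vectors a and b (1.0 means same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def top_k(query: list[float], vectors: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Brute-force search: score every stored vector, return the k most similar."""
    scored = [(key, cosine_similarity(query, vec)) for key, vec in vectors.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings, not produced by any real model
docs = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.85, 0.15, 0.05],
    "car": [0.0, 0.2, 0.95],
}
result = top_k([1.0, 0.1, 0.0], docs, k=2)
print(result)
```

Because the query points in nearly the same direction as "cat" and "kitten", those two rank highest, while "car" points elsewhere in the space.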


Removing Skew in Data with Python

Vinod Chugani kicks the lop-sided distribution to straighten it out:

Data transformations enable data scientists to refine, normalize, and standardize raw data into a format ripe for analysis. These transformations are not merely procedural steps; they are essential in mitigating biases, handling skewed distributions, and enhancing the robustness of statistical models. This post will primarily focus on how to address skewed data. By focusing on the ‘SalePrice’ and ‘YearBuilt’ attributes from the Ames housing dataset, we will provide examples of positively and negatively skewed data and illustrate ways to normalize their distributions using transformations.

Read on to see what kinds of transformations are available.
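As a rough illustration of one such transformation, the sketch below measures skewness and applies a log transform to a small, made-up set of right-skewed prices (these are not Ames values). The skewness helper uses the Fisher-Pearson coefficient with population moments, which is an assumption on my part; libraries such as SciPy also offer bias-corrected variants.

```python
import math


def skewness(values: list[float]) -> float:
    """Fisher-Pearson coefficient of skewness: g1 = m3 / m2**1.5 (population moments).
    Assumes the values are not all identical (m2 would be zero)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed sale prices: most are modest, a few are very large
prices = [120_000, 135_000, 140_000, 150_000, 155_000,
          160_000, 175_000, 210_000, 450_000, 755_000]

# log1p computes log(1 + x): it compresses the long right tail and tolerates zeros
log_prices = [math.log1p(p) for p in prices]

print(f"skew before: {skewness(prices):.2f}")
print(f"skew after:  {skewness(log_prices):.2f}")
```

A log transform pulls in a long right tail but will not fully normalize every distribution, so it is worth re-checking skewness (or a histogram) after transforming.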


A Primer on Pandas Queries

Vinod Chugani works with Pandas:

In the realm of data analysis, SQL stands as a mighty tool, renowned for its robust capabilities in managing and querying databases. However, Python’s pandas library brings SQL-like functionalities to the fingertips of analysts and data scientists, enabling sophisticated data manipulation and analysis without the need for a traditional SQL database. This exploration delves into applying SQL-like functions within Python to dissect and understand data, using the Ames Housing dataset as your canvas. The Ames Housing dataset, a comprehensive compilation of residential property sales in Ames, Iowa, from 2006 to 2010, serves as an ideal dataset for this exploration, offering a rich variety of features to analyze and derive insights from.

Click through for examples of how to use the query() function in conjunction with other Pandas functionality to answer questions of the data.
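A minimal sketch of the query() pattern, assuming pandas is installed. The five rows are hypothetical and only borrow a few Ames Housing column names; the comments give the rough SQL equivalent of each step.

```python
import pandas as pd

# Hypothetical rows shaped like a few Ames Housing columns; not real data
df = pd.DataFrame({
    "Neighborhood": ["NAmes", "CollgCr", "NAmes", "OldTown", "CollgCr"],
    "YearBuilt": [1961, 2003, 1958, 1915, 2005],
    "SalePrice": [127_500, 208_500, 118_000, 98_000, 223_500],
})

# Roughly: SELECT * FROM df WHERE YearBuilt >= 2000 AND SalePrice > 200000
recent = df.query("YearBuilt >= 2000 and SalePrice > 200000")

# Local Python variables can be referenced inside query() with @
min_price = 110_000

# Roughly: SELECT Neighborhood, AVG(SalePrice) FROM df
#          WHERE SalePrice >= :min_price
#          GROUP BY Neighborhood ORDER BY AVG(SalePrice) DESC
avg_by_hood = (
    df.query("SalePrice >= @min_price")
      .groupby("Neighborhood")["SalePrice"]
      .mean()
      .sort_values(ascending=False)
)

print(recent)
print(avg_by_hood)
```

query() handles the WHERE clause; grouping, aggregation, and ordering come from chaining regular DataFrame methods afterward.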


Visualizing Genomics Data with Streamlit in Snowflake

Rebecca O’Connor builds an app:

The blog describes how this data is complemented with the following two additional data sets:

  • An annotation dataset
  • A panel dataset

Simple SQL queries are then used to answer a multitude of questions held within the vast amount of data.

I utilised the same datasets in order to create a Streamlit app.

Click through for the code. This is the reason why I like Streamlit so much: you can build an interactive data-centric application very easily. Granted, you can abuse Streamlit pretty hard, but it is powerful.


Renaming Multiple Columns in a PySpark Notebook

Gilbert Quevauvilliers wants one rename to rule them all:

Following on from my previous blog post, in this blog post I’m going to demonstrate how to bulk rename columns in a single step instead of having to rename them individually.

The reason this came about is that I had a set of data where the column names had square brackets, which I wanted to remove.

As shown below, I have highlighted two column names with square brackets.

Read on to see how you can perform somewhat-generic rename operations in Spark notebooks.
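One way to sketch a single-step bulk rename, assuming the goal is just to strip square brackets: build the complete list of new names in plain Python, then hand it to Spark at once. The helper names (strip_brackets, bulk_rename) are hypothetical, and this is not necessarily the approach Gilbert takes. In PySpark, DataFrame.toDF(*names) is one call that accepts a full list of replacement names in column order.

```python
import re


def strip_brackets(name: str) -> str:
    """Remove square brackets from a column name, e.g. '[Region]' -> 'Region'."""
    return re.sub(r"[\[\]]", "", name)


def bulk_rename(columns: list[str]) -> list[str]:
    """Build every new name in one pass, preserving column order."""
    new_names = [strip_brackets(c) for c in columns]
    # Guard against collisions such as '[A]' and 'A' both becoming 'A'
    if len(set(new_names)) != len(new_names):
        raise ValueError("renaming would create duplicate column names")
    return new_names


# Hypothetical column names; in a Spark notebook these would come from df.columns
cols = ["[Region]", "[Sales Amount]", "OrderDate"]
print(bulk_rename(cols))

# In a PySpark notebook, the whole list could then be applied in one step:
# df = df.toDF(*bulk_rename(df.columns))
```

Keeping the name-cleaning logic in a plain function makes it easy to swap in other rules (spaces, slashes) without touching the Spark code.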


Generators in Python

Jack Wallen generates some values:

Within the realm of programming, a generator is a routine that is used to control iteration within a loop. Generators are useful when you want to produce a large sequence of values without storing them in memory all at once. The ability to create a large sequence without using up memory is important, especially when dealing with Python programs that generate a large amount of information, such as a long sequence of numbers.

Read on to see how generators work and how to build a custom generator.
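A short sketch of the idea: a function containing yield returns a generator object that produces values lazily, so it can feed sum() or itertools.islice() without materializing the whole sequence. The function names here are illustrative.

```python
import sys
from itertools import islice


def squares(limit: int):
    """Yield n * n for n in 0 .. limit - 1, one value at a time."""
    for n in range(limit):
        yield n * n


def fibonacci():
    """An infinite generator: each value is computed only when requested."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b


# Sum 100,000 squares without ever holding them all in a list
total = sum(squares(100_000))

# The generator object stays small; the equivalent list stores every element
gen_size = sys.getsizeof(squares(100_000))
list_size = sys.getsizeof(list(squares(100_000)))
print(gen_size, list_size)

# islice takes a finite slice from an infinite generator
first_ten = list(islice(fibonacci(), 10))
print(first_ten)
```

Note that a generator can be consumed only once; calling squares() again creates a fresh one.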


Classical Methods for Outlier Detection

Vinod Chugani is speaking my language:

Outliers are unique in that they often don’t play by the rules. These data points, which significantly differ from the rest, can skew your analyses and make your predictive models less accurate. Although detecting outliers is critical, there is no universally agreed-upon method for doing so. While some advanced techniques like machine learning offer solutions, in this post, we will focus on the foundational Data Science methods that have been in use for decades.

Vinod looks at a few techniques, including the interquartile range and comparing results to an expected distribution. If you’re really excited about this topic, I know a guy who’s written a bit about it.
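The interquartile range method is simple enough to sketch with the standard library. This version assumes Tukey's conventional fences at 1.5 times the IQR beyond the first and third quartiles, and uses statistics.quantiles with the inclusive method; the sample values are made up.

```python
import statistics


def iqr_outliers(values: list[float], k: float = 1.5) -> list[float]:
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical measurements with one obvious outlier
data = [10, 12, 11, 13, 12, 14, 11, 100]
print(iqr_outliers(data))
```

Because quartiles are barely affected by a single extreme value, the fences stay close to the bulk of the data, which is what makes this method more robust than a mean-and-standard-deviation rule on small samples.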


Looping in Python

Jack Wallen has us spinning in circles:

What is a loop? Other than a fruity breakfast cereal, a loop is a portion of code that repeats, either a set number of times or until a desired condition is met.

Here’s an example of a loop that many can relate to. You go to the ATM, insert your card, and start your transaction. First, you check your balance, then you withdraw money from your account. The loop keeps running with each transaction until you tell the ATM you are finished.

Read on for examples of both for and while loops, though for loops are much more common than while loops in Python. I’d also recommend learning a bit about list comprehensions, which turn a simple for loop into a one-line expression. [f(x) for x in my_list] is a simple example: for each element in my_list, call f() with that element as the input, then collect all of the results of f(x) into a new list.
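A brief sketch of the three forms mentioned above: a for loop over a list, a while loop modeled loosely on the ATM example (the request strings and the atm_session function are hypothetical), and the list comprehension [f(x) for x in my_list].

```python
# for: iterate over a known collection, one element at a time
balances = [120.0, 75.5, 300.25]
for b in balances:
    print(f"balance: {b}")


# while: keep going until a condition says stop (loosely, the ATM example)
def atm_session(requests: list[str], balance: float) -> float:
    """Process hypothetical ATM requests until the customer says 'done'."""
    i = 0
    while i < len(requests) and requests[i] != "done":
        action = requests[i]
        if action == "balance":
            print(f"current balance: {balance}")
        elif action.startswith("withdraw "):
            amount = float(action.split()[1])
            if amount <= balance:  # refuse withdrawals that would overdraw
                balance -= amount
        i += 1
    return balance


final = atm_session(["balance", "withdraw 40", "withdraw 500", "done", "withdraw 10"], 100.0)


# list comprehension: [f(x) for x in my_list]
def f(x: int) -> int:
    return x * 2


my_list = [1, 2, 3]
doubled = [f(x) for x in my_list]

# The same result written as an explicit for loop
doubled_loop = []
for x in my_list:
    doubled_loop.append(f(x))

print(final, doubled, doubled_loop)
```

The while loop stops at "done", so the final withdrawal request is never processed; the comprehension and the explicit loop build identical lists.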


Renaming a Column in Microsoft Fabric via Python Notebook

Gilbert Quevauvilliers performs a rename:

I thought it would be good to help others by sharing my learning journey when working with Python notebooks and Microsoft Fabric.

In today’s blog post, I am going to show you how to rename a column. In my experience this came up because I had a column name with a forward slash “/” in it, which caused the data load for the table to fail because this is a reserved character.

Read on for the code and an example of how it works in action.
