Press "Enter" to skip to content

Category: Python

Data Validation with Great Expectations and Azure Functions

Eduard van Valkenburg does a bit of data validation:

Great Expectations is a popular Python-based OSS tool to validate data that comes into your data estate. And for me, validating incoming data is best done file by file, as the files arrive! On Azure there is no better platform for that than Azure Functions. I love the serverless nature of Functions, and with the triggers available for arriving blobs, as well as HTTP, Event Grid events, queues, and others, there are some great patterns that allow you to build event-driven data architectures. We also now have the Python v2 framework for Azure Functions available, which makes the developer experience even better. So, let’s go through how to get it running.

This looks really interesting and tying it in to Azure Functions is a good idea assuming that the checks don’t run for too long.
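To make the idea concrete, here is a minimal sketch (not Eduard's code) of a blob-triggered function in the Python v2 programming model running a Great Expectations check against each arriving CSV. The container path, connection setting, and column names are hypothetical, and the pandas-based Great Expectations API shown here differs between releases:

```python
import io
import logging

import azure.functions as func
import great_expectations as ge
import pandas as pd

app = func.FunctionApp()

@app.blob_trigger(arg_name="blob",
                  path="incoming/{name}",          # hypothetical container/path
                  connection="AzureWebJobsStorage")
def validate_blob(blob: func.InputStream):
    # Load the arriving file into a DataFrame
    df = pd.read_csv(io.BytesIO(blob.read()))

    # Wrap it so expectation methods are available (legacy pandas API)
    dataset = ge.from_pandas(df)
    dataset.expect_column_values_to_not_be_null("customer_id")  # hypothetical column
    dataset.expect_column_values_to_be_between("amount", min_value=0)

    results = dataset.validate()
    if not results["success"]:
        logging.error("Validation failed for %s: %s", blob.name, results)
    else:
        logging.info("Validation passed for %s", blob.name)
```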


Working with Kafka from Python

Dave Shook has a new course for us:

If you’re a Python developer, our free Apache Kafka for Python Developers course will show you how to harness the power of Kafka in your applications. You will learn how to build Kafka producer and consumer applications, how to work with event schemas and take advantage of Confluent Schema Registry, and more. Follow along in each module as Dave Klein, Senior Developer Advocate at Confluent, covers all of these topics in detail. Hands-on exercises occur throughout the course to solidify concepts as they are presented. At its end, you will have the knowledge you need to begin developing Python applications that stream data to and from Kafka clusters.

Read on to learn more about it and give it a try.
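If you want a taste of the producer and consumer side before committing to the course, here is a minimal confluent-kafka sketch; the broker address, topic name, and payload are placeholders:

```python
from confluent_kafka import Consumer, Producer

conf = {"bootstrap.servers": "localhost:9092"}  # placeholder broker

# Producer: fire-and-forget with a delivery callback
producer = Producer(conf)

def on_delivery(err, msg):
    if err is not None:
        print(f"Delivery failed: {err}")
    else:
        print(f"Delivered to {msg.topic()} [{msg.partition()}]")

producer.produce("orders", key="order-1", value='{"total": 42.0}', callback=on_delivery)
producer.flush()

# Consumer: poll the same (placeholder) topic
consumer = Consumer({**conf, "group.id": "demo-group", "auto.offset.reset": "earliest"})
consumer.subscribe(["orders"])
try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None:
            continue
        if msg.error():
            print(f"Consumer error: {msg.error()}")
            continue
        print(f"Received {msg.key()}: {msg.value().decode('utf-8')}")
finally:
    consumer.close()
```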


Visualizing PyTorch Models

Adrian Tam describes a model:

PyTorch is a deep learning library. You can build very sophisticated deep learning models with PyTorch. However, there are times you want to have a graphical representation of your model architecture. In this post, you will learn:

  • How to save your PyTorch model in an exchange format
  • How to use Netron to create a graphical representation

Click through for the article, which is mostly about training the PyTorch model. Visualizing it turns out to be pretty easy with the right tool.
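The export-then-inspect workflow boils down to very little code. A rough sketch, using a placeholder model rather than the one trained in the post, and assuming the netron package is installed alongside PyTorch:

```python
import torch
import torch.nn as nn
import netron

# Placeholder model standing in for the one trained in the article
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
model.eval()

# Export to ONNX, an exchange format Netron understands well
dummy_input = torch.randn(1, 8)
torch.onnx.export(model, dummy_input, "model.onnx",
                  input_names=["input"], output_names=["output"])

# Launch Netron's local viewer in the browser
netron.start("model.onnx")
```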


Brief Code File Analysis with Python

Matt Eland reviews the code:

Last year I devised some ways of analyzing the history and structure of code in a visual way, but I didn’t document much of that to share with the community. This article explores the process I used to build a CSV file containing the path, extension, project, and line of code count in all source files in a project.

Click through for the Python code and an explanation of what it’s doing.
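For a flavor of what such a script involves, here is a small sketch (not Matt's code) that walks a repository with pathlib and writes the same four columns to a CSV; the root folder, extension list, and "project = first folder under the root" heuristic are assumptions:

```python
import csv
from pathlib import Path

ROOT = Path("path/to/repo")          # hypothetical repository root
EXTENSIONS = {".py", ".cs", ".sql"}  # hypothetical set of source extensions

rows = []
for path in ROOT.rglob("*"):
    if path.is_file() and path.suffix.lower() in EXTENSIONS:
        # Count lines without loading the whole file into memory
        lines = sum(1 for _ in path.open(encoding="utf-8", errors="ignore"))
        rows.append({
            "path": str(path),
            "extension": path.suffix,
            # Treat the first folder under the root as the "project"
            "project": path.relative_to(ROOT).parts[0],
            "lines": lines,
        })

with open("code_inventory.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["path", "extension", "project", "lines"])
    writer.writeheader()
    writer.writerows(rows)
```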


Estimating Quantiles in Python

Christian Lorentzen digs into quantile calculation:

Applied statistics is dominated by the ubiquitous mean. For a change, this post is dedicated to quantiles. I will give my best to provide a good mix of theory and practical examples.

While the mean describes only the central tendency of a distribution or random sample, quantiles are able to describe the whole distribution. They appear in box plots, in children's weight-for-age curves, in salary survey results, in risk measures like the value-at-risk in the EU-wide Solvency II framework for insurance companies, in quality control, and in many more fields.

There are easy functions to calculate quantiles in R and Python; this post serves as a way of understanding the variety of quantile functions available and how they can affect results with small sample sizes.
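To see why the estimation method matters on a small sample, here is a quick sketch with NumPy; the method argument requires NumPy 1.22 or later (older releases use interpolation, with fewer options):

```python
import numpy as np

rng = np.random.default_rng(42)
sample = rng.normal(size=10)  # deliberately tiny sample

# The same 90th percentile, estimated several different ways
for method in ["linear", "lower", "higher", "midpoint", "median_unbiased"]:
    q = np.quantile(sample, 0.9, method=method)
    print(f"{method:>16}: {q:.4f}")
```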


Building Custom Lineage in Purview

Alex Crampton writes some Python code:

The aim of this blog is to explain how to create custom Purview processes, enabling you to add lineage from processes that are not tracked out of the box.

As covered in this blog, Azure Purview can help with understanding the lineage of your data, offering visibility of how and where data is moving within your data estate.

Lineage can only be tracked out of the box when using tools such as Data Factory, Power BI, and Azure Data Share. Lineage is lost when using other tools like Azure Functions, Databricks notebooks, or SQL stored procedures.

Read on to see the code, as well as what you can do.
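For orientation, one common way to push custom lineage into Purview from Python is the pyapacheatlas library: define the source and sink entities plus a process entity linking them, then upload the batch. The account, credentials, type names, and qualified names below are placeholders, and the blog itself may take a different approach (such as calling the REST API directly):

```python
from pyapacheatlas.auth import ServicePrincipalAuthentication
from pyapacheatlas.core import AtlasEntity, AtlasProcess, PurviewClient

# Placeholder service principal credentials and account name
auth = ServicePrincipalAuthentication(
    tenant_id="<tenant-id>", client_id="<client-id>", client_secret="<secret>")
client = PurviewClient(account_name="my-purview-account", authentication=auth)

# Reference the source and sink assets; "DataSet" is a generic type name and the
# qualified names are made up. Negative GUIDs are placeholders resolved on upload.
source = AtlasEntity(name="raw_sales.csv", typeName="DataSet",
                     qualified_name="https://lake.dfs.core.windows.net/raw/raw_sales.csv",
                     guid=-100)
sink = AtlasEntity(name="sales", typeName="DataSet",
                   qualified_name="mssql://server.database.windows.net/db/dbo/sales",
                   guid=-101)

# The custom process entity is what draws the lineage line between them
process = AtlasProcess(name="load_sales_function", typeName="Process",
                       qualified_name="custom://functions/load_sales",
                       inputs=[source], outputs=[sink], guid=-102)

client.upload_entities(batch=[source, sink, process])
```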


Automated Data Visualization in Python

Brendan Tierney saves some time:

Creating data visualizations in Python can be a challenge. For some it can be easy, but most people (particularly those new to the language) always have to search for the commands in the documentation or via a search engine. Over the past few years we have seen more and more libraries becoming available to assist with many of the routine and tedious steps in most data science and machine learning projects. I’ve written previously about some data profiling libraries in Python. These are good up to a point, but additional work/code is needed to explore the data to suit your needs. One of these Python libraries, designed to make your initial work on a new data set easier, is called AutoViz. It’s good to see there is continued development work on this library, which can really help with creating initial sets of charts for all the variables in your data set, plus it has some additional features which help to make it very useful and cut down on some of the additional code you might need to write.

This looks like it’s worth a try and could serve well as a first-glance approach to exploratory data analysis.
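Getting started with AutoViz takes only a few lines. A hedged sketch, with a hypothetical file name and target column (the exact parameters vary a bit between AutoViz releases):

```python
from autoviz.AutoViz_Class import AutoViz_Class

AV = AutoViz_Class()

# Point AutoViz at a CSV (or pass a DataFrame via dfte=) and let it choose the charts;
# the file name and target column here are placeholders.
report = AV.AutoViz(
    filename="sales.csv",
    sep=",",
    depVar="revenue",            # optional target variable to plot against
    verbose=1,
    chart_format="html",         # write interactive charts to disk
    max_rows_analyzed=150000,
)
```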


Multi-Class Classification in PyTorch

Adrian Tam does some iris categorizing:

Now you need to have a model that can take the input and predict the output, ideally in the form of one-hot vectors. There is no science behind the design of a perfect neural network model. But you know one thing: it has to take in a vector of 4 features and output a vector of 3 values. The 4 features correspond to what you have in the dataset. The 3-value output is because we know the one-hot vector has 3 elements. Anything can be in between, and those are known as the “hidden layers” since they are neither input nor output.

Click through for the full tutorial.
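The shape Adrian describes (4 inputs, a hidden layer, 3 outputs) might look like this in PyTorch; the hidden-layer size is an arbitrary choice here, not the one from the tutorial:

```python
import torch
import torch.nn as nn

class IrisNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(4, 8)   # 4 input features -> 8 hidden units
        self.act = nn.ReLU()
        self.output = nn.Linear(8, 3)   # 3 output values, one per class

    def forward(self, x):
        return self.output(self.act(self.hidden(x)))

model = IrisNet()
loss_fn = nn.CrossEntropyLoss()     # takes raw logits plus integer class labels
logits = model(torch.rand(5, 4))    # batch of 5 samples with 4 features each
print(logits.shape)                 # torch.Size([5, 3])
```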


Building a Shiny App in R and Python

Nicola Rennie does a language throw-down:

Shiny is an R package that makes it easier to build interactive web apps straight from R. Back in July 2022 at rstudio::conf(2022), Posit (formerly RStudio) announced the release of Shiny for Python. As someone who knows Python but hasn’t written any Python code for quite a long time, I wanted to see how the two compared. So I did the only logical thing and built a Shiny app – twice!

After building (almost) identical Shiny apps, with one built solely in R and the other solely in Python, I’ve written this blog post to take you through some of the things that are the same, and a few things that are slightly different.

Note: at the time of writing Shiny for Python is still in alpha, so if you’re reading this blog quite a while after it was first published, some things may have changed.

The code, as you’d expect, looks quite similar. I also learned about plotnine, something I’ll need to keep in mind. H/T R-Bloggers.
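For reference, a minimal Shiny for Python app has roughly this shape; it is a sketch with placeholder inputs, not Nicola's app, and the API may have shifted since the alpha the post was written against:

```python
from shiny import App, render, ui

app_ui = ui.page_fluid(
    ui.h2("Hello Shiny for Python"),
    ui.input_slider("n", "Number of points", min=10, max=100, value=50),
    ui.output_text("summary"),
)

def server(input, output, session):
    @output
    @render.text
    def summary():
        # Reactive: re-runs whenever the slider changes
        return f"You asked for {input.n()} points."

app = App(app_ui, server)
# Run with: shiny run --reload app.py
```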


Plotly Visualizations in Azure Data Explorer

Adi Eldar improves ADX visualization:

Azure Data Explorer (ADX) supports various types of data visualizations including time, bar, and scatter charts, maps, funnels, and many more. The chosen visualization can be specified as part of the KQL query using the ‘render’ operator, or interactively selected when building ADX dashboards. Today we extend the set of visualizations, supporting advanced interactive visualizations with the Plotly graphics library. Plotly supports ~80 chart types including basic charts, scientific, statistical, financial, maps, 3D, animations, and more. There are two methods for creating Plotly visuals:

Read on to learn more about those two methods.
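As a rough illustration of the Python side, a Plotly figure can be serialized to its JSON representation, which (as I understand it) is what drives ADX's Plotly visuals, for example when generated from inside the python() plugin. The data and chart below are placeholders:

```python
import pandas as pd
import plotly.express as px

# Hypothetical data standing in for a KQL query result
df = pd.DataFrame({
    "hour": list(range(24)),
    "requests": [h * 3 + 5 for h in range(24)],  # made-up numbers
})

fig = px.line(df, x="hour", y="requests", title="Requests per hour")

# Emit the figure as a Plotly JSON document
plotly_json = fig.to_json()
print(plotly_json[:200])
```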
