Category: Python

Estimating Quantiles in Python

Christian Lorentzen digs into quantile calculation:

Applied statistics is dominated by the ubiquitous mean. For a change, this post is dedicated to quantiles. I will do my best to provide a good mix of theory and practical examples.

While the mean describes only the central tendency of a distribution or random sample, quantiles are able to describe the whole distribution. They appear in box plots, in children’s weight-for-age curves, in salary survey results, in risk measures like the value-at-risk in the EU-wide Solvency II framework for insurance companies, in quality control, and in many more fields.

There are easy functions to calculate quantiles in R and Python; this post serves as a way of understanding the variety of quantile functions available and how they can affect results with small sample sizes.
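
As a small illustration of how the choice of method can change results on a small sample, here is a sketch using only the standard library's `statistics.quantiles`. The sample values are made up for the example, and the two methods shown are just the ones the standard library offers, not the full set of definitions the linked post covers.

```python
import statistics

# A deliberately small, made-up sample, where the choice of method matters most.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# "exclusive" assumes the population may contain values more extreme than the
# sample; "inclusive" treats the sample minimum and maximum as the population's.
exclusive = statistics.quantiles(sample, n=4, method="exclusive")
inclusive = statistics.quantiles(sample, n=4, method="inclusive")

print(exclusive)  # [2.75, 5.5, 8.25]
print(inclusive)  # [3.25, 5.5, 7.75]
```

The medians agree, but the first and third quartiles differ by half a unit, which is a noticeable gap on ten observations.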

Comments closed

Building Custom Lineage in Purview

Alex Crampton writes some Python code:

The aim of this blog is to explain how to create custom Purview processes, enabling you to add lineage from processes that are not tracked out of the box.

As covered in this blog, Azure Purview can help with understanding the lineage of your data, offering visibility of how and where data is moving within your data estate.

Lineage can only be tracked out of the box when using tools such as Data Factory, Power BI, and Azure Data Share. Lineage is lost when using other tools like Azure Functions, Databricks notebooks, or SQL stored procedures.

Read on to see the code, as well as what you can do.

Automated Data Visualization in Python

Brendan Tierney saves some time:

Creating data visualizations in Python can be a challenge. For some it can be easy, but most (particularly people new to the language) always have to search for the commands in the documentation or with a search engine. Over the past few years we have seen more and more libraries becoming available to assist with many of the routine and tedious steps in most data science and machine learning projects. I’ve written previously about some data profiling libraries in Python. These are good up to a point, but additional work/code is needed to explore the data to suit your needs. One of these Python libraries, designed to make your initial work on a new data set easier, is called AutoViz. It’s good to see there is continued development work on this library, which can really help with creating initial sets of charts for all the variables in your data set, plus it has some additional features which help to make it very useful and cut down on some of the additional code you might need to write.

This looks like it’s worth a try and could serve well as a first-glance approach to exploratory data analysis.

Multi-Class Classification in PyTorch

Adrian Tam does some iris categorizing:

Now you need to have a model that can take the input and predict the output, ideally in the form of one-hot vectors. There is no science behind the design of a perfect neural network model. But you know one thing: it has to take in a vector of 4 features and output a vector of 3 values. The 4 features correspond to what you have in the dataset. The 3-value output is because we know the one-hot vector has 3 elements. Anything can be in between, and those are known as the “hidden layers” since they are neither input nor output.
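
To make the 3-element one-hot vector concrete, here is a library-free sketch. It is not the tutorial's PyTorch code: the helper names `one_hot` and `predicted_label` and the class-name strings are assumptions chosen for illustration.

```python
# Illustrative class names; the iris dataset has three species.
CLASSES = ["setosa", "versicolor", "virginica"]

def one_hot(label, classes=CLASSES):
    """Return a list with 1.0 at the label's position and 0.0 elsewhere."""
    if label not in classes:
        raise ValueError(f"unknown label: {label!r}")
    return [1.0 if c == label else 0.0 for c in classes]

def predicted_label(outputs, classes=CLASSES):
    """Map a 3-value model output back to the class with the largest value."""
    best = max(range(len(outputs)), key=lambda i: outputs[i])
    return classes[best]

print(one_hot("versicolor"))             # [0.0, 1.0, 0.0]
print(predicted_label([0.1, 0.2, 0.7]))  # virginica
```

A model with 3 outputs is trained so that its output resembles the one-hot target, and the largest of the three values is read back as the predicted class.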

Click through for the full tutorial.

Building a Shiny App in R and Python

Nicola Rennie does a language throw-down:

Shiny is an R package that makes it easier to build interactive web apps straight from R. Back in July 2022 at rstudio::conf(2022), Posit (formerly RStudio) announced the release of Shiny for Python. As someone who knows Python but hasn’t written any Python code for quite a long time, I wanted to see how the two compared. So I did the only logical thing and built a Shiny app – twice!

After building (almost) identical Shiny apps, with one built solely in R and the other solely in Python, I’ve written this blog post to take you through some of the things that are the same, and a few things that are slightly different.

Note: at the time of writing Shiny for Python is still in alpha, so if you’re reading this blog quite a while after it was first published, some things may have changed.

The code, as you’d expect, looks quite similar. I also learned about plotnine, something I’ll need to keep in mind. H/T R-Bloggers.

Plotly Visualizations in Azure Data Explorer

Adi Eldar improves ADX visualization:

Azure Data Explorer (ADX) supports various types of data visualizations including time, bar and scatter charts, maps, funnels and many more. The chosen visualization can be specified as part of the KQL query using the ‘render’ operator, or interactively selected when building ADX dashboards. Today we extend the set of visualizations, supporting advanced interactive visualizations by the Plotly graphics library. Plotly supports ~80 chart types including basic charts, scientific, statistical, financial, maps, 3D, animations and more. There are two methods for creating Plotly visuals:

Read on to learn more about those two methods.

K-Fold Cross-Validation in Python

Shanthababu Pandian gives us a primer on k-fold cross-validation:

Each set (fold) is used for testing precisely once during this entire process, with training performed on the remaining folds. This helps us to avoid overfitting. As we know, a model trained on all of the data in a single shot can give the best accuracy on that data without generalizing well. To resist this, k-fold cross-validation helps us to build the model as a generalized one.

To achieve this K-Fold Cross Validation, we have to split the data set into three sets (Training, Testing, and Validation), which can be a challenge depending on the volume of the data.
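
The fold mechanics can be sketched without any library. The function below is a hypothetical helper, not code from the linked article; it splits sample indices into k contiguous folds so that each index lands in a test fold exactly once. In practice the data would usually be shuffled first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder so fold sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        stop = start + base + (1 if fold < extra else 0)
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, test
        start = stop

folds = list(k_fold_indices(n_samples=10, k=3))
for train, test in folds:
    print("test:", test, "train:", train)
```

With 10 samples and 3 folds, the test folds have sizes 4, 3, and 3, and together they cover every index once.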

Read on for the explanation and an example.

Using the Softmax Classifier in PyTorch

Muhammad Asad Iqbal Khan takes us through one of the classifier options available to PyTorch:

While a logistic regression classifier is used for binary classification, the softmax classifier is a supervised learning algorithm which is mostly used when multiple classes are involved.

The softmax classifier works by assigning a probability to each class. The probabilities are normalized so that they sum to 1, and the class with the highest probability is taken as the prediction.
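
For reference, the softmax function itself maps raw class scores to probabilities that sum to 1. The sketch below uses only the standard library rather than PyTorch, and the input scores are made up for illustration.

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    # Subtracting the maximum avoids overflow in exp() without changing the result.
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
predicted = max(range(len(probs)), key=lambda i: probs[i])

print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]
print(predicted)                     # 0
```

The predicted class is simply the index of the largest probability, which is also the index of the largest raw score.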

Read on to learn some of the properties of the Softmax classifier, as well as how you can use this for multi-class classification in PyTorch.

Trying out FLAML

Gavita Regunath provides an overview of FLAML:

FLAML is short for Fast and Lightweight Automated Machine Learning library. It is an open-source Python library created by Microsoft researchers in 2021 for automated machine learning (AutoML). It is designed to be fast, efficient, and user-friendly, making it ideal for a wide range of applications.

Click through to learn more and to give it a spin with a pair of notebooks.
