Press "Enter" to skip to content

Category: Python

Automated Data Visualization in Python

Brendan Tierney saves some time:

Creating data visualizations in Python can be a challenge. For some it can be easy, but most people (particularly those new to the language) always have to search for the commands in the documentation or with a search engine. Over the past few years we have seen more and more libraries becoming available to assist with many of the routine and tedious steps in most data science and machine learning projects. I’ve written previously about some data profiling libraries in Python. These are good up to a point, but additional work/code is needed to explore the data to suit your needs. One of these Python libraries, designed to make your initial work on a new data set easier, is called AutoViz. It’s good to see there is continued development work on this library, which can be really helpful for creating initial sets of charts for all the variables in your data set. Plus it has some additional features which help to make it very useful and cut down on some of the additional code you might need to write.

This looks like it’s worth a try and could serve well as a first-glance approach to exploratory data analysis.
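For a quick sense of what that looks like, here is a minimal sketch of calling AutoViz against a CSV file; the file name and arguments below are placeholders rather than anything from Brendan's post.

# Minimal AutoViz sketch: point it at a CSV file and it charts every variable.
# "data.csv" is a placeholder file name.
from autoviz.AutoViz_Class import AutoViz_Class

AV = AutoViz_Class()
report = AV.AutoViz(filename="data.csv", depVar="", verbose=1)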


Multi-Class Classification in PyTorch

Adrian Tam does some iris categorizing:

Now you need to have a model that can take the input and predict the output, ideally in the form of one-hot vectors. There is no science behind the design of a perfect neural network model. But you know one thing: it has to take in a vector of 4 features and output a vector of 3 values. The 4 features correspond to what you have in the dataset. The 3-value output is because we know the one-hot vector has 3 elements. Anything can be in between, and those are known as the “hidden layers” since they are neither input nor output.

Click through for the full tutorial.
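As a rough illustration of the shape Adrian describes (4 features in, 3 values out, something in between), here is a minimal PyTorch sketch; the hidden layer size of 8 is an arbitrary choice of mine, not from the tutorial.

import torch.nn as nn

# 4 input features (the iris measurements), one hidden layer, 3 output values
# (one per class); the hidden size of 8 is arbitrary.
model = nn.Sequential(
    nn.Linear(4, 8),
    nn.ReLU(),
    nn.Linear(8, 3),
)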


Building a Shiny App in R and Python

Nicola Rennie does a language throw-down:

Shiny is an R package that makes it easier to build interactive web apps straight from R. Back in July 2022 at rstudio::conf(2022), Posit (formerly RStudio) announced the release of Shiny for Python. As someone who knows Python but hasn’t written any Python code for quite a long time, I wanted to see how the two compared. So I did the only logical thing and built a Shiny app – twice!

After building (almost) identical Shiny apps, with one built solely in R and the other solely in Python, I’ve written this blog post to take you through some of the things that are the same, and a few things that are slightly different.

Note: at the time of writing Shiny for Python is still in alpha, so if you’re reading this blog quite a while after it was first published, some things may have changed.

The code, as you’d expect, looks quite similar. I also learned about plotnine, something I’ll need to keep in mind. H/T R-Bloggers.
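For reference, the Python side follows the same ui/server split as the R version. Here is a minimal Shiny for Python sketch, not Nicola's app; the slider and text output are purely illustrative.

from shiny import App, render, ui

app_ui = ui.page_fluid(
    ui.input_slider("n", "Number of points", min=10, max=100, value=50),
    ui.output_text("summary"),
)

def server(input, output, session):
    @output
    @render.text
    def summary():
        # Echo the current slider value back to the page
        return f"You selected {input.n()} points."

app = App(app_ui, server)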


Plotly Visualizations in Azure Data Explorer

Adi Eldar improves ADX visualization:

Azure Data Explorer (ADX) supports various types of data visualizations including time, bar, and scatter charts, maps, funnels and many more. The chosen visualization can be specified as part of the KQL query using the ‘render’ operator, or interactively selected when building ADX dashboards. Today we extend the set of visualizations, supporting advanced interactive visualizations with the Plotly graphics library. Plotly supports ~80 chart types including basic charts, scientific, statistical, financial, maps, 3D, animations and more. There are two methods for creating Plotly visuals:

Read on to learn more about those two methods.
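As context for the Plotly side of this, a figure built in Python can be serialized to the JSON representation that Plotly-based visuals consume. This sketch is illustrative only and is not one of the ADX methods from the post.

import plotly.express as px

# Build a simple Plotly figure and serialize it to JSON.
df = px.data.iris()  # sample dataset bundled with Plotly
fig = px.scatter(df, x="sepal_width", y="sepal_length", color="species")
fig_json = fig.to_json()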


K-Fold Cross-Validation in Python

Shanthababu Pandian gives us a primer on k-fold cross-validation:

In each set (fold), training and testing are performed exactly once during this entire process. This helps us to avoid overfitting. As we know, when a model is trained using all of the data in a single shot, it gives the best performance accuracy on that same data. Resisting this, k-fold cross-validation helps us to build the model as a generalized one.

To achieve this, k-fold cross-validation has us split the data set into three sets (Training, Testing, and Validation), which brings the challenge of having enough data volume.

Read on for the explanation and an example.
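For a concrete illustration of the idea, here is a minimal scikit-learn sketch; the synthetic data set and logistic regression model are stand-ins, not from Shanthababu's article.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# Each of the 5 folds serves as the test set exactly once.
kf = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=kf)
print(scores.mean(), scores.std())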


Using the Softmax Classifier in PyTorch

Muhammad Asad Iqbal Khan takes us through one of the classifier options available to PyTorch:

While a logistic regression classifier is used for binary classification, a softmax classifier is a supervised learning algorithm which is mostly used when multiple classes are involved.

A softmax classifier works by assigning a probability distribution over the classes: the raw scores are exponentiated and normalized so that the probabilities sum to 1, with the highest-scoring class receiving the highest probability.

Read on to learn some of the properties of the Softmax classifier, as well as how you can use this for multi-class classification in PyTorch.
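As a quick illustration of the idea (not code from the post), here is softmax in PyTorch turning raw scores into a probability distribution.

import torch
import torch.nn as nn

# A batch of 8 samples with 4 features, mapped to raw scores (logits) for 3 classes.
logits = nn.Linear(4, 3)(torch.randn(8, 4))
probs = nn.functional.softmax(logits, dim=1)  # each row now sums to 1

# In training, the raw logits typically go to CrossEntropyLoss,
# which applies log-softmax internally.
loss = nn.CrossEntropyLoss()(logits, torch.randint(0, 3, (8,)))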


Trying out FLAML

Gavita Regunath provides an overview of FLAML:

FLAML is short for Fast and Lightweight Automated Machine Learning library. It is an open-source Python library created by Microsoft researchers in 2021 for automated machine learning (AutoML). It is designed to be fast, efficient, and user-friendly, making it ideal for a wide range of applications.

Click through to learn more and to give it a spin with a pair of notebooks.
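If you want a feel for the library before opening the notebooks, here is a minimal FLAML sketch; the iris data set and the 60-second time budget are my placeholders, not Gavita's.

from flaml import AutoML
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(*load_iris(return_X_y=True), random_state=42)

automl = AutoML()
# Search for a good model and hyperparameters within a 60-second time budget.
automl.fit(X_train, y_train, task="classification", time_budget=60)
print(automl.best_estimator)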


Statistical Analysis in Azure ML

Tomaz Kastrun continues an advent of Azure ML. Day 18 takes us through feature exploration:

Azure Machine Learning is also a great tool to do ordinary statistical analysis, graph plotting, and everything that goes along with them.

Let’s get an open dataset that is available on the UCI Machine Learning repository and import it into a pandas dataframe.
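As a rough stand-in for that step (Tomaz uses his own data set of choice), pulling a UCI data set into pandas and running summary statistics looks something like this; the wine data set here is illustrative.

import pandas as pd

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data"
df = pd.read_csv(url, header=None)

print(df.describe())  # basic summary statistics
print(df.corr())      # correlation matrix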

Day 19 picks up with feature engineering:

Yesterday we showed that statistical analysis and all the bells and whistles can be done super simply in Azure Machine Learning. Today we will continue with feature engineering and modelling.

So, what is feature engineering? It is a general process which can involve both feature construction (adding new features from the existing data) and feature selection (choosing only the most important features), with the goals of improving model performance and reducing data dimensionality. It can include log transformations, removing outliers, scaling (normalisation, standardisation), imputation, general transformations (polynomial and others), variable creation, variable extraction, and so on.
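A few of the steps in that list, sketched with scikit-learn on a made-up dataframe (the column names are illustrative, not from the post):

import numpy as np
import pandas as pd
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

df = pd.DataFrame({"income": [30000, 45000, 120000, 60000],
                   "age": [25, 40, 52, 33]})

df["log_income"] = np.log(df["income"])                          # log transformation
scaled = StandardScaler().fit_transform(df[["income", "age"]])   # standardisation
poly = PolynomialFeatures(degree=2).fit_transform(df[["age"]])   # polynomial features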


Sharing Results between Notebooks with MSSparkUtils

Liliam Leme provides an answer to a common Synapse Spark pool question:

I’ve been reviewing customer questions centered around “Have I tried using MSSparkUtils to solve the problem?”

One of the questions asked was how to share results between notebooks. Every time you hit “run” in a notebook, it starts a new Spark cluster, which means that each notebook uses a different session, making it impossible to share results between executions of notebooks. MSSparkUtils offers a solution to handle this exact scenario.

Read on to see what MSSparkUtils is and how it helps in this case.
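One common pattern here, sketched below with placeholder notebook and parameter names, is to have a child notebook return a value via mssparkutils.notebook.exit and have the caller capture it from mssparkutils.notebook.run; Liliam's post covers the details of the approach she recommends.

from notebookutils import mssparkutils

# In the calling notebook: run the child notebook (90-second timeout)
# and capture whatever value it exits with.
result = mssparkutils.notebook.run("ChildNotebook", 90, {"input_path": "/data/file.csv"})
print(result)

# Meanwhile, inside ChildNotebook, the value comes back to the caller via:
# mssparkutils.notebook.exit("rows_processed=42")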
