Python – Page 23 – Curated SQL

Building a Shiny App in R and Python

Published 2023-01-23 by Kevin Feasel

Nicola Rennie does a language throw-down:

Shiny is an R package that makes it easier to build interactive web apps straight from R. Back in July 2022 at rstudio::conf(2022), Posit (formerly RStudio) announced the release of Shiny for Python. As someone who knows Python but hasn’t written any Python code for quite a long time, I wanted to see how the two compared. So I did the only logical thing and built a Shiny app – twice!

After building (almost) identical Shiny apps, with one built solely in R and the other solely in Python, I’ve written this blog post to take you through some of the things that are the same, and a few things that are slightly different.

Note: at the time of writing Shiny for Python is still in alpha, so if you’re reading this blog quite a while after it was first published, some things may have changed.

The code, as you’d expect, looks quite similar. I also learned about plotnine, something I’ll need to keep in mind. H/T R-Bloggers.

Plotly Visualizations in Azure Data Explorer

Published 2023-01-18 by Kevin Feasel

Adi Eldar improves ADX visualization:

Azure Data Explorer (ADX) supports various types of data visualizations including time, bar and scatter charts, maps, funnels and many more. The chosen visualization can be specified as part of the KQL query using the ‘render’ operator, or interactively selected when building ADX dashboards. Today we extend the set of visualizations, supporting advanced interactive visualizations by the Plotly graphics library. Plotly supports ~80 chart types including basic charts, scientific, statistical, financial, maps, 3D, animations and more. There are two methods for creating Plotly visuals:

Read on to learn more about those two methods.

K-Fold Cross-Validation in Python

Published 2023-01-05 by Kevin Feasel

Shanthababu Pandian gives us a primer on k-fold cross-validation:

In each set (fold), training and testing are performed exactly once during the entire process, which helps us avoid overfitting. A model trained on all of the data in a single shot tends to report its best accuracy on that same data; k-fold cross-validation counters this and helps us build a model that generalizes.

To perform k-fold cross-validation, we have to split the data set into three sets (training, testing, and validation), which can be a challenge depending on the volume of the data.

Read on for the explanation and an example.
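
As a rough illustration of the fold mechanics described in the excerpt, here is a minimal sketch in plain Python. It is not taken from the linked post: the function name `k_fold_indices`, the sample count, and the seed are assumptions made for the example. It only shows how each sample lands in a test fold exactly once.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


# Hypothetical usage: 10 samples split into 3 folds.
folds = list(k_fold_indices(n_samples=10, k=3))
for train_idx, test_idx in folds:
    print(f"train={sorted(train_idx)} test={sorted(test_idx)}")
```

In practice a library routine such as scikit-learn's `KFold` would normally do this job; the sketch is only meant to make the splitting visible.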

Using the Softmax Classifier in PyTorch

Published 2023-01-02 by Kevin Feasel

Muhammad Asad Iqbal Khan takes us through one of the classifier options available to PyTorch:

While a logistic regression classifier is used for binary classification, the softmax classifier is a supervised learning algorithm which is mostly used when multiple classes are involved.

The softmax classifier works by assigning a probability to each class. The raw class scores are exponentiated and normalized so that the probabilities sum to 1, and the class with the highest probability is taken as the prediction.

Read on to learn some of the properties of the Softmax classifier, as well as how you can use this for multi-class classification in PyTorch.
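
To make the normalization concrete, here is a small sketch of the softmax function itself in plain Python rather than PyTorch. The `logits` values are made up for illustration and the helper name `softmax` is this example's own; the point is only that the outputs are positive and sum to 1.

```python
import math


def softmax(scores):
    """Convert raw class scores (logits) into probabilities that sum to 1."""
    # Subtracting the max leaves the result unchanged but avoids overflow in exp().
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # hypothetical scores for three classes
probs = softmax(logits)
# The predicted class is the index with the largest probability.
predicted_class = max(range(len(probs)), key=probs.__getitem__)
print(probs, predicted_class)
```

In PyTorch this step is usually not written by hand, since `torch.nn.CrossEntropyLoss` applies a log-softmax to the raw scores internally.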

PyODBC Error Messages

Published 2023-01-02 by Kevin Feasel

Jose Manuel Jurado Diaz collects a compendium of errors:

1) pyodbc.Error: (‘HY000’, ‘[HY000] [Microsoft][ODBC Driver 17 for SQL Server]Connection is busy with results for another command (0) (SQLExecDirectW)’)

This error occurs when the Python code tries to open a new cursor while a previous one still has pending results.

Read on for examples of the problem and solutions for each.
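
One common way to avoid this situation is to read a cursor's results completely before issuing the next command on the same connection. The sketch below shows that pattern. It is an assumption-laden stand-in rather than the linked post's solution: it uses the standard library's `sqlite3` so it runs without a server (SQLite does not raise this particular error), and the `orders` table and function name are invented for the example. With pyodbc, the connection would come from `pyodbc.connect(...)` instead.

```python
import sqlite3

# sqlite3 is only a stand-in so the sketch runs anywhere; the table is made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "a")])


def customers_with_counts(conn):
    """Run a second query per row of a first query without overlapping result sets."""
    first = conn.cursor()
    first.execute("SELECT DISTINCT customer FROM orders ORDER BY customer")
    customers = [row[0] for row in first.fetchall()]  # drain the result set fully
    first.close()  # nothing is left pending on the connection

    second = conn.cursor()
    counts = {}
    for customer in customers:
        second.execute("SELECT COUNT(*) FROM orders WHERE customer = ?", (customer,))
        counts[customer] = second.fetchone()[0]
    second.close()
    return counts


result = customers_with_counts(conn)
print(result)
```

The design choice is simply to materialize the first result set into a list before looping, instead of running the inner query while the outer cursor is still being iterated.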

Trying out FLAML

Published 2022-12-29 by Kevin Feasel

Gavita Regunath provides an overview of FLAML:

FLAML is short for Fast and Lightweight Automated Machine Learning library. It is an open-source Python library created by Microsoft researchers in 2021 for automated machine learning (AutoML). It is designed to be fast, efficient, and user-friendly, making it ideal for a wide range of applications.

Click through to learn more and to give it a spin with a pair of notebooks.

Statistical Analysis in Azure ML

Published 2022-12-20 by Kevin Feasel

Tomaz Kastrun continues an advent of Azure ML. Day 18 takes us through feature exploration:

Azure Machine Learning is also a great tool for ordinary statistical analysis, graph plotting, and everything that goes along with it.

Let’s get an open dataset that is available in the UCI Machine Learning Repository and import it into a pandas dataframe.

Day 19 picks up with feature engineering:

Yesterday we showed that statistical analysis, with all the bells and whistles, can be done very simply in Azure Machine Learning. Today we will continue with feature engineering and modelling.

So, what is feature engineering? It is a general process that can involve both feature construction (adding new features from the existing data) and feature selection (choosing only the most important features to improve model performance). It also covers reducing data dimensionality, log-transformation, removing outliers, scaling (normalisation, standardisation), imputation, general transformations (such as polynomial ones), variable creation, variable extraction, and so on.
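
A few of the transformations named above can be sketched in plain Python. This is an illustrative example only, not code from the linked posts: the `values` list is invented data, and a real workflow would more likely use pandas or scikit-learn.

```python
import math
import statistics

values = [1.0, 10.0, 100.0, 1000.0]  # made-up, strongly right-skewed feature

# Log-transformation: compresses the long right tail.
log_values = [math.log10(v) for v in values]

# Normalisation (min-max scaling): rescales the feature to the range [0, 1].
lo, hi = min(values), max(values)
normalised = [(v - lo) / (hi - lo) for v in values]

# Standardisation: zero mean and unit (population) standard deviation.
mean = statistics.fmean(log_values)
std = statistics.pstdev(log_values)
standardised = [(v - mean) / std for v in log_values]

print(log_values, normalised, standardised, sep="\n")
```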

Sharing Results between Notebooks with MSSparkUtils

Published 2022-12-16 by Kevin Feasel

Liliam Leme provides an answer to a common Synapse Spark pool question:

I’ve been reviewing customer questions centered around “Have I tried using MSSparkUtils to solve the problem?”

One of the questions asked was how to share results between notebooks. Every time you hit “run” in a notebook, it starts a new Spark cluster, which means that each notebook would be using a different session, making it impossible to share results between notebook executions. MSSparkUtils offers a solution to handle this exact scenario.

Read on to see what MSSparkUtils is and how it helps in this case.

Pipelines and Jobs in Azure ML

Published 2022-12-13 by Kevin Feasel

Tomaz Kastrun continues an advent on Azure ML. Day 11 covers pipelines:

A pipeline is a set of instructions (or a workflow) for executing a particular part of a machine learning task. The idea behind pipelines is that they help a team of data scientists and machine learning engineers standardize their workflow and incorporate best practices for preparing data, training models, executing the models, and deploying them. Pipelines help build workflows efficiently and in a way that makes them reusable.

The idea is to split a machine learning process into smaller tasks, a multistep workflow, where each step is a separate component that can be developed, upgraded, optimised, configured, automated, and deleted separately. These steps, connected through interfaces, form a workflow.
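
The idea of independent steps connected through a shared interface can be sketched without any Azure ML code at all. The example below is a plain-Python illustration, not the Azure ML SDK: the step names, the dictionary passed between steps, and the "model" (just a mean) are all assumptions made for the sketch.

```python
def prepare_data(state):
    """Step 1: drop missing values from the raw input."""
    clean = [x for x in state["raw"] if x is not None]
    return {**state, "clean": clean}


def train_model(state):
    """Step 2: 'train' a trivial model, here simply the mean of the data."""
    clean = state["clean"]
    return {**state, "model": sum(clean) / len(clean)}


def evaluate(state):
    """Step 3: score the model with the mean absolute error."""
    clean, model = state["clean"], state["model"]
    mae = sum(abs(x - model) for x in clean) / len(clean)
    return {**state, "mae": mae}


def run_pipeline(steps, state):
    """Run each step in order; the dict is the interface between steps."""
    for step in steps:
        state = step(state)
    return state


result = run_pipeline([prepare_data, train_model, evaluate], {"raw": [1.0, None, 2.0, 3.0]})
print(result["model"], result["mae"])
```

Because every step takes and returns the same kind of object, any one of them can be replaced or reconfigured without touching the others, which is the property the excerpt describes.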

Day 12 makes us get a job:

An Azure ML job executes a task against a specified compute target. This is also how the job is created. By configuring a new job, you can also scale out model training, since both single-node and distributed training are available.

A simple job would be to execute a command in a Docker container. Parameter sweeps can also be run by specifying them in the job itself.

Running Python Code from R via Reticulate

Published 2022-12-12 by Kevin Feasel

Rick Pack crosses the streams:

I wanted a REPL (read-evaluate-print loop) so that I could quickly experiment with Python without, for the moment, leaping over what some consider one of the biggest hurdles to Python usage: work environment setup.

The reticulate R package by Posit enables the use of Python while working within the RStudio IDE. One can find a Posit tutorial here.

Read on for Rick’s notes.

Category: Python