Category: Python

AML Environments and SDKs

Published 2022-12-09 by Kevin Feasel

Tomaz Kastrun continues an advent of Azure ML. First up is environments:

We have explored how to create a compute instance and compute target and learned that ML frameworks and scripting packages always come preinstalled.

Choosing the right set of components (CPU, GPU, RAM, Core) and corresponding software (OS, ML Framework, packages) can be time-consuming.

Under Curated environments, you will find predefined environments, with settings for running particular frameworks, like PyTorch or TensorFlow.

Then an overview of the Azure CLI and Python SDK for AML:

What is Azure CLI? It is the Azure command-line interface, a great tool for running commands from a command prompt. It is multi-platform and can be run from Azure or from the client’s machine. It is great for scripting and automating repetitive tasks, or for expressing a complex task as a few lines of code, especially when it comes to managing, provisioning, and monitoring infrastructure. It can also be run from Azure Cloud Shell. It is native to Azure and can be used across all the services and offerings. Azure CLI commands usually start with “az ..”. On top of that, you can also install the Azure Machine Learning CLI as an extension to Azure CLI. The AML CLI gives you additional commands to manage resources for machine learning.

The same functionality (to some extent) in Azure Machine Learning can be achieved with the Python SDK. In addition, it also offers great ways to create and manage the resources you use for training and deploying models.

And, so that we can catch up a bit to Tomaz, one more post covering the Python SDK:

Looking briefly into Azure CLI and Python SDK, let’s explore the power of SDK and the most important namespaces.


Installing ML Services on SQL Server 2022

Published 2022-12-08 by Kevin Feasel

Tomaz Kastrun notices a change to the SQL Server installer:

Machine Learning Services and Language Extensions is available under Database Engine Services, and if you want to use any of these languages, check this feature. During the installation process, R, Python, and Java will not be installed (nor will you be asked for permissions); instead, you install your own runtime after the installation. This makes it more convenient to install different R/Python/Java runtimes.

Read on to see how you can install and work with languages like R, Python, and Java in SQL Server 2022.


Data Lake Exploration in AWS with Athena for Spark

Published 2022-12-02 by Kevin Feasel

Pathik Shah and Raj Devnath jetski the data lake:

Amazon Athena now enables data analysts and data engineers to enjoy the easy-to-use, interactive, serverless experience of Athena with Apache Spark in addition to SQL. You can now use the expressive power of Python and build interactive Apache Spark applications using a simplified notebook experience on the Athena console or through Athena APIs. For interactive Spark applications, you can spend less time waiting and be more productive because Athena instantly starts running applications in less than a second. And because Athena is serverless and fully managed, analysts can run their workloads without worrying about the underlying infrastructure.

Data lakes are a common mechanism to store and analyze data because they allow companies to manage multiple data types from a wide variety of sources, and store this data, structured and unstructured, in a centralized repository. Apache Spark is a popular open-source, distributed processing system optimized for fast analytics workloads against data of any size. It’s often used to explore data lakes to derive insights. For performing interactive data explorations on the data lake, you can now use the instant-on, interactive, and fully managed Apache Spark engine in Athena. It enables you to be more productive and get started quickly, spending almost no time setting up infrastructure and Spark configurations.

In this post, we show how you can use Athena for Apache Spark to explore and derive insights from your data lake hosted on Amazon Simple Storage Service (Amazon S3).

This feels a lot like the Spark pool in Azure Synapse Analytics, as well as some of what Databricks does.


Extracting JSON from a Spark DataFrame

Published 2022-11-29 by Kevin Feasel

Unmesha Sreeveni digs into some JSON:

Let’s see how we can extract a JSON object from a Spark DataFrame column.

This is an example data frame

Unmesha takes it one step at a time, breaking down each element of the DataFrame and showing how it all works.


Applying Functions to DataFrames in Pandas

Published 2022-11-15 by Kevin Feasel

Matt Eland shows off the apply() function in Pandas:

Pandas is a wonderful library for manipulating tabular data with Python. Out of the box Pandas offers many ways of adding, removing, and updating columns and rows, but sometimes you need a bit more power.

In this article we’ll explore the apply function and show how it can be used to run an operation against every row (or column) in your DataFrame – and why you might want to do that.

Read on to see how it works and what additional benefit it provides.
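
As a quick taste of the row-versus-column distinction mentioned in the excerpt, here is a minimal sketch. The DataFrame and its column names (`width`, `height`, `area`) are made up for illustration and are not taken from Matt's article.

```python
import pandas as pd

# Hypothetical data, invented for this illustration.
df = pd.DataFrame({"width": [2, 3, 4], "height": [5, 6, 7]})


def area(row):
    """Compute a value from several columns of a single row."""
    return row["width"] * row["height"]


# axis=1 calls the function once per row; each row arrives as a Series.
df["area"] = df.apply(area, axis=1)

# The default (axis=0) calls the function once per column instead.
ranges = df[["width", "height"]].apply(lambda col: col.max() - col.min())

print(df)
print(ranges)
```

For simple arithmetic like this, a vectorized expression such as `df["width"] * df["height"]` is usually faster; `apply` tends to pay off when the per-row logic is hard to express that way.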


Sending Messages to Event Hub via Python

Published 2022-11-14 by Kevin Feasel

Kiril Nikolov has a message for us:

Recently I needed to create an Azure Function app that would connect to an API and send data to an Event Hub as part of a real-time data streaming solution.

Azure functions are the perfect connectivity option for a task like this, allowing you to focus on the trigger and the resulting output message you want to capture in the event stream, while Azure handles the maintenance of the cloud infrastructure and hosting to run it.

Azure Functions can be written in multiple languages. I needed to write mine in Python, meaning that I had to set up a configuration file to connect to the Event Hub (as I will explain in further detail below).

Click through to see how it all works.
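
For a rough idea of what sending messages from Python can look like, here is a minimal sketch. It uses the `azure-eventhub` client library directly, which is an assumption on my part: Kiril's post describes a configuration-file approach inside an Azure Function, and this sketch does not reproduce it. The helper names `build_payload` and `send_events`, the `ingested_at` field, and the connection details are all hypothetical.

```python
import json
from datetime import datetime, timezone
from typing import Iterable


def build_payload(record: dict) -> str:
    """Serialize one record as JSON, adding an ingestion timestamp if absent."""
    body = dict(record)
    body.setdefault("ingested_at", datetime.now(timezone.utc).isoformat())
    return json.dumps(body, sort_keys=True)


def send_events(connection_string: str, eventhub_name: str,
                payloads: Iterable[str]) -> None:
    """Send the payloads to an Event Hub as a single batch."""
    # Imported here so build_payload works without the Azure package installed.
    from azure.eventhub import EventData, EventHubProducerClient

    producer = EventHubProducerClient.from_connection_string(
        connection_string, eventhub_name=eventhub_name
    )
    with producer:
        batch = producer.create_batch()
        for payload in payloads:
            # add() raises ValueError once the batch hits its size limit;
            # a fuller version would send the batch and start a new one.
            batch.add(EventData(payload))
        producer.send_batch(batch)


print(build_payload({"sensor": "a1", "value": 21.5}))
```

Calling `send_events` requires a real connection string and Event Hub name, so only the payload step runs here.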


Using Shiny on Python

Published 2022-11-08 by Kevin Feasel

David Saipe crosses the streams:

As someone who has zero experience using Shiny in R, the recent announcement that the framework had been made available to Python users inspired an opportunity for me to learn a new concept from a different perspective to most of my colleagues. I have been tasked with writing a Python related blog post, and having spent the past few weeks carrying out an analysis of Jumping Rivers’ Twitter data (@jumping_uk), creating a dashboard to display some of my findings and then writing about it seemed like a nice way to cap off my 6-week summer placement at Jumping Rivers.

This post will take you through some of the source code for the dashboard I created, whilst I provide a bit of context for the Twitter project itself. For a more bare-bones tutorial on using Shiny for Python, you can check out another recent Jumping Rivers blog post here. I suggest reading this first.

Read on to see how you can get started with Shiny on Python and what David thinks about the experience.


Removing Backgrounds from Images

Published 2022-11-04 by Kevin Feasel

Brendan Tierney focuses on the subject at hand:

There are a number of methods available for preparing images for a variety of purposes, for example as input to deep learning or to other image processing models, applications, and systems. But sometimes you just need a quick tool to perform a certain task. For example, I regularly have to edit images to extract just a certain part, or to filter out the background colors and/or objects. There are a variety of tools available to help you with this kind of task. I’m a Mac user, so I use the Instant Alpha feature available in some of the Mac products. But what if you are not a Mac user? What can you use?

I’ve recently come across a very useful Python library that takes all or most of the hard work out of doing such tasks, and has proved to be extremely useful for some demos and projects I’ve been working on. The Python library I’m using is rembg (Remove Background). It isn’t perfect, but it does a pretty good job, and only in a small number of modified images did I need to do some additional processing.

Click through to see how the tool works, as well as some cases it doesn’t quite get correct.
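
For reference, a minimal sketch of calling the library, which is published on PyPI as `rembg`. The helper names `output_path_for` and `remove_background` and the `_nobg` suffix are my own inventions for illustration, not something from Brendan's post. The sketch assumes `rembg` is installed and that an input image exists on disk.

```python
from pathlib import Path


def output_path_for(source: Path) -> Path:
    """Place the result next to the source, as PNG so transparency is kept."""
    return source.with_name(f"{source.stem}_nobg.png")


def remove_background(source: Path) -> Path:
    """Strip the background from one image file and return the new file's path."""
    # Imported here so the path helper works without rembg installed.
    from rembg import remove

    target = output_path_for(source)
    # remove() accepts the raw image bytes and returns PNG-encoded bytes.
    target.write_bytes(remove(source.read_bytes()))
    return target


print(output_path_for(Path("photos/cat.jpg")))
```

Calling `remove_background(Path("photos/cat.jpg"))` would then write `photos/cat_nobg.png`.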


Installing Third-Party WHL Packages in Synapse with DEP

Published 2022-11-02 by Kevin Feasel

Sabyasachi Samaddar walks through what I consider a real difficulty:

It is really challenging when you need to install third-party .whl packages into a DEP-enabled Azure Synapse Spark instance.

According to the documentation (https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-azure-portal-add-librar…), installing packages from PyPI is not supported within DEP-enabled workspaces. Hence we cannot just upload the .whl packages into the workspace; we need to upload all the dependencies along with the .whl package, and it will be an offline installation. Synapse Spark clusters also come with built-in packages, so we may find some conflicts when we try to install some third-party packages.

Read on to see what you need to do.


Appending Rows to a Pandas DataFrame

Published 2022-10-31 by Kevin Feasel

Matt Eland acquires some rows that fell off a truck:

Recently I was working on comparing the performance of different machine learning models and I wanted to add entries to a Pandas DataFrame as I evaluated each model. What I found was that adding new rows to a Pandas DataFrame was a little harder than I suspected and required some mild searching, so I wanted to preserve the two solutions I found here in case it helps someone else.

Read on for those two solutions, though as Matt points out, only one of them is a good solution.
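
As a rough illustration of the general problem, here is a minimal sketch of two common ways to add rows. These are not necessarily the two approaches Matt covers, and the model names and accuracy values are made up. Note that `DataFrame.append` was deprecated in pandas 1.4 and removed in 2.0, so the sketch relies on `pd.concat`.

```python
import pandas as pd

# Hypothetical starting results; names and scores are invented.
results = pd.DataFrame({"model": ["baseline"], "accuracy": [0.71]})

# Option 1: wrap the new row in a one-row DataFrame and concatenate.
new_row = pd.DataFrame([{"model": "random_forest", "accuracy": 0.83}])
results = pd.concat([results, new_row], ignore_index=True)

# Option 2: collect plain dicts in a list and build a DataFrame once,
# which avoids copying the whole frame on every iteration of the loop.
rows = []
for name, accuracy in [("logistic", 0.78), ("gbm", 0.85)]:
    rows.append({"model": name, "accuracy": accuracy})
results = pd.concat([results, pd.DataFrame(rows)], ignore_index=True)

print(results)
```

Passing `ignore_index=True` renumbers the index from zero, which avoids duplicate index labels after concatenation.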

