Press "Enter" to skip to content

Category: Python

Importing SQL Server Extended Properties into Azure Purview

Daniel Janik shows how you can use PyApacheAtlas to move specific SQL Server extended properties into Azure Purview:

This post is going to be restricted to SQL Server table columns and to extended properties named MS_Description. Quite a few years ago, I worked on a data catalog project where we added descriptions for many of the tables, views, and columns in the database using extended properties named MS_Description. Let’s assume you have some of these in place for this post, keeping in mind that the Purview APIs provide many functions beyond what this post covers and that the code here could be modified to do much more as well.

Starting out, I thought it would be great to import the sensitivity classifications that SSMS creates. Pre-SQL Server 2019, these were held in extended properties; they now have their very own DMV (sys.sensitivity_classifications). While this sounded great in theory, it wasn’t as exciting when I wrote the code, because Azure Purview already has system classifications at a more granular scale for each of the ones you find in SSMS, and Purview adds these as it executes a scan on the data source. It does a pretty good job, too. With that said, I shifted my focus to adding descriptions instead.

Read on to see how you can do this.
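
To give a feel for the moving parts, here is a minimal sketch (not Daniel’s exact code) which reads MS_Description extended properties with pyodbc and writes them to Purview column assets via PyApacheAtlas. The connection string, the qualified-name format, and the azure_sql_column type name are assumptions you should verify against your own scanned assets:

```python
# A minimal sketch, not the post's exact code. The connection string,
# qualified-name format, and entity type name are assumptions; verify them
# against the assets your Purview scan actually produced.
import pyodbc
from pyapacheatlas.auth import ServicePrincipalAuthentication
from pyapacheatlas.core import PurviewClient

auth = ServicePrincipalAuthentication(
    tenant_id="<tenant-id>", client_id="<client-id>", client_secret="<secret>"
)
client = PurviewClient(account_name="<purview-account>", authentication=auth)

# Pull MS_Description extended properties for table columns.
sql = """
SELECT s.name AS schema_name, t.name AS table_name, c.name AS column_name,
       CAST(ep.value AS NVARCHAR(MAX)) AS description
FROM sys.extended_properties ep
JOIN sys.columns c ON ep.major_id = c.object_id AND ep.minor_id = c.column_id
JOIN sys.tables t ON c.object_id = t.object_id
JOIN sys.schemas s ON t.schema_id = s.schema_id
WHERE ep.class = 1 AND ep.name = 'MS_Description';
"""
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};SERVER=<server>;DATABASE=<db>;"
    "Trusted_Connection=yes"
)

for schema, table, column, description in conn.execute(sql):
    # Assumed qualified-name format for an Azure SQL column asset.
    qualified_name = f"mssql://<server>.database.windows.net/<db>/{schema}/{table}#{column}"
    entity = client.get_entity(
        qualifiedName=qualified_name, typeName="azure_sql_column"
    )["entities"][0]
    # userDescription is the field the Purview UI surfaces as the description.
    entity["attributes"]["userDescription"] = description
    client.upload_entities(batch=[entity])
```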


Orchestrating ML Pipelines with Amazon Managed Workflows for Airflow

Juston Leto, et al., show off MLOps capabilities in AWS:

The ability to scale machine learning operations (MLOps) at an enterprise is quickly becoming a competitive advantage in the modern economy. When firms started dabbling in ML, only the highest-priority use cases were the focus. Businesses are now demanding more from ML practitioners: more intelligent features, delivered faster, and continually maintained over time. An effective MLOps strategy requires a unified platform that can orchestrate and automate complex data processing and ML tasks, and that integrates with the latest tooling to best complete those tasks.

This post demonstrates the value of using Amazon Managed Workflows for Apache Airflow (Amazon MWAA) to orchestrate an ML pipeline using the popular XGBoost (eXtreme Gradient Boosting) algorithm. For more advanced and comprehensive MLOps capabilities, including a purpose-built model orchestration framework and a continuous integration and continuous delivery (CI/CD) service for ML, readers are encouraged to check out Amazon SageMaker Pipelines.

Read on for a step-by-step tutorial on the process.
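
To give a sense of the shape of such a pipeline, here is a minimal, hand-rolled Airflow DAG sketch (not the post’s code) chaining preprocess, train, and evaluate tasks; the task bodies are placeholders:

```python
# A minimal sketch of an ML pipeline DAG, not the post's actual code.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def preprocess():
    ...  # e.g., read raw data from S3, clean it, write features back to S3

def train():
    ...  # e.g., fit an xgboost.XGBClassifier on the prepared features

def evaluate():
    ...  # e.g., score a holdout set and publish metrics

with DAG(
    dag_id="xgboost_pipeline",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    t1 = PythonOperator(task_id="preprocess", python_callable=preprocess)
    t2 = PythonOperator(task_id="train", python_callable=train)
    t3 = PythonOperator(task_id="evaluate", python_callable=evaluate)
    t1 >> t2 >> t3  # run the steps in order
```

In MWAA, you would drop a file like this into the dags folder of the environment’s S3 bucket and Airflow picks it up from there.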


Explaining an ML Model with SHAP

Dan Lantos, et al., walk us through one technique for model explainability:

Interpretability has to do with how accurately a machine learning model can associate a cause (input) with an effect (output).

Explainability on the other hand is the extent to which the internal mechanics of a machine or deep learning system can be explained in human terms. Or to put it simply, explainability is the ability to explain what is happening. 

Let’s consider a simple example, illustrated below, where the goal of the machine learning model is to classify an animal into its respective group. We use an image of a butterfly as input into the machine learning model. The model would classify the butterfly as an insect, mammal, fish, reptile, or bird. Typically, most complex machine learning models would provide a classification without explaining how the features contributed to the result. However, using tools that help with explainability, we can overcome this limitation and understand what particular features of the butterfly contributed to it being classified as an insect: since the butterfly has six legs, it is classified as an insect.

Being able to provide a rationale behind a model’s prediction would give the users (and the developers) confidence about the validity of the model’s decision.

Read on to see how you can use a library called SHAP in Python to help with this explainability.
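
As a quick taste of the library, here is a minimal sketch using SHAP’s TreeExplainer on a standard tabular dataset (the butterfly example is illustrative, so this uses scikit-learn’s breast cancer data instead):

```python
# A minimal SHAP sketch on a tabular dataset.
import shap
import xgboost
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
model = xgboost.XGBClassifier(n_estimators=100).fit(X, y)

# TreeExplainer computes exact SHAP values efficiently for tree ensembles.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Per-feature contributions for the first prediction: positive values push
# the model toward the positive class, negative values away from it.
for feature, contribution in zip(X.columns, shap_values[0]):
    print(f"{feature}: {contribution:+.4f}")

# Or visualize the global picture across all predictions:
shap.summary_plot(shap_values, X)
```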


Reinforcement Learning and Python 3

I have a new post up:

I finally got around to trying out a reinforcement learning exercise this weekend in an attempt to learn about the technique. One of the most interesting blog posts I’ve read is Andrej Karpathy’s post on using reinforcement learning to play Pong on the Atari 2600. In it, Andrej uses the Gym package in Python to play the game.

This won’t be a post diving into the details of how reinforcement learning works; Andrej does that far better than I possibly could, so read his post. Instead, the purpose of this post is to provide a minor update to Andrej’s code to switch it from Python 2 to Python 3. In doing this, I went with the most convenient answer over a potentially better solution (e.g., switching xrange() to range() rather than re-working the code), but it does work. I also bumped up the learning rate a little to pick up the pace.

Click through for the (slightly) updated code.
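
For the flavor of it, these are the kinds of Python 2 to Python 3 changes involved (a hedged sketch, not the actual Pong script):

```python
# The flavor of the Python 2 -> 3 changes, not the full Pong script.
import pickle  # Python 3's stand-in for Python 2's cPickle

# One convenient fix: alias xrange to range rather than rework every call.
xrange = range

# Python 2's print statement becomes a function call:
reward_sum = -21.0  # placeholder episode reward for illustration
print('resetting env. episode reward total was %f' % reward_sum)

# cPickle.dump(...) becomes pickle.dump(...):
model = {'W1': [[0.1, 0.2]], 'W2': [0.3]}  # placeholder model weights
pickle.dump(model, open('save.p', 'wb'))

# And xrange-based loops run unchanged thanks to the alias:
print(sum(i for i in xrange(5)))
```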


Contrasting Scala and Python wrt Spark

Sanjay Rathore contrasts two of the three key Apache Spark languages:

Imagine the first day of a new Apache Spark project. The project manager looks at the team and asks: which one do we choose, Scala or Python? So let’s start with “Scala vs. Python for Spark.”

You may wonder if this is a tricky question. What does enterprise demand say? Is this like asking iOS or Android? Is there a right or wrong answer?

So we are here to inform and provide clarity. Today we’re looking at two popular programming languages, Scala and Python, and comparing them in the context of Apache Spark and Big Data in general.

Read on for the comparison. I’m at a point where I think it’s wise to know both languages and roll with whichever is there. If you’re in a greenfield Spark implementation, pick the one you (or your team) are more comfortable with. If you’re equally comfortable with the two, pick Scala because it’s a functional programming language, and those are neat.


Time Series Estimation with Facebook’s Prophet

Dan Lantos looks at the Prophet library:

This article (part of a short series) aims to introduce the Prophet library, discuss it at a high level and run through a basic example of forecasting the FTSE 100 index. Future articles will discuss exactly how Prophet achieves its results, how to interpret the output and how to improve the model.
Please see this article (by my talented colleague Gavita) for an introduction to time-series forecasting algorithms.

Click through for part one in an ongoing series.
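
The core Prophet workflow is only a few lines. Here’s a minimal sketch; the CSV file is a placeholder standing in for something like FTSE 100 closing prices:

```python
# A minimal Prophet sketch; the CSV path is a placeholder.
import pandas as pd
from prophet import Prophet  # the package was named fbprophet before v1.0

# Prophet expects a DataFrame with exactly two columns:
# 'ds' (datestamp) and 'y' (the value to forecast).
df = pd.read_csv("ftse100.csv")  # placeholder file with ds, y columns

model = Prophet()  # sensible defaults: trend plus weekly/yearly seasonality
model.fit(df)

# Extend the frame 30 days into the future and forecast over it.
future = model.make_future_dataframe(periods=30)
forecast = model.predict(future)

# yhat is the point forecast; yhat_lower/yhat_upper bound the uncertainty.
print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail())
```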


Optimizing BERT Models on Google Colab

Kevin Jacobs fine-tunes some NLP processes:

BERT is a language model and can thus be used for predicting the next word in a sentence. Furthermore, BERT can be used for automatic summarization, text classification, and many more downstream tasks. Google Colab provides you with a cloud-based environment in which you can train your machine learning models on a GPU. The downside is that your data is uploaded to the Google cloud. Google Colab gives you the opportunity to fine-tune BERT.

Click through to see how.
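
As one common approach (which may differ from Kevin’s exact setup), here is a minimal fine-tuning sketch using the Hugging Face transformers and datasets libraries on a Colab GPU:

```python
# One common way to fine-tune BERT, via Hugging Face transformers; the post
# may differ in details. In Colab: Runtime > Change runtime type > GPU.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

dataset = load_dataset("imdb")  # example text classification dataset

def tokenize(batch):
    return tokenizer(batch["text"], padding="max_length", truncation=True)

tokenized = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1,
                           per_device_train_batch_size=8),
    # A small subsample keeps the demo quick on a single Colab GPU.
    train_dataset=tokenized["train"].shuffle(seed=42).select(range(2000)),
)
trainer.train()
```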


What’s New in data_algebra

John Mount has an update on a Python package:

The data algebra is a modern realization of elements of Codd’s 1969 relational model for data wrangling (see also Codd’s 12 rules).

The idea is: most data manipulation tasks can usefully be broken down into a small number of fundamental data transforms plus composition. In Codd’s initial writeup, composition was expressed using standard mathematical operator notation. For “modern” realizations one wants to use a composition notation that is natural for the language you are working in. For Python the natural composition notation is method dispatch.

Click through to see how it works and what’s new in the latest version.
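
As a small illustration of that chained composition, here is a sketch based on the operator names in the project’s documentation; treat the import path and exact names as approximate, since they can vary by version:

```python
# A small sketch of data_algebra's chained composition; names follow the
# project's docs and may vary slightly by version.
import pandas as pd
from data_algebra.data_ops import descr  # import path may vary by version

d = pd.DataFrame({"x": [1, 2, 3, 4], "y": [10, 20, 30, 40]})

# Compose fundamental transforms by chaining methods; ops is a reusable,
# inspectable pipeline object, independent of any particular DataFrame.
ops = (
    descr(d=d)
    .extend({"ratio": "y / x"})   # derive a new column
    .select_rows("ratio > 7")     # then filter on it
)

print(ops.transform(d))  # apply the pipeline to the data
```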


Hosting a Python API with Flask

Mrinal Walia shows how you can build a Python API, such as one for generating machine learning predictions, using Flask:

Deployment is a crucial step in the ML workflow. It is the point where we put our ML model to use, so that we can apply it in real life.

But how can we serve the model to others? We can develop an Application Programming Interface (API). With an API, we can reach the model from anywhere, whether from a mobile application or a web application. In Python, there’s a library that can help us build one: it’s named Flask.

This article will explain how to construct a REST API for our machine learning model using Flask. Without further ado, let’s begin!

Flask is the first step, but afterward I’d want to serve it with a production server like Gunicorn and put Nginx in front as a reverse proxy.
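
For reference, a prediction endpoint along those lines can be as small as this sketch; the pickled model file and the feature payload are placeholders:

```python
# A minimal sketch of a Flask prediction API; the model file and the
# feature payload are placeholders.
import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)

with open("model.pkl", "rb") as f:  # a previously trained, pickled model
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json()  # e.g. {"features": [5.1, 3.5, 1.4, 0.2]}
    prediction = model.predict([payload["features"]])
    return jsonify({"prediction": prediction.tolist()})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)  # dev server; use Gunicorn in production
```

From there, a client can POST JSON to /predict and get a prediction back, and Gunicorn can serve the same app object in production.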
