Press "Enter" to skip to content

Category: Cloud

Azure SQL Database Performance Roundup

Reitse Eskens shares the goods:

In the past 9 blogs, I’ve shown you all sorts of Azure SQL database solutions and given them a little run for their money. I’ve tested a lot and written about them. This blog will be a summation of the data and my views on the combined graphs. At the end, I’ll wrap it up with my way of working when a new project starts.

But before I kick off, a little Christmas present. What I didn’t do, until now, is give you access to more raw data. Now is the moment to give you more raw numbers to play around with for yourself and do your own analysis. Fun as that might be, I’d highly encourage you to use my sheets as a jumping-off point and adapt them for your own workloads. You can find the two Excel files via the link for the scripts.

This is a post I’d been waiting for, as it covers the comparisons between tiers directly, rather than inferring them from the various posts.

MLflow in Action and Responsible AI

Tomaz Kastrun continues an advent of Azure ML. Day 16 shows off MLflow:

Yesterday we looked into how to set up the MLflow configuration, and today, let’s put it to the test.

We will create a new notebook and use the Heart dataset (link to dataset) to toy around with. We will also import an XGBoost classifier to assess the presence of heart disease in the patient. We will be using a categorical (integer) variable with values from 0 (no presence) to 4 (strong presence) and attempt to classify based on 15+ attributes (out of more than 70 attributes).
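
To make the setup concrete, here is a minimal sketch of MLflow autologging around an XGBoost classifier; this is my illustration, not Tomaz’s notebook code, and it assumes the heart data sits in a local heart.csv with a target column valued 0 to 4:

```python
# A minimal sketch, not the post's code: MLflow autologging around XGBoost.
# Assumes a local heart.csv (placeholder path) with a "target" column of 0-4.
import mlflow
import mlflow.xgboost
import pandas as pd
import xgboost as xgb
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("heart.csv")  # placeholder path to the heart dataset
X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns=["target"]), df["target"], test_size=0.25, random_state=42)

# Inside an Azure ML notebook the tracking URI is preconfigured; elsewhere you
# can point MLflow at the workspace via ws.get_mlflow_tracking_uri().
mlflow.xgboost.autolog()  # logs params, metrics, and the fitted model

with mlflow.start_run():
    model = xgb.XGBClassifier(objective="multi:softmax")
    model.fit(X_train, y_train)
    mlflow.log_metric("test_accuracy",
                      accuracy_score(y_test, model.predict(X_test)))
```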

Day 17 pivots to using the responsible AI dashboard:

Azure ML provides users with a collection of model and data exploration tools in the Studio user interface. But it also provides a compatible solution for Azure ML in the Python package responsibleai. With the help of widgets, we will create a sample dashboard to explore the solution for assessing responsible decisions and actions.
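
For a feel of the moving parts, here is a hedged sketch of the responsibleai and raiwidgets flow, using a stand-in scikit-learn dataset rather than the heart data; treat it as an outline, not the post’s code:

```python
# A sketch of the responsibleai dashboard flow on a stand-in dataset.
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from responsibleai import RAIInsights
from raiwidgets import ResponsibleAIDashboard

df = load_breast_cancer(as_frame=True).frame  # features plus a "target" column
train_df, test_df = train_test_split(df, test_size=0.3, random_state=0)

model = RandomForestClassifier().fit(
    train_df.drop(columns=["target"]), train_df["target"])

rai = RAIInsights(model=model, train=train_df, test=test_df,
                  target_column="target", task_type="classification")
rai.explainer.add()       # model explanations
rai.error_analysis.add()  # error analysis tree/heatmap
rai.compute()             # run all added components

ResponsibleAIDashboard(rai)  # renders the widget in the notebook
```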

AutoML and Model Registration in AML

Tomaz Kastrun continues an advent of Azure Machine Learning. Day 13 covers the topic of Automated ML:

Automated ML is a no-code automated machine learning task. It iterates over many combinations of algorithms and hyperparameters in order to find the best model for your dataset and your prediction variable(s). The final solution is a model that can be downloaded and later reused. So Automated ML not only gives you the best model out of a family of algorithms, but also lets you use the model, generate the scripts, and create the artefacts.
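
The same job can be driven from code as well. Here is a hedged sketch of submitting an Automated ML classification run with the v1 Python SDK (azureml-train-automl), where the dataset, label column, and compute names are placeholders of mine:

```python
# A sketch of submitting an AutoML classification job (v1 SDK); names are placeholders.
from azureml.core import Workspace, Experiment, Dataset
from azureml.train.automl import AutoMLConfig

ws = Workspace.from_config()
ds = Dataset.get_by_name(ws, name="heart-dataset")  # a registered tabular dataset

automl_config = AutoMLConfig(
    task="classification",
    training_data=ds,
    label_column_name="target",
    primary_metric="accuracy",
    compute_target="cpu-cluster",   # name of an existing compute cluster
    experiment_timeout_hours=1,
    enable_early_stopping=True,
)

run = Experiment(ws, "automl-heart").submit(automl_config)
run.wait_for_completion(show_output=True)
best_run, fitted_model = run.get_output()  # best child run and its fitted model
```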

Day 14 concerns model registration:

An important asset is “Models” in the navigation bar. This feature allows you to work with different model types: custom, MLflow, and Triton. What you do here is register a model from different locations (e.g., local file, AzureML Datastore, AzureML Job, MLflow Job, Model asset in AzureML workspace, and Model asset in AzureML Registry).

Once you open the Models asset, you will see that you can do many things here. I have already registered a model from running the notebook on day 4.
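
For reference, registering a model file with the v1 Python SDK looks roughly like this; a sketch where the path, name, description, and tags are placeholders of mine:

```python
# A minimal sketch of model registration with the v1 SDK; values are placeholders.
from azureml.core import Workspace
from azureml.core.model import Model

ws = Workspace.from_config()
model = Model.register(workspace=ws,
                       model_path="outputs/model.pkl",  # local file or folder
                       model_name="heart-xgboost",
                       description="XGBoost classifier from the day-4 notebook",
                       tags={"source": "notebook"})
print(model.name, model.version)  # registration bumps the version automatically
```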

Identifying Rows in sp_wait_for_database_copy_sync

Jose Manuel Jurado Diaz troubleshoots a problem:

As you can see in the public documentation Auto-failover groups overview & best practices – Azure SQL Database | Microsoft Learn, regarding sp_wait_for_database_copy_sync: “sp_wait_for_database_copy_sync prevents data loss after geo-failover for specific transactions, but does not guarantee full synchronization for read access. The delay caused by a sp_wait_for_database_copy_sync procedure call can be significant and depends on the size of the not yet transmitted transaction log on the primary at the time of the call.”

Our customer asked about several scenarios to understand this behaviour and also to verify whether it is possible to identify the rows that have not been synced. For this, I developed a POC to test it:

Read on to see what you’d need to do.
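
As a rough illustration of the call pattern (my sketch, not Jose’s POC): you commit a critical transaction on the primary and then call the procedure to block until the geo-secondary has hardened the log. Server, database, and credential values below are placeholders, and the parameter names should be verified against the Microsoft Learn page quoted above:

```python
# Illustration only; verify the procedure's parameters against the docs.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=primary-server.database.windows.net;"   # placeholder primary
    "DATABASE=AppDb;UID=app_user;PWD=...;"          # placeholder credentials
)
cur = conn.cursor()

# Commit a critical transaction on the primary...
cur.execute("UPDATE dbo.Orders SET Status = 'Shipped' WHERE OrderId = 42;")
conn.commit()

# ...then block until the committed log has been transmitted to and hardened
# on the geo-secondary. As the docs warn, this call can take a while.
cur.execute(
    "EXEC sys.sp_wait_for_database_copy_sync "
    "@target_server = ?, @target_database = ?;",
    "secondary-server", "AppDb")   # placeholder geo-secondary
```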

FHIR and Azure Health Services

Steve Hughes provides an overview of FHIR and what Azure has to offer:

With the recent updated mandates in the healthcare environment in the United States, Microsoft has continued to expand its capability to support the FHIR standard for integrating healthcare data. While the standard is well documented and Microsoft’s capabilities are expansive, it falls on data professionals to interpret that data and build meaningful reports and produce meaningful insights from the data as it is collected and integrated across environments. This requires a good working knowledge of JSON in SQL to manipulate complex data models. In the session, we did a short review of the FHIR standard and the overall implementation of FHIR in Azure. From there, we reviewed the resulting data in the data lake and in Synapse. That was followed up with an overview of the heart of complex SQL using JSON functions in Synapse. Whether or not you are active in healthcare today, this will be an enlightening session on how to use JSON SQL functions within the Azure SQL platforms.

Read on to learn more.
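
To give a taste of the JSON side, here is a small sketch of mine (not the session’s code): shredding a trimmed FHIR Patient resource with T-SQL’s OPENJSON and JSON_VALUE, run from Python via pyodbc. The connection string is a placeholder:

```python
# Shred a trimmed FHIR Patient resource with T-SQL JSON functions via pyodbc.
import pyodbc

patient = """{
  "resourceType": "Patient",
  "id": "example",
  "birthDate": "1974-12-25",
  "name": [{"use": "official", "family": "Chalmers", "given": ["Peter", "James"]}]
}"""

query = """
DECLARE @fhir NVARCHAR(MAX) = ?;
SELECT JSON_VALUE(@fhir, '$.id')        AS patient_id,
       JSON_VALUE(@fhir, '$.birthDate') AS birth_date,
       n.family,
       n.given  -- still JSON: an array of given names
FROM OPENJSON(@fhir, '$.name')
     WITH (family NVARCHAR(100) '$.family',
           given  NVARCHAR(MAX) '$.given' AS JSON) AS n;
"""

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};SERVER=...;DATABASE=...;")  # placeholder
for row in conn.cursor().execute(query, patient).fetchall():
    print(row.patient_id, row.birth_date, row.family, row.given)
```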

Managing Azure Log Analytics Data Access

Bruno Gabrielli wants to limit data access to Log Analytics:

I am back with another important topic arising from my customers’ visits. How can I give very specific access to Log Analytics data, whether it be security or monitoring data?

Tricky one, isn’t it? A very simplistic answer could be: “manage your access list through IAM on the workspace,” but this is not enough. Say, for instance, that you would like to give scoped access to data coming from specific resources or, even more complicated, that for the same resource, one team can see some info and another team can see all the rest.

Looks complicated, but hey … good news: this is doable.

Read on to learn how.

Working with the AML Python SDK

Tomaz Kastrun continues a series on Azure Machine Learning. Day 9 takes us through a piece of the Python SDK:

The Python SDK namespace is azureml.core.environment. Environments specify the set of Python packages, environment variables, and software settings around your training and scoring scripts. In addition to Python, you can also configure PySpark, Docker, and R for environments.

You can use the Environment namespace (or a created object/asset) to make deployment and code reusable for training purposes with given Docker images, configurations, and compute types.
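
As a short sketch of that namespace in action, here is a custom environment defined and registered with the v1 SDK; the package list and environment variable are illustrative choices of mine, not Tomaz’s exact setup:

```python
# Define and register a reusable custom environment (v1 SDK).
from azureml.core import Workspace, Environment
from azureml.core.conda_dependencies import CondaDependencies

ws = Workspace.from_config()

env = Environment(name="xgboost-train-env")
env.python.conda_dependencies = CondaDependencies.create(
    pip_packages=["xgboost", "scikit-learn", "mlflow"])   # illustrative packages
env.environment_variables = {"MY_SETTING": "value"}        # illustrative env var

env.register(workspace=ws)  # reusable across training runs and deployments
```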

Day 10 shows us how to work with the Python SDK via VS Code or a local Jupyter notebook:

Let’s continue to explore the power of the SDK and its namespaces, and look into the namespace that will help you connect to Azure ML resources with the Python SDK.
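
The connection itself is compact. Assuming you have downloaded the workspace’s config.json next to your notebook (my assumption, not part of the post), the usual pattern is:

```python
# Connect to the workspace from a local VS Code or Jupyter session (v1 SDK).
from azureml.core import Workspace

ws = Workspace.from_config(path="config.json")  # prompts for Azure login if needed
print(ws.name, ws.location, ws.resource_group)
```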

AML Environments and SDKs

Tomaz Kastrun continues an advent of Azure ML. First up is environments:

We have explored how to create a compute instance and compute target and learned that ML frameworks and scripting packages always come preinstalled.

Choosing the right set of components (CPU, GPU, RAM, Core) and corresponding software (OS, ML Framework, packages) can be time-consuming.

Under Curated environments, you will find predefined environments, with settings for running particular frameworks, like PyTorch or TensorFlow.
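
From the SDK side, you can enumerate those same curated environments; a minimal sketch, with the caveat that the exact curated names change over time, so the PyTorch name below is a placeholder:

```python
# List curated environments and fetch one by name (v1 SDK).
from azureml.core import Workspace, Environment

ws = Workspace.from_config()

# Curated environments are prefixed with "AzureML-".
curated = {name: env for name, env in Environment.list(workspace=ws).items()
           if name.startswith("AzureML-")}
print(sorted(curated)[:10])

# Placeholder name; pick one from the list printed above.
pytorch_env = Environment.get(workspace=ws,
                              name="AzureML-pytorch-1.10-ubuntu18.04-py38-cuda11-gpu")
```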

Then an overview of the Azure CLI and Python SDK for AML:

What is the Azure CLI? It is the Azure command line, a great tool for running commands from CMD. It is multi-platform and can be run from Azure or from the client’s machine. It is great for scripting and automating repetitive tasks or making complex tasks look like a few lines of code, especially when it comes to infrastructure, management, provisioning, and monitoring. It can also be run from Azure Cloud Shell. It is native to Azure and can be used across all the services and offerings. Usually, Azure CLI commands start with “az”. On top of that, you can also install the Azure Machine Learning CLI as an extension to the Azure CLI. The AML CLI will give you additional commands to manage resources for machine learning.

The same functionality (to some extent) in Azure Machine Learning can be achieved with the Python SDK. In addition to that, it also offers great ways to create and manage the resources you use for training and deployment of models.

And, so that we can catch up a bit to Tomaz, one more post covering the Python SDK:

Having looked briefly into the Azure CLI and Python SDK, let’s explore the power of the SDK and the most important namespaces.

Data and Compute in Azure ML

Tomaz Kastrun continues an advent series on Azure ML. Day 4 covers data sources:

Yesterday, we learned the general layout of the Studio, and in this blog post, we will focus primarily on getting data into the workspace and reading data from other data sources.
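
As a code-first companion to the Studio walkthrough, here is a hedged sketch of uploading a CSV to the default datastore and registering it as a tabular dataset with the v1 SDK; file paths and names are placeholders of mine:

```python
# Upload a local CSV and register it as a tabular dataset (v1 SDK).
from azureml.core import Workspace, Dataset

ws = Workspace.from_config()
datastore = ws.get_default_datastore()

# Upload the local file into the workspace's blob datastore.
datastore.upload_files(files=["./data/heart.csv"],
                       target_path="heart/", overwrite=True)

# Create and register a tabular dataset on top of the uploaded file.
ds = Dataset.Tabular.from_delimited_files(path=(datastore, "heart/heart.csv"))
ds = ds.register(workspace=ws, name="heart-dataset", create_new_version=True)
```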

Day 5 has you provision some compute:

With a basic understanding of data assets, let’s create compute instances. Under “Manage” in the navigation bar, select “Compute” (denoted as 1), select “Compute instances” (denoted as 2), and click “+ New”.
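
The same step can be done from the Python side; a sketch with the v1 SDK, where the VM size and instance name are placeholders of mine:

```python
# Provision a compute instance (v1 SDK); the name must be unique in the region.
from azureml.core import Workspace
from azureml.core.compute import ComputeInstance, ComputeTarget

ws = Workspace.from_config()

config = ComputeInstance.provisioning_configuration(
    vm_size="STANDARD_DS3_V2", ssh_public_access=False)

instance = ComputeTarget.create(ws, name="my-ci-dev",
                                provisioning_configuration=config)
instance.wait_for_completion(show_output=True)
```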

Thoughts on a Migration: Azure Analysis Services to Power BI Premium

Dan English shares some thoughts:

Over the past couple of months, I got the opportunity to test out the new migration experience that was just made available for Public Preview this past month during the PASS Data Community Summit and announced on the Power BI blog here: Accelerate your migration experience from Azure Analysis Services to Power BI Premium with the automated migration tool. The blog post also shows a very quick animated GIF walkthrough of the process, and there is a thirteen-minute video from the MS Build conference earlier this year, where this was first demoed, that you can check out here as well: The Future of Enterprise Semantic Models.

Click through for a detailed analysis.
