Press "Enter" to skip to content

Category: Cloud

Structuring Azure ML Projects and using the Terminal

Tomaz Kastrun nears the end of the Azure ML advent. Day 20 covers package requirements and other niceties:

When creating notebooks, it is always a good idea to have the dependencies included, whether it is a particular version of a package, a separate script file, or an installation requirement.

Selecting an environment or kernel can be an issue if it is not correctly initialized with the code. And you can also check the kernels with simple Python code:
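As an illustration, here is a minimal sketch of what such a check might look like, assuming jupyter_client is available on the compute instance (this snippet is not taken from the post):

```python
# Minimal sketch: list the Jupyter kernels registered on this machine
# (assumption: jupyter_client is installed, as it typically is on Azure ML compute instances)
from jupyter_client.kernelspec import KernelSpecManager

for name, spec in KernelSpecManager().get_all_specs().items():
    print(name, "->", spec["spec"]["display_name"])
```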

Day 21 looks at the Azure CLI and running code from within a compute instance terminal:

Using the Azure CLI can help you progress faster, automate repetitive tasks, and even use Git integration for faster and better collaboration.

So we created a YAML file on day 20, and we can also use it with the Azure CLI to create an environment.
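For reference, the same thing can be done programmatically with the Azure ML Python SDK v2; the following is a hedged sketch rather than the post's actual CLI call, and the environment name, base image, and conda file are placeholder assumptions:

```python
# Hedged sketch: create an Azure ML environment from a conda dependencies file
# using the Python SDK v2 (assumption: azure-ai-ml is installed and you are
# authenticated; all names below are placeholders)
from azure.ai.ml import MLClient
from azure.ai.ml.entities import Environment
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace>",
)

env = Environment(
    name="adventofazureml-env",
    image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04",  # curated base image
    conda_file="environment.yml",  # the YAML dependencies file from day 20
)
ml_client.environments.create_or_update(env)
```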


Capturing Event Hubs Data in Delta Lake Format with Stream Analytics

Xu Jiang announces a public preview:

The Stream Analytics no-code editor is a drag-and-drop design tool that helps customers develop Stream Analytics jobs without writing a single line of code. The experience provides a canvas that allows you to connect to input sources to quickly see your streaming data. Then you can transform and preview it before writing to your destination of choice in Azure. To learn more, see No-code stream processing through Azure Stream Analytics | Microsoft Learn.

Read on to see how you can capture and process data into Delta Lake format via their designer.


Statistical Analysis in Azure ML

Tomaz Kastrun continues an advent of Azure ML. Day 18 takes us through feature exploration:

Azure Machine Learning is also a great tool for ordinary statistical analysis, graph plotting, and everything that goes along with them.

Let's get an open dataset that is available in the UCI Machine Learning Repository and import it into a pandas dataframe.
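As a hedged illustration (this is not the post's code, and the specific UCI dataset and column names below are assumptions chosen only as an example), importing a UCI dataset into pandas can look like this:

```python
# Hedged sketch: load a UCI dataset (here the processed Cleveland heart-disease
# file, used purely as an example) into a pandas dataframe
import pandas as pd

url = ("https://archive.ics.uci.edu/ml/machine-learning-databases/"
       "heart-disease/processed.cleveland.data")
cols = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
        "thalach", "exang", "oldpeak", "slope", "ca", "thal", "target"]

df = pd.read_csv(url, header=None, names=cols, na_values="?")
print(df.describe())  # quick statistical summary of the numeric columns
```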

Day 19 picks up with feature engineering:

Yesterday we showed that statistical analysis, with all the bells and whistles, can be done very simply in Azure Machine Learning. Today we will continue with feature engineering and modelling.

So, what is feature engineering? It is a general process that can involve both feature construction (adding new features from the existing data) and feature selection (choosing only the most important features) in order to improve model performance and reduce data dimensionality. It covers log transformations, removing outliers, scaling (normalisation, standardisation), imputation, general transformations (polynomial and others), variable creation, variable extraction, and so on.
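To make a few of those steps concrete, here is a hedged sketch using scikit-learn (not code from the post; the dataframe and column names continue the illustrative heart-disease example above):

```python
# Hedged sketch of common feature-engineering steps: log transformation,
# imputation, and scaling (column names are illustrative placeholders)
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Feature construction: log-transform a skewed numeric column
df["chol_log"] = np.log1p(df["chol"])

# Imputation: fill missing values with the column median
df[["ca", "thal"]] = SimpleImputer(strategy="median").fit_transform(df[["ca", "thal"]])

# Scaling: standardise numeric features to zero mean and unit variance
num_cols = ["age", "trestbps", "chol_log"]
df[num_cols] = StandardScaler().fit_transform(df[num_cols])
```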


Azure SQL Database Performance Roundup

Reitse Eskens shares the goods:

In the past 9 blogs, I've shown you all sorts of Azure SQL database solutions and given them a little run for their money. I've tested a lot and written about them. This blog will be about the summation of the data and my views on the combined graphs. At the end I'll wrap it up with my way of working when a new project starts.

But before I kick off, a little Christmas present. What I didn't do, until now, is give you access to more raw data. Now is the moment to give you more raw numbers to play around with for yourself and do your own analysis. Fun as it might be, I'd highly encourage you to use my sheets as a jumping-off point and adapt them for your own workloads. You can find the two Excel files via the link for the scripts.

This is a post I'd been waiting for, as it covers the comparisons between tiers directly, rather than requiring you to infer them from the various posts.


MLflow in Action and Responsible AI

Tomaz Kastrun continues an advent of Azure ML. Day 16 shows off MLflow:

Yesterday we looked into how to set up the MLflow configuration, and today let's put it to the test.

We will create a new notebook and use the Heart dataset (link to dataset) to toy around with. We will also import an xgboost classifier to assess the presence of heart disease in the patient. We will be using a categorical (integer) target variable with values from 0 (no presence) to 4 (strong presence) and attempt to classify based on 15+ attributes (out of more than 70 attributes).
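A hedged sketch of what such an MLflow-tracked xgboost run can look like (this is not the post's code; synthetic data stands in for the Heart dataset so the snippet is self-contained, and MLflow is assumed to already be pointed at the Azure ML workspace):

```python
# Hedged sketch: track an xgboost classification run with MLflow
# (assumption: mlflow and xgboost are installed; synthetic data stands in for
# the Heart dataset so the example runs end-to-end)
import mlflow
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=15, n_informative=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

mlflow.set_experiment("heart-disease-xgboost")
with mlflow.start_run():
    model = xgb.XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
    model.fit(X_train, y_train)

    acc = accuracy_score(y_test, model.predict(X_test))
    mlflow.log_param("n_estimators", 200)
    mlflow.log_metric("accuracy", acc)
    mlflow.xgboost.log_model(model, artifact_path="model")
```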

Day 17 pivots to using the responsible AI dashboard:

Azure ML provides users with a collection of model and data exploration tools in the Studio user interface. But it also provides compatible solutions through the Python package responsibleai. With the help of widgets, we will create a sample dashboard to explore the solution and assess responsible decisions and actions.
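As a rough, hedged sketch of how the responsibleai package and its widgets are typically wired together (the model, dataframes, and target column below are placeholders, and this is not code from the post):

```python
# Hedged sketch: build a Responsible AI dashboard from a trained model
# (assumption: the responsibleai and raiwidgets packages are installed;
# model, train_df, test_df, and "target" are placeholders)
from responsibleai import RAIInsights
from raiwidgets import ResponsibleAIDashboard

rai_insights = RAIInsights(
    model=model,
    train=train_df,
    test=test_df,
    target_column="target",
    task_type="classification",
)

# Add the analyses to include in the dashboard, then compute them
rai_insights.explainer.add()
rai_insights.error_analysis.add()
rai_insights.compute()

# Render the interactive dashboard widget in the notebook
ResponsibleAIDashboard(rai_insights)
```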


AutoML and Model Registration in AML

Tomaz Kastrun continues an advent of Azure Machine Learning. Day 13 covers the topic of Automated ML:

Automated ML is a no-code automated machine learning task. It iterates over many combinations of algorithms and hyperparameters in order to find the best model for your dataset and your prediction variable(s). The final solution is a model that can be downloaded and later reused. So Automated ML not only gives you the best model out of a family of algorithms, but also lets you use the model, generate the scripts, and create the artefacts.
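For orientation, here is a hedged sketch of submitting an Automated ML classification job from code with the Azure ML Python SDK v2 (not the post's approach, which uses the Studio UI; the compute name, data asset, and target column are placeholder assumptions, and ml_client is assumed to be an authenticated MLClient):

```python
# Hedged sketch: submit an Automated ML classification job with the SDK v2
# (assumption: azure-ai-ml is installed; compute, data asset, and target column
# names are placeholders)
from azure.ai.ml import Input, automl

classification_job = automl.classification(
    compute="cpu-cluster",
    experiment_name="automl-advent",
    training_data=Input(type="mltable", path="azureml:heart-training-data:1"),
    target_column_name="target",
    primary_metric="accuracy",
    n_cross_validations=5,
)
classification_job.set_limits(timeout_minutes=60, max_trials=20)

returned_job = ml_client.jobs.create_or_update(classification_job)
print(returned_job.studio_url)  # follow the run in the Studio UI
```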

Day 14 concerns model registration:

An important asset is "Models" in the navigation bar. This feature allows you to work with different model types: custom, MLflow, and Triton. What you do here is register a model from different locations (e.g. a local file, an AzureML Datastore, an AzureML Job, an MLflow Job, a Model asset in an AzureML workspace, or a Model asset in an AzureML Registry).

Once you open the Models asset, you will see that you can do many things here. I have already registered a model from running the notebook on day 4.
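Registration can also be done from code; here is a hedged sketch with the Azure ML Python SDK v2 (the model name, file path, and ml_client are placeholder assumptions, not taken from the post):

```python
# Hedged sketch: register a model from a local file with the SDK v2
# (assumption: ml_client is an authenticated MLClient; name and path are placeholders)
from azure.ai.ml.constants import AssetTypes
from azure.ai.ml.entities import Model

model = Model(
    name="heart-xgboost",
    path="./outputs/model.pkl",    # local file to upload and register
    type=AssetTypes.CUSTOM_MODEL,  # MLFLOW_MODEL and TRITON_MODEL are also supported
    description="Model registered from the day 4 notebook run",
)
ml_client.models.create_or_update(model)
```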


Identifying Rows in sp_wait_for_database_copy_sync

Jose Manuel Jurado Diaz troubleshoots a problem:

As you can see in the public documentation Auto-failover groups overview & best practices – Azure SQL Database | Microsoft Learn, regarding sp_wait_for_database_copy_sync: "sp_wait_for_database_copy_sync prevents data loss after geo-failover for specific transactions, but does not guarantee full synchronization for read access. The delay caused by a sp_wait_for_database_copy_sync procedure call can be significant and depends on the size of the not yet transmitted transaction log on the primary at the time of the call."

Our customer asked about several scenarios to understand this behaviour and also to verify whether it is possible to identify the rows that have not been synced. For this, I developed a POC to test it:

Read on to see what you’d need to do.


FHIR and Azure Health Services

Steve Hughes provides an overview of FHIR and what Azure has to offer:

With the recently updated mandates in the healthcare environment in the United States, Microsoft has continued to expand its capability to support the FHIR standard for integrating healthcare data. While the standard is well documented and Microsoft's capabilities are expansive, it falls on data professionals to interpret that data and build meaningful reports and produce meaningful insights from the data as it is collected and integrated across environments. This requires a good working knowledge of JSON in SQL to manipulate complex data models. In the session, we did a short review of the FHIR standard and the overall implementation of FHIR in Azure. From there we reviewed the resulting data in the data lake and in Synapse. That was followed up with an overview of the heart of complex SQL using JSON functions in Synapse. Whether or not you are active in healthcare today, this will be an enlightening session on how to use JSON SQL functions within the Azure SQL platforms.

Read on to learn more.


Managing Azure Log Analytics Data Access

Bruno Gabrielli wants to limit data access to Log Analytics:

I am back with another important topic arising from my customers' visits. How can I give very specific access to Log Analytics data, whether it be security or monitoring data?

Tricky one, isn't it? A very simplistic answer could be: "manage your access list through IAM on the workspace", but this is not enough. Say, for instance, that you would like to give scoped access to data coming from specific resources or, even more complicated, that for the same resource one team can see some of the information and another team can see all the rest.

Looks complicated, but hey … good news: this is doable.

Read on to learn how.
