Category: Cloud

I discovered Jupiter notebooks not that long ago, but the more I use them the more I see how powerful they could be. For those of you who are not familiar whit Jupiter Notebook: It is an open-source web application where you can combine code, output, visualizations and explanatory text all in one document allowing you to write a code that tells a story. Now that you have an idea of what Jupiter notebook is I will walk you through how you can use it in Azure Machine Learning Studio.

Click through for the process. One advantage to notebooks in an environment like Azure ML over Azure Data Studio is that you have a much wider variety of languages, although Azure Data Studio has a SQL Server kernel, which other platforms currently do not have.

Comments closed

HammerDB CLI for Oracle Running on Azure

Published 2021-04-09 by Kevin Feasel

Kellyn Pot’vin-Gorman goes through a rough experience:

Disclaimer: I’m not a big fan of benchmark data. I find it doesn’t provide us as much value in the real world as we’d like to think it does. As Cary Milsap says, “You can’t hardware your way out of a software problem” and I find that many folks think that if they just get the fastest hardware, their software problems will go away and this just isn’t true. Sooner or later, it’s going to catch up with you- and it rarely tells you what your real database workload needs to run most efficiently or what might be running in your database that could easily be optimized to save the business time and money.
The second issue is that when comparing different workloads or even worse, different platforms or applications, using the same configuration can be detrimental to the benchmarks collected, which is what we’ll discover in this post.

That said, Kellyn dives into the problem and documents several of the issues in building out this test.

Comments closed

Deploying an Azure Arc Enabled Data Services Controller

Published 2021-04-06 by Kevin Feasel

Chris Adkin continues a series:

If you have been following this series, you should have:
– a basic understanding of Terraform
– a Kubernetes cluster that you can connect to using kubectl
– a basic understanding of Kubernetes services
– a working metalLB load balancer
– a basic understanding of how storage works in the world of Kubernetes
– a Kubernetes storage solution in the form of PX Store, alternatively you can use any solution (for the purposes of this series) which supports persistent volumes, however to use the backup solution in part 9 of the series you will need to use something that supports CSI

From here, Chris explains the importance of the data controller and then deploys one.

Comments closed

Auto-Failover Groups and Grace Periods

Published 2021-04-02 by Kevin Feasel

Taiob Ali clears up some misunderstanding:

The auto-failover groups feature for the Azure SQL database can be configured with an automatic failover policy. Azure triggers failover after the failure is detected and the grace period has expired. Grace period is determined by a setting called ‘GracePeriodWithDataLossHours’ that cannot be set under one hour. Why is it not allowed to set a time which is less than an hour? Can your business tolerate the application be down for that period? Should your turn off Auto Fail-over and set it to manual?
I noticed a lot of confusion around this setting, including my own. Some of the confusion is due to a lack of clarity in the documentation. I checked with the Microsoft Azure SQL team, and they are actively working on clarifying some of the questions I raised.
I want to thank Dimitri Furman and Roberto Bustos from the Azure SQL Team for clarifying some of my confusion that I will share here.

Read on for a Q&A style explanation of auto-failover and grace periods.

Comments closed

Retrieving Counts of Cosmos DB Collections

Published 2021-04-01 by Kevin Feasel

Manoj Pandey shows how you can retrieve counts of records in Cosmos DB using the .NET client:

Here in this post we will use C# .net code (for beginners like me) to see how to:
1. Connect to a Cosmos DB instance
2. Get list of all Databases in a Cosmos DB
3. Iterate through all the Databases and get the list of all Collections (or Tables)
4. Get COUNT of all documents/items (or records) in these Collections

Click through to see how.

Comments closed

Using Azure Functions to Tag Resources

Published 2021-04-01 by Kevin Feasel

Jess Pomfret shows off an interesting way of using Azure Functions to apply tags to resources:

In part one I discussed how useful Azure tags can be, and specifically about how adding a ‘dateCreated’ tag can help you keep track of your resources, and how to find resources with certain tags using PowerShell. Part 2 and 3 are based around the fact that adding the ‘dateCreated’ tag is a great idea, but relying on a human to remember to add it is less than ideal. In part 2 we looked at using Azure Policy to automatically add the tag. Today’s post will cover another option using Azure Functions.
Azure Functions gives us a way of running serverless code, written in a number of different languages, triggered by specific events or timings. Looking through the documentation there are many use cases from processing files to analysing IoT workstreams. Our use case is to run a PowerShell script that tags any resources that are missing the ‘dateCreated’.

Click through to see how.

Comments closed

Hyperparameter Tuning in Azure Machine Learning

Published 2021-03-30 by Kevin Feasel

Dinesh Asanka takes us through hyperparameter tuning with Azure Machine Learning’s designer:

In the above experiment, both the previous model and the TMH included the model so that we can compare both models. In the above experiment, Tune Model Hyperparameters control is inserted between the Split Data and Score Model controls as shown. In the TMH, control has three inputs. The first control needs the relevant technique and, in this scenario, it is the Two-Class Logistic Regression technique. The second input needs the train data set and the last input needs the evaluation data set and for that, the test data set can be used.
Tune Model Hyperparameters control provides the best combinations and it will be connected to the score model. After the test data stream is connected to the score model, the output of the model is connected to the second input of the Evaluate model so that the previous model and the tuned model can be compared.

I’m not sure if there’s something handled internally in the Tune Model Hyperparameters component, but based on the pipeline images, I’d actually want two separate Split components so that I ended up with something more like 50-20-30 for training, hyperparameter testing, and validation. The first two pipelines appear to be 70-30-0 instead, and so can give you a false sense of confidence in model quality.

Comments closed

Cost versus Performance Optimization for SQL Server on VMs in Azure

Published 2021-03-30 by Kevin Feasel

Pam Lahoud takes a look at multi-constraint optimization:

So how do you get the best price-performance possible when configuring your SQL Server on Azure VM? In this blog, we’re going to cover three key aspects to right-sizing (and right-configuring) your Azure VM for SQL Server that are based on some common pitfalls customers face when migrating their on-premises workloads to Azure VM:
– Choosing the best VM series and size for your workload
– Configuring storage for maximum throughput and lower cost
– Leveraging unique to Azure features such as host caching to boost performance at no additional cost

One key point of the article is that there are several factors which can make a big difference in price and performance, but which you might not think about on-premises. It’s definitely worth taking the time to research this. It’s also a great example of how administrators are still important in a cloud-based world—having an admin who understands these settings and can get the most out of a given server can save a lot of money very quickly.

Comments closed

Limitations with Control Flows in Azure Data Factory

Published 2021-03-29 by Kevin Feasel

Meagan Longoria has a list:

If you’ve been using Azure Data Factory for a while, you might have hit some limitations that don’t exist in tools like SSIS or Databricks. Knowing these limitations up front can help you design better pipelines, so I’m listing a few here of which you’ll want to be aware.
1. You cannot nest For Each activities.
Within a pipeline, you cannot place a For Each activity inside of another For Each activity. If you need to iterate through two datasets you have two main options. You can combine the two datasets before you iterate over them. Or you can use a parent/child pipeline design where you move the inner For Each activity into the child pipeline. Fun fact: currently the Data Factory UI won’t stop you from nesting For Each activities. You won’t find out until you try to execute the pipeline.

Click through for several other limitations and workarounds.

Comments closed

Drift Monitoring with Azure Machine Learning

Published 2021-03-24 by Kevin Feasel

I take a look at dataset drift monitoring in Azure Machine Learning:

One of the things I like to say about machine learning model is, “shift happens.” By that, I mean that models lose effectiveness over time due to changes in underlying circumstances. Relationships between variables that used to hold no longer do, and so our model quality degrades. This means that we sometimes need to retrain models.
But there’s a cost to retraining models—that work can be computationally expensive and time-consuming. This concern is particularly salient if you’re in a cloud, as you pay directly for everything there. This means that we don’t want to retrain models unless we need to. But when do we know if we should retrain the model? We can watch for model degradation, but there’s another method: drift detection in your datasets.

Read on for a demonstration of how the product works and a couple of things to keep in mind.

Comments closed

M	T	W	T	F	S	S
						1
2	3	4	5	6	7	8
9	10	11	12	13	14	15
16	17	18	19	20	21	22
23	24	25	26	27	28	29
30	31