I wrote this quick-and-dirty script to let me know if I happen to forget to turn off a P15 instance, or if I configure a service with a super-expensive performance tier without realizing. Maxing out your free Azure credits may be depressing enough, but emptying your credit card could really put you in the hurt locker.
So, here’s a Powershell script that warns me before any of this happens. It uses the Azure Consumption API to check how much money we’ve racked up on a subscription so far, and if any single instance exceeds, say, 50% of that total cost, it sends a notification to a Slack channel.
The wallet you save may be your own.
One of my favorite recent examples was a company who came to me saying, “We’re spending about $2M per year in the cloud just on our databases alone. Can you help us reduce those costs?” Absolutely: with just a couple of days spent query & index tuning, we chopped their biggest database expenses in half while increasing performance.
At the end of that engagement, the CTO told me, “I thought I’d save money in the cloud by not having a DBA, but what I’m learning is that in the cloud, I actually get a return on my DBA investments.”
I completely agree with this post. The exact tools DBAs use will change, but the role will still be around decades from now. And that’s at the companies which move quickly.
In this post you will see some recommended tools and best practices that you should apply while doing performance comparison. The recommended performance comparison process has three stages:
1. Compare the environment settings on SQL Server and Managed Instance.
2. Create performance baseline on source SQL Server
3. Compare performance on Managed Instance with the baseline
In the following sections will be described the best practices and the recommended approaches
This is a good bit more involved than installing some product, clicking a few buttons, and comparing numbers.
Today, we are launching Quantum, a high-performance serverless SQL engine, available on Qubole Data Platform, that simplifies SQL access by offering a true serverless deployment option to enable data analysts to query petabyte-scale volumes of data using ANSI-SQL.
Quantum allows teams to realize value from their data much more quickly, and because of its serverless nature, users pay only for queries they run. Data analysts can query object stores on AWS, Azure, Google Cloud, and Oracle Cloud in seconds to achieve faster time to market with far less IT management overhead.
Existing serverless SQL service offerings do not provide users with the ability to use a metastore of their choice. With Quantum, data teams can use their own custom metastore and start using Quantum without recreating schemas or table metadata.
Most existing Qubole customers already use a custom metastore in the cloud. So there’s virtually no ramp up time to reap the benefits of Quantum.
The technical overview is a bit too much marketing for my tastes, but this is a move worth watching.
Currently, there are 3 data types supported in ADF variables: String, Boolean, and Array. The first two are pretty easy to use: Boolean for logical binary results and String for everything else, including the numbers (no wonder there are so many conversion functions in Azure Data Factory that we can use).
I’ve also blogged about using Variables in Azure Data Factory:
– Setting Variables in Azure Data Factory Pipelines
– Append Variable activity in Azure Data Factory: Story of combining things together
– System Variables in Azure Data Factory: Your Everyday Toolbox
– Azure Data Factory: Extracting array first element
Click through for arrays and follow up with those other posts from there.
Role-based access control (RBAC) is available for Azure CosmosDB. By using RBAC, you can manage who has access to CosmosDB resources. You need to have a profile in Azure Active Directory to assign RBAC roles to users, groups or other targets. There are 4 built-in roles you can use. Microsoft announced the CosmosDB Operator
Click through for that list.
Azure Resource Manager, or ARM, “allows you to provision your applications using a declarative template.” So says the Azure Quickstart Templates page. ARM templates are JSON and allow administrators to import and export Azure resources using varying management patterns. I really like ARM templates for implementing infrastructure as code in Azure. In this post I show a very simple example of how to use ARM templates to export and then import a basic ADF (Azure Data Factory) pipeline.
The sample code doesn’t do that much by itself, but it does open up a new world of automation.
We are excited to announce the open source release of Gluon Time Series (GluonTS), a Python toolkit developed by Amazon scientists for building, evaluating, and comparing deep learning–based time series models. GluonTS is based on the Gluon interface to Apache MXNet and provides components that make building time series models simple and efficient.
In this post, I describe the key functionality of the toolkit and demonstrate how to apply GluonTS to a time series forecasting problem.
It looks interesting.
So, what is Azure Databricks? To answer this question, let’s start all the way at the bottom of the hole and climb up. So, what is Hadoop? Apache Hadoop is an open-source, distributed storage and computing ecosystem designed to handle incredibly large volumes of data and complex transformations. It is becoming more common as organizations are starting to integrate massive data sources, such as social media, financial transactions and the Internet of Things. However, Hadoop solutions are extremely complex to manage and develop. So, many people have worked together to create platforms that layer on top of Hadoop to provide a simpler way to solve certain types of problems. Apache Spark is one of these platforms. You can read more about Apache Hadoop here and here.
It’s Hadoop turtles all the way down.
One of the useful features of EMR Notebooks is the separation of the notebook environment from your underlying cluster infrastructure. The separation makes it easy for you to execute notebook code against transient clusters without worrying about deploying or configuring your notebook infrastructure every time you bring up a new cluster. You can create multiple serverless notebooks from the AWS Management Console for EMR and access the notebook UI without spending time setting up SSH access or configuring your browser for port-forwarding. Each notebook you create is launched instantly with its own Spark context. This capability enables you to attach multiple notebooks to a single shared cluster and submit parallel jobs without fear of job conflicts in a multi-tenant environment. This way you make efficient use of your clusters.
You can also connect EMR Notebooks to an EMR cluster as small as a one node. This gives you a budget-friendly sandbox environment to develop your Spark application.
Notebooks are everywhere. And for good reason.