Cloud – Page 134 – Curated SQL

Creating an Azure Databricks Cluster

Published 2019-06-25 by Kevin Feasel

Brad Llewellyn shows how you can create an Azure Databricks cluster:

There are three major concepts for us to understand about Azure Databricks, Clusters, Code and Data. We will dig into each of these in due time. For this post, we’re going to talk about Clusters. Clusters are where the work is done. Clusters themselves do not store any code or data. Instead, they operate the physical resources that are used to perform the computations. So, it’s possible (and even advised) to develop code against small development clusters, then leverage the same code against larger production-grade clusters for deployment. Let’s start by creating a small cluster.

Read on for an example.

Comments closed

Databricks Runtime 5.4

Published 2019-06-24 by Kevin Feasel

Todd Greenstein announces Databricks Runtime 5.4:

We’ve partnered with the Data Services team at Amazon to bring the Glue Catalog to Databricks. Databricks Runtime can now use Glue as a drop-in replacement for the Hive metastore. This provides several immediate benefits:
– Simplifies manageability by using the same glue catalog across multiple Databricks workspaces.
– Simplifies integrated security by using IAM Role Passthrough for metadata in Glue.
– Provides easier access to metadata across the Amazon stack and access to data catalogued in Glue.

There are some interesting changes in here.

Comments closed

Building an AKS Cluster

Published 2019-06-24 by Kevin Feasel

Mohammad Darab continues a series on Big Data Clusters by creating a Kubernetes pod in Azure Kubernetes Service:

Next, we will create a resource group by executing the following command:
az group create –name nameOfMyresourceGroup –location eastus2
Once you execute the above command, you can go into the Azure portal and refresh your resource group pane and see the newly created resource group.
Once that is setup, it’s time to create the actual Kubernetes cluster.

Click through for the full set of instructions.

Comments closed

Warning on Azure Consumption

Published 2019-06-20 by Kevin Feasel

Daniel Hutmacher doesn’t want you to have any Azure billing surprises:

I wrote this quick-and-dirty script to let me know if I happen to forget to turn off a P15 instance, or if I configure a service with a super-expensive performance tier without realizing. Maxing out your free Azure credits may be depressing enough, but emptying your credit card could really put you in the hurt locker.
So, here’s a Powershell script that warns me before any of this happens. It uses the Azure Consumption API to check how much money we’ve racked up on a subscription so far, and if any single instance exceeds, say, 50% of that total cost, it sends a notification to a Slack channel.

The wallet you save may be your own.

Comments closed

DBAs in the Cloud

Published 2019-06-12 by Kevin Feasel

Brent Ozar argues that production DBAs will still be important even at cloud-only companies:

One of my favorite recent examples was a company who came to me saying, “We’re spending about $2M per year in the cloud just on our databases alone. Can you help us reduce those costs?” Absolutely: with just a couple of days spent query & index tuning, we chopped their biggest database expenses in half while increasing performance.
At the end of that engagement, the CTO told me, “I thought I’d save money in the cloud by not having a DBA, but what I’m learning is that in the cloud, I actually get a return on my DBA investments.”

I completely agree with this post. The exact tools DBAs use will change, but the role will still be around decades from now. And that’s at the companies which move quickly.

Comments closed

Comparing On-Prem To Managed Instance Performance

Published 2019-06-12 by Kevin Feasel

Jovan Popovic has an article explaining how you can compare your current on-premises SQL Server’s performance to an Azure SQL Managed Instance’s performance:

In this post you will see some recommended tools and best practices that you should apply while doing performance comparison. The recommended performance comparison process has three stages:
1. Compare the environment settings on SQL Server and Managed Instance.
2. Create performance baseline on source SQL Server
3. Compare performance on Managed Instance with the baseline
In the following sections will be described the best practices and the recommended approaches

This is a good bit more involved than installing some product, clicking a few buttons, and comparing numbers.

Comments closed

Quantum: Serverless Database

Published 2019-06-11 by Kevin Feasel

Ashish Thusoo announces a new serverless SQL engine:

Today, we are launching Quantum, a high-performance serverless SQL engine, available on Qubole Data Platform, that simplifies SQL access by offering a true serverless deployment option to enable data analysts to query petabyte-scale volumes of data using ANSI-SQL.
Quantum allows teams to realize value from their data much more quickly, and because of its serverless nature, users pay only for queries they run. Data analysts can query object stores on AWS, Azure, Google Cloud, and Oracle Cloud in seconds to achieve faster time to market with far less IT management overhead.

Ankit Dixit, et al, provide a technical overview as well:

Existing serverless SQL service offerings do not provide users with the ability to use a metastore of their choice. With Quantum, data teams can use their own custom metastore and start using Quantum without recreating schemas or table metadata.
Most existing Qubole customers already use a custom metastore in the cloud. So there’s virtually no ramp up time to reap the benefits of Quantum.

The technical overview is a bit too much marketing for my tastes, but this is a move worth watching.

Comments closed

Arrays in Azure Data Factory

Published 2019-06-10 by Kevin Feasel

Rayis Imayev takes us through arrays in Azure Data Factory:

Currently, there are 3 data types supported in ADF variables: String, Boolean, and Array. The first two are pretty easy to use: Boolean for logical binary results and String for everything else, including the numbers (no wonder there are so many conversion functions in Azure Data Factory that we can use).
I’ve also blogged about using Variables in Azure Data Factory:
– Setting Variables in Azure Data Factory Pipelines
– Append Variable activity in Azure Data Factory: Story of combining things together
– System Variables in Azure Data Factory: Your Everyday Toolbox
– Azure Data Factory: Extracting array first element

Click through for arrays and follow up with those other posts from there.

Comments closed

Cosmos DB Role-Based Access Control

Published 2019-06-05 by Kevin Feasel

Hasan Savran takes us through role-based access control in Cosmos DB:

Role-based access control (RBAC) is available for Azure CosmosDB. By using RBAC, you can manage who has access to CosmosDB resources. You need to have a profile in Azure Active Directory to assign RBAC roles to users, groups or other targets. There are 4 built-in roles you can use. Microsoft announced the CosmosDB Operator
role recently.

Click through for that list.

Comments closed

Building an ARM Template for Azure Data Factory

Published 2019-06-05 by Kevin Feasel

Andy Leondard takes the first steps to building an Azure Data Factory pipeline using Azure Resource Manager Templates:

Azure Resource Manager, or ARM, “allows you to provision your applications using a declarative template.” So says the Azure Quickstart Templates page. ARM templates are JSON and allow administrators to import and export Azure resources using varying management patterns. I really like ARM templates for implementing infrastructure as code in Azure. In this post I show a very simple example of how to use ARM templates to export and then import a basic ADF (Azure Data Factory) pipeline.

The sample code doesn’t do that much by itself, but it does open up a new world of automation.

Comments closed

M	T	W	T	F	S	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30	31

Category: Cloud