Cloud – Page 81 – Curated SQL

From Azure Data Factory to Synapse Pipelines

Published 2021-09-24 by Kevin Feasel

In this post I want to share an alternative way to copy an Azure Data Factory pipeline to Synapse Studio. Because I think it can be useful.
For those who are not aware, Synapse Studio is the frontend that comes with Azure Synapse Analytics. You can find out more about it in another post I did, which was a five minute crash course about Synapse Studio.
By the end of this post, you will know one way to copy objects used for an Azure Data factory pipeline to Synapse Studio. Which works as long as both are configured to use Git.

Click through to see how.

Comments closed

Automating Notebook Execution with Powershell

Published 2021-09-23 by Kevin Feasel

Julie Koesmarno shows off an automation process for notebooks:

When I first think about automation, I generally think in the following way: in order to automate a script, we want to ensure that the script itself can be run via a command line interface (CLI) and with almost no user interaction (except for input and output parameters). Now, how do we apply this to Jupyter Notebooks so that we can automate SQL notebooks or PowerShell Notebooks?
The good news is that these SQL notebooks and PowerShell notebooks that we’ve created using Azure Data Studio, can be run on PowerShell CLI. If these notebooks can be run on PowerShell CLI, that means any automation systems or serverless architecture (Azure Automation combined with Azure Logic Apps as an example) should be able to run these notebooks also.
In this blog post, I’ll cover examples on using Invoke-SqlNotebook, using Invoke-ExecuteNotebook and putting it together with Azure Automation.

Click through to see the whole thing.

Comments closed

Ensemble Classification in Azure Machine Learning

Published 2021-09-22 by Kevin Feasel

Dinesh Asanka reminds me not to use the designer for tough Azure ML problems:

Let us see how we can extend the standard classification to Ensemble Classifiers in Azure Machine Learning. Before we discuss the details of this configuration, you can view or download the experiment from Ensemble Classification
The following figure shows the complex layout of the Ensemble Classifiers in Azure Machine Learning.

Dinesh is not kidding about that complexity. This is definitely a use case for the Azure ML SDK.

Comments closed

Monitoring Azure Data Factory, Integration Runtimes, and Pipelines

Published 2021-09-16 by Kevin Feasel

Sandeep Arora monitors all the things:

For effective monitoring of ADF pipelines, we are going to use Log Analytics, Azure Monitor and Azure Data Factory Analytics. The above illustration shows the architectural representation of the monitoring setup.
The details of setting up log analytics, alerts and Azure Data Factory Analytics are further discussed in this section.

If you manage Azure Data Factory in your environment, give this a read.

Comments closed

Role-Based Access Control in Snowflake

Published 2021-09-15 by Kevin Feasel

Warner Chaves explains how role-based access controls work in Snowflake:

The data access privilege granularity is the lowest level of securable that you will use to provide data access. This can theoretically go all the way down to rows and all the way up to full databases.
I usually recommend that people start out with using Schema as their data access securable granularity. Database is usually too broad and you will inevitably have to re-do your roles and table level. Below is too specific to turn it into a general methodology—you would end up with way too many roles. See the FAQ later in this post on how to mix and match granularities if needed.
Once you have the granularity defined, you then create back-end roles at that level.

Read on to see what those roles look like. It’s a pretty standard RBAC setup.

Comments closed

Azure DevOps Templates for Data Platform Deployments

Published 2021-09-15 by Kevin Feasel

Kevin Chant has some toys for us:

For my T-SQL Tuesday contribution this month is I want to introduce my Azure DevOps templates for Data Platform deployments.
This months T-SQL Tuesday is hosted by Frank Geisler. Frank has invited us to write about deploying SQL components through descriptive methods and build some new cool templates for them.
Which is good timing for me, because I co-presented a session on the day this post is published. I showed how to use YAML in Azure DevOps for Data Platform deployments at Data Platform Virtual Summit. .

Click through to learn more and see Kevin’s repos, as well as more information on the topic.

Comments closed

Environments in Azure ML

Published 2021-09-14 by Kevin Feasel

Luis Valencia explains what environments are in Azure ML:

An Environment defines Python packages, environment variables, and Docker settings that are used in machine learning experiments, including in data preparation, training, and deployment to a web service. An Environment is managed and versioned in an Azure Machine Learning Workspace. You can update an existing environment and retrieve a version to reuse. Environments are exclusive to the workspace they are created in and can’t be used across different workspaces.
In basic terms for a developer, it’s basically a Docker Image with all the needed dependencies (conda/pip packages) to run your experiment.

A friendly word of advice from some bad experiences: stick with the curated environments as much as you can. Those are easy and rarely fail. Building your own environments from Conda files is a possibility, but it’s an, err, probabilistic exercise as to whether your compute target will actually work or not.

Comments closed

AWS EC2 I3 Instance Types and Storage Persistence

Published 2021-09-13 by Kevin Feasel

Steve REzhener has a warning for us:

Amazon Web Services Elastic Cloud Computing (a.k.a. EC2) is a service that lets anyone with a credit card rent a virtualized server from Amazon. To cater to different clients’ needs, AWS provides various instance types that are either general instance or specific-purpose instances (focused on CPU, RAM, IO). You can see the different types in Fig 1. This blog post is going to talk about a storage optimized instance. the I3 instance type family, its little-known problem, and the solution in the form of Elastic Block Storage (a.k.a. EBS).

Click through for the warning, more explanation, and what you can do about it. H/T the SQLServerCentral newsletter.

Comments closed

Patched Security Flaw in Azure Container Instances

Published 2021-09-10 by Kevin Feasel

Ionut Ilascu reports on a vulnerability:

Microsoft has fixed a vulnerability in Azure Container Instances called Azurescape that allowed a malicious container to take over containers belonging to other customers on the platform.
An adversary exploiting Azurescape could execute commands in the other users’ containers and gain access to all their data deployed to the platform, the researchers say.

This is fixed now, but it’s a good reminder that platform-as-a-service offerings can still have security problems (as we’ve also seen recently with Power Apps and Cosmos DB).

Comments closed

Databricks Serverless SQL

Published 2021-09-09 by Kevin Feasel

Nikhil Jethava and Kevin Clugage announce serverless SQL on Databricks:

Databricks SQL already provides a first-class user experience for BI and SQL directly on the data lake, and today, we are excited to announce another step in making data and AI simple with Databricks Serverless SQL. This new capability for Databricks SQL provides instant compute to users for their BI and SQL workloads, with minimal management required and capacity optimizations that can lower overall cost by an average of 40%. This makes it even easier for organizations to expand adoption of the lakehouse for business analysts who are looking to access the rich, real-time datasets of the lakehouse with a simple and performant solution.
Under the hood of this capability is an active server fleet, fully managed by Databricks, that can transfer compute capacity to user queries, typically in about 15 seconds. The best part? You only pay for Serverless SQL when users start running reports or queries.

Things are getting interesting between Databricks and Azure Synapse Analytics, as both now have serverless SQL and Spark offerings. Synapse Analytics has the better implementation for serverless SQL and Databricks the superior Spark implementation, so it becomes a question of which weakness you take in order to gain the strength.

1 Comment

M	T	W	T	F	S	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30	31

Category: Cloud