Cloud – Page 128 – Curated SQL

With the announcement of Powershell support in Azure Functions, it has become easier for data professionals to use functions to manage cloud resources such as Azure SQL Database, Managed Instances. A common challenge when using functions is how to manage the credentials in function code for authenticating databases. Keeping the credentials secure is an important task. Ideally, the credentials should never appear in the code or in the source control.
Manged Identity can solve this problem as Azure SQL Database and Managed Instance both support Azure AD authentication. You can read mode about Managed Identity here.
In this article, I will show how to set up Azure Function App to use Managed Identity to authenticate functions against Azure SQL Database.

The example connects Azure SQL DB, but this is a general-purpose solution.

Comments closed

Improving Join Performance on ADF Data Flows

Published 2020-01-27 by Kevin Feasel

Mark Kromer has a few tips on improving ADF data flow join performance:

When you include literal values in your join conditions, Spark may see that as a requirement to perform a full cartesian product first, then filter out the joined values. But if you ensure that you (1) have column values from both sides of your join condition, you can avoid this Spark-induced cartesian product and improve the performance of your joins and data flows. (2) Avoid use of literal conditions to represent the results of one side of your join condition.
In other words, avoid this for your join condition:source1@movieId == '1'Instead, implement that with a dummy derived column.

There are several good tips in this post.

Comments closed

Migrating Oracle Exadata Workloads to Azure

Published 2020-01-23 by Kevin Feasel

Kellyn Pot’vin-Gorman shows the process of moving from an Exadata system to Oracle on Azure:

An Exadata is an engineered system- database nodes, secondary cell nodes, (also referred to as storage nodes/cell disks), InfiniBand for fast network connectivity between the nodes, specialized cache, along with software features such as Real Application Clusters, (RAC), hybrid columnar compression, (HCC), storage indexes, (indexes in memory) offloading technology that has logic built into it to move object scans and other intensive workloads to cell nodes from the primary database nodes. There are considerable other features, but understanding that Exadata is an ENGINEERED system, not a hardware solution is important and its both a blessing and a curse for those databases supported by one. The database engineer must understand both Exadata architecture and software along with database administration. There is an added tier of performance knowledge, monitoring and patching that is involved, including knowledge of the Cell CLI, the command line interface for the cell nodes. I could go on for hours on more details, but let’s get down to what is required when I am working on a project to migrate an Exadata to Azure.

Click through for the process.

Comments closed

Using Koalas on Azure Databricks

Published 2020-01-21 by Kevin Feasel

Ginger Grant shows how you can install the koalas library on an Azure Databricks cluster:

Unfortunately if you are using an ML workspace, this will not work and you will get the error message org.apache.spark.SparkException: Library utilities are not available on Databricks Runtime for Machine Learning. The Koalas github documentation says “In the future, we will package Koalas out-of-the-box in both the regular Databricks Runtime and Databricks Runtime for Machine Learning”. What this means is if you want to use it now
Most of the time I want to install on the whole cluster as I segment libraries by cluster. This way if I want those libraries I just connect to the cluster that has them. Now the easiest way to install a library is to open up a running Databricks cluster (start it if it is not running) then go to the Libraries tab at the top of the screen.

Click through for a demo of what you need to do.

Comments closed

Using ACLs to Secure Azure Data Lake Data

Published 2020-01-21 by Kevin Feasel

Matthew Roche takes us through access control lists (ACLs) in Azure Data Lake Storage Gen2 and how they apply to Power BI:

Earlier this week I received a question from a customer on how to get Power BI to work with data in ADLSg2 that is secured using ACLs. I didn’t know the answer, but I knew who would know, and I looped in Ben Sack from the dataflows team.Ben answered the customer’s questions and unblocked their efforts, and he said that I could turn them into a blog post. Thank you, Ben!

Read on for the answer.

Comments closed

Tips for Using Azure Storage

Published 2020-01-20 by Kevin Feasel

James Serra takes us through Azure Data Lake Store Gen2 and Azure Blob Storage:

Azure Data Lake Store (ADLS) Gen2 should be used instead of Azure Blob Storage unless there is a needed feature that is not yet GA’d in ADLS Gen2.
The major features that are missing from ADLS Gen2 are premium tier, soft delete, page blobs, append blobs, and snapshots. The major features that are in preview are archive tier, lifecycle management, and diagnostic logs. Check out all the missing features at Known issues with Azure Data Lake Storage Gen2.
Note that underneath the covers, ADLS Gen2 uses Azure Blob Storage and is simply a layer over blob storage providing additional features (i.e. hierarchical file system, better performance, enhanced security, Hadoop compatible access).

Click through for a bullet point list of useful information.

Comments closed

Managing Systems with Azure Arc

Published 2020-01-17 by Kevin Feasel

Robert Smit takes us through Azure Arc:

This Blog post is about Azure Arc, how to set this up and get you started with Azure Arc. For customers who want to simplify complex and distributed environments across on-premises, edge and multi cloud, Azure Arc enables deployment of Azure services anywhere and extends Azure management to any infrastructure.
So Azure Arc is not a replacement for the old Azure Server manager tools! So no remote RDP or open MMC only log analytics, policy’s, CLI etc. https://robertsmit.wordpress.com/2016/08/25/azure-server-management-tools-manage-your-servers-from-anywhere-servermgmt-azure-smt/

Click through for a demonstration.

Comments closed

Azure SQL Database Edge in Public Preview

Published 2020-01-15 by Kevin Feasel

Amit Banerjee announces the public preview for Azure SQL Database Edge:

Azure SQL Database Edge is available in public preview. Azure SQL Database Edge runs on ARM and Intel architecture and brings the most secure Microsoft SQL engine to the edge. By running the same Microsoft SQL database engine both on-premises and in the cloud, you now only need to develop your applications once and deploy anywhere across the edge, your datacenter, and Azure.
With the availability of Azure SQL Database Edge in public preview, we’re inviting customers, partners, and ISVs to join the early adopter program to experience the power of SQL and AI on the edge.

I’m interested in checking out more of these time series capabilities.

Comments closed

Loading Event Hubs from Cosmos DB

Published 2020-01-14 by Kevin Feasel

Annie Xu shows us how we can use Azure Functions to take data from Cosmos DB and populate Event Hubs:

One way to load data from Cosmos DB to Event hub is to use Azure Function. But although there is many coding samples out there to create such Azure Function. If you are like me do not have much application development experience, reading those code samples is bit channenging. Luckly, Azure Portal made is so easy.

Annie has a step-by-step walkthrough which makes it easy.

Comments closed

Don’t Miss These Settings in Azure SQL DB

Published 2020-01-14 by Kevin Feasel

Arun Sirpal takes us through a few things administrators tend to miss in Azure SQL Database:

2. Allow Azure Services and resources to access this server setting set to on/off?
I always set this to off. I do not like it ON.
Why? Because I like to control things via vnets (maybe IPs if really needed – it depends on your solution). Nowadays you can use private endpoint connections which allow connections from within a vnet to a private IP. Sure, you may want to use IP addresses, if you do then I suggest database level firewall rules over server level, especially if you use failover groups.

There are several good ones here.

Comments closed

Category: Cloud

Managed Identity with Azure Functions

Improving Join Performance on ADF Data Flows

Migrating Oracle Exadata Workloads to Azure

Using Koalas on Azure Databricks

Using ACLs to Secure Azure Data Lake Data

Tips for Using Azure Storage

Managing Systems with Azure Arc

Azure SQL Database Edge in Public Preview

Loading Event Hubs from Cosmos DB

Don’t Miss These Settings in Azure SQL DB