Press "Enter" to skip to content

Category: Cloud

Parameterizing a Data Factory Linked Service to a REST API

Meagan Longoria had to parameterize a linked service connecting to a REST API recently:

In order to pass dynamic values to a linked service, we need to parameterize the linked service, the dataset, and the activity.

I have a pipeline where I log the pipeline start to a database with a stored procedure, lookup a username in Key Vault, copy data from a REST API to data lake storage, and log the end of the pipeline with a stored procedure. My username and password are stored in separate secrets in Key Vault, so I had to do a lookup with a web activity to get the username. The password is retrieved using Key Vault inside the linked service. Data Factory doesn’t currently support retrieving the username from Key Vault so I had to roll my own Key Vault lookup there.

Click through for the instructions.
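For a rough idea of what the parameterization looks like, here is a minimal sketch of a parameterized REST linked service definition, written as a Python dict purely for illustration. The parameter names, the Key Vault linked service reference, and the secret name are placeholders, not the ones from Meagan's post.

```python
# Sketch of a parameterized REST linked service: the base URL and username come in
# as linked service parameters (the username having been looked up via a web activity),
# while the password is resolved from Key Vault. All names below are placeholders.
rest_linked_service = {
    "name": "LS_REST_Parameterized",
    "properties": {
        "type": "RestService",
        "parameters": {
            "baseUrl": {"type": "String"},
            "userName": {"type": "String"},
        },
        "typeProperties": {
            "url": "@{linkedService().baseUrl}",
            "authenticationType": "Basic",
            "userName": "@{linkedService().userName}",
            "password": {
                "type": "AzureKeyVaultSecret",
                "store": {"referenceName": "LS_KeyVault", "type": "LinkedServiceReference"},
                "secretName": "rest-api-password",
            },
        },
    },
}
```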


Moving Data Around in Azure Synapse Analytics

Niko Neugebauer looks at some techniques for copying data into a table in an Azure Synapse Analytics SQL Pool:

First of all, let us list some of them (and I am not even attempting on providing all of them, of course):
– INSERT INTO … SELECT FROM … (the most well known one)
– SELECT INTO … FROM … (the most well-known to perform well, since it will create a HEAP while copying most of the properties from the original table(s))
– CREATE TABLE … AS SELECT … (the old way, which must be like 10 years old on PDW/APS & Azure SQL DW, but that has never gotten into a Box Product or Azure SQL Database)
– Polybase (that will use the External Tables & externally allocated data to transfer into Azure SQL DW)
– BCP (good old tested friend that will give you a pain in the neck until you dominate it)
– OPENROWSET / BULK INSERT (some very good and very old friends with complicated histories (who remembers all the code pages?), settings, and an uncertain future, mostly because of their original restrictions, I guess)
– COPY INTO … (the brand new command that will allow you under very neat privileges to copy data from the external storage accounts, much like BULK INSERT)

In this blog post I will simply focus on those features that have not been ported (hopefully just yet): CTAS & COPY INTO.

Read on to see how these two work. Also, I too have wanted CTAS in on-premises SQL Server for years.
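For a rough sense of the syntax (not taken from Niko's post), here's a minimal sketch that issues a CTAS and a COPY INTO against a dedicated SQL pool from Python via pyodbc. The server, table, and storage paths are placeholders, and the COPY INTO assumes the pool's managed identity has been granted access to the storage account.

```python
import pyodbc

# Placeholder connection details for a Synapse dedicated SQL pool.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myworkspace.sql.azuresynapse.net;DATABASE=mypool;"
    "UID=loader;PWD=<password>"
)
conn.autocommit = True  # CTAS and COPY INTO run outside an explicit transaction.
cursor = conn.cursor()

# CTAS: create a new, distributed table from a SELECT in a single statement.
cursor.execute("""
CREATE TABLE dbo.FactSales_Copy
WITH (DISTRIBUTION = HASH(CustomerKey), CLUSTERED COLUMNSTORE INDEX)
AS SELECT * FROM dbo.FactSales;
""")

# COPY INTO: load files from external storage straight into an existing table.
cursor.execute("""
COPY INTO dbo.FactSales_Staging
FROM 'https://myaccount.blob.core.windows.net/sales/2020/*.csv'
WITH (
    FILE_TYPE = 'CSV',
    FIRSTROW = 2,
    CREDENTIAL = (IDENTITY = 'Managed Identity')
);
""")
```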


ACID Transactions with Cosmos DB

Hasan Savran shows how you can use the Cosmos DB SDK to create ACID transactions:

What about Azure Cosmos DB? It’s a NoSQL database, probably you can’t do ACID transactions, right? WRONG! Azure Cosmos DB has been supporting ACID transactions for some time now. We were able to create ACID transactions by using stored procedures of Cosmos DB. Last year (2019) the Cosmos DB team introduced ACID transactions to the Cosmos DB SDK. Now, we can create transactions by using C# just like writing transactions by using the SQLClient class for SQL Server!

To create an ACID transaction in the Cosmos DB SDK, we need to use the TransactionalBatch object. You need to add all operations in the transaction to the TransactionalBatch object. All the operations attached to the TransactionalBatch object must share the same partition key. In the following example, I created three objects and attached them to a TransactionalBatch object. To start the transaction, I ran the ExecuteAsync() function. This function runs the transaction and returns the responses for each operation.

I’d think you would need to set a strong consistency level as well.
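Hasan's example uses the .NET SDK's TransactionalBatch class. As a rough analogue in Python, here's a sketch assuming a version of the azure-cosmos package recent enough to expose execute_item_batch; the account, container, and documents are made up. The key constraint carries over either way: every operation in the batch must target the same partition key.

```python
from azure.cosmos import CosmosClient

# Placeholder endpoint, key, and names -- not from the original post.
client = CosmosClient("https://myaccount.documents.azure.com:443/", credential="<account-key>")
container = client.get_database_client("SalesDb").get_container_client("Orders")

# All three documents share the same partition key value ("customer-42"),
# which is required for the batch to execute as a single transaction.
orders = [
    {"id": "order-1", "customerId": "customer-42", "total": 10.00},
    {"id": "order-2", "customerId": "customer-42", "total": 25.50},
    {"id": "order-3", "customerId": "customer-42", "total": 7.25},
]

# Each entry is (operation, (args,)); the whole batch succeeds or fails atomically.
batch = [("create", (order,)) for order in orders]
results = container.execute_item_batch(batch_operations=batch, partition_key="customer-42")

# One result per operation, in the same order as the batch.
for result in results:
    print(result)
```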


Managed Identity with Azure Functions

Taiob Ali shows how you can safely store credentials which your Azure Function apps need:

With the announcement of PowerShell support in Azure Functions, it has become easier for data professionals to use functions to manage cloud resources such as Azure SQL Database and Managed Instance. A common challenge when using functions is how to manage the credentials in function code for authenticating to databases. Keeping the credentials secure is an important task. Ideally, the credentials should never appear in the code or in the source control.

Managed Identity can solve this problem, as Azure SQL Database and Managed Instance both support Azure AD authentication. You can read more about Managed Identity here.

In this article, I will show how to set up Azure Function App to use Managed Identity to authenticate functions against Azure SQL Database.

The example connects to Azure SQL DB, but this is a general-purpose solution.
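Taiob's walkthrough uses PowerShell functions. As a hedged sketch of the same idea in Python (server and database names are placeholders), the function app's managed identity can request a token for Azure SQL and hand it to the ODBC driver, so no password ever appears in code or configuration.

```python
import struct
import pyodbc
from azure.identity import ManagedIdentityCredential

# Acquire a token for Azure SQL using the function app's managed identity.
credential = ManagedIdentityCredential()
token = credential.get_token("https://database.windows.net/.default").token

# pyodbc expects the token as a length-prefixed, UTF-16-LE byte string passed
# via the SQL_COPT_SS_ACCESS_TOKEN connection attribute (1256).
token_bytes = token.encode("utf-16-le")
token_struct = struct.pack(f"<I{len(token_bytes)}s", len(token_bytes), token_bytes)
SQL_COPT_SS_ACCESS_TOKEN = 1256

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver.database.windows.net;DATABASE=mydb",
    attrs_before={SQL_COPT_SS_ACCESS_TOKEN: token_struct},
)
# Should print the managed identity's name rather than a SQL login.
print(conn.execute("SELECT SUSER_SNAME();").fetchone()[0])
```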


Improving Join Performance on ADF Data Flows

Mark Kromer has a few tips on improving ADF data flow join performance:

When you include literal values in your join conditions, Spark may see that as a requirement to perform a full cartesian product first, then filter out the joined values. But if you ensure that you (1) have column values from both sides of your join condition and (2) avoid using literal conditions to represent the results of one side of your join condition, you can avoid this Spark-induced cartesian product and improve the performance of your joins and data flows.

In other words, avoid this for your join condition: source1@movieId == '1'. Instead, implement that with a dummy derived column.

There are several good tips in this post.
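Because data flows execute on Spark, the behavior is easiest to see in plain PySpark terms. The sketch below is a hypothetical illustration of the idea (PySpark, not data flow expression syntax): joining on a literal gives Spark nothing to hash on, while materializing the literal as a derived column lets it plan a normal equi-join.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import lit

spark = SparkSession.builder.getOrCreate()

movies = spark.createDataFrame([("1", "Alien"), ("2", "Aliens")], ["movieId", "title"])
ratings = spark.createDataFrame([("1", 5), ("1", 4), ("2", 3)], ["movieId", "rating"])

# Literal in the join condition: only one input contributes a column, so Spark has
# no equi-join key and falls back to a cross join plus a filter.
# (Older Spark versions may require cross joins to be explicitly enabled.)
slow = ratings.join(movies, movies["movieId"] == lit("1"))

# Dummy derived column: materialize the literal as a column on the other stream and
# join column-to-column, so Spark can plan an ordinary (broadcast) hash join instead.
ratings_keyed = ratings.withColumn("movieKey", lit("1"))
fast = ratings_keyed.join(movies, ratings_keyed["movieKey"] == movies["movieId"])
fast.show()
```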


Migrating Oracle Exadata Workloads to Azure

Kellyn Pot’vin-Gorman shows the process of moving from an Exadata system to Oracle on Azure:

An Exadata is an engineered system: database nodes, secondary cell nodes (also referred to as storage nodes/cell disks), InfiniBand for fast network connectivity between the nodes, and specialized cache, along with software features such as Real Application Clusters (RAC), hybrid columnar compression (HCC), storage indexes (indexes in memory), and offloading technology that has logic built into it to move object scans and other intensive workloads from the primary database nodes to the cell nodes.  There are considerably more features, but understanding that Exadata is an ENGINEERED system, not a hardware solution, is important, and it's both a blessing and a curse for those databases supported by one.  The database engineer must understand both Exadata architecture and software along with database administration.  There is an added tier of performance knowledge, monitoring, and patching involved, including knowledge of the Cell CLI, the command line interface for the cell nodes.  I could go on for hours on more details, but let’s get down to what is required when I am working on a project to migrate an Exadata to Azure.

Click through for the process.


Using Koalas on Azure Databricks

Ginger Grant shows how you can install the koalas library on an Azure Databricks cluster:

Unfortunately if you are using an ML workspace, this will not work and you will get the error message org.apache.spark.SparkException: Library utilities are not available on Databricks Runtime for Machine Learning. The Koalas github documentation says “In the future, we will package Koalas out-of-the-box in both the regular Databricks Runtime and Databricks Runtime for Machine Learning”. What this means is if you want to use it now, you will need to install it yourself.

Most of the time I want to install on the whole cluster as I segment libraries by cluster.  This way if I want those libraries I just connect to the cluster that has them. Now the easiest way to install a library is to open up a running Databricks cluster (start it if it is not running) then go to the Libraries tab at the top of the screen.

Click through for a demo of what you need to do.
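As a quick, illustrative sketch (not from Ginger's post): once the library is installed on the cluster, whether through the Libraries tab or a notebook-scoped %pip install koalas on newer runtimes, using it looks just like pandas.

```python
import databricks.koalas as ks

# Koalas mirrors the pandas API but executes on Spark under the covers.
kdf = ks.DataFrame({"color": ["red", "green", "blue"], "qty": [3, 5, 2]})
print(kdf.groupby("color")["qty"].sum().sort_index())

# Moving between Spark and Koalas DataFrames is straightforward as well.
sdf = kdf.to_spark()
kdf2 = sdf.to_koalas()
```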


Using ACLs to Secure Azure Data Lake Data

Matthew Roche takes us through access control lists (ACLs) in Azure Data Lake Storage Gen2 and how they apply to Power BI:

Earlier this week I received a question from a customer on how to get Power BI to work with data in ADLSg2 that is secured using ACLs. I didn’t know the answer, but I knew who would know, and I looped in Ben Sack from the dataflows team. Ben answered the customer’s questions and unblocked their efforts, and he said that I could turn them into a blog post. Thank you, Ben!

Read on for the answer.


Tips for Using Azure Storage

James Serra takes us through Azure Data Lake Store Gen2 and Azure Blob Storage:

Azure Data Lake Store (ADLS) Gen2 should be used instead of Azure Blob Storage unless there is a needed feature that is not yet GA’d in ADLS Gen2.

The major features that are missing from ADLS Gen2 are premium tier, soft delete, page blobs, append blobs, and snapshots. The major features that are in preview are archive tier, lifecycle management, and diagnostic logs. Check out all the missing features at Known issues with Azure Data Lake Storage Gen2.

Note that underneath the covers, ADLS Gen2 uses Azure Blob Storage and is simply a layer over blob storage providing additional features (i.e. hierarchical file system, better performance, enhanced security, Hadoop compatible access).

Click through for a bullet point list of useful information.


Managing Systems with Azure Arc

Robert Smit takes us through Azure Arc:

This blog post is about Azure Arc: how to set it up and get started with it. For customers who want to simplify complex and distributed environments across on-premises, edge, and multi-cloud, Azure Arc enables deployment of Azure services anywhere and extends Azure management to any infrastructure.

So Azure Arc is not a replacement for the old Azure Server Management Tools! There is no remote RDP or MMC; only Log Analytics, policies, the CLI, etc. https://robertsmit.wordpress.com/2016/08/25/azure-server-management-tools-manage-your-servers-from-anywhere-servermgmt-azure-smt/

Click through for a demonstration.
