Press "Enter" to skip to content

Category: Cloud

Tying Azure Data Factory to Source Control

Eddy Djaja explains why you really want to tie Azure Data Factory to your source control:

Azure Data Factory (ADF) is Microsoft’s ETL, or more precisely ELT, tool in the cloud. For more information on ADF, Microsoft provides an introduction at this link: https://docs.microsoft.com/en-us/azure/data-factory/introduction. Some have argued over whether ADF will replace or merely complement on-premises SSIS; that is uncertain, and only time will tell.
Unlike SSIS, ADF authoring does not use Visual Studio. ADF authoring uses a web browser to create ADF components such as pipelines, activities, and datasets. The simplicity of authoring ADF may confuse novice developers about how ADF components are saved, stored, and published. When logging in to ADF for the first time after creating a data factory, the authoring experience is in ADF mode. How do we know?

Click through for the explanation and some resources on how to do it.
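
If you would rather script the Git hookup than click through the portal, here is a minimal sketch using the azure-mgmt-datafactory Python SDK. It is not Eddy's walkthrough; the subscription, resource group, factory, and repository names are placeholders, and the model names assume a recent version of the SDK.

```python
# Minimal sketch: attach an existing data factory to a GitHub repository.
# Assumes the azure-identity and azure-mgmt-datafactory packages; all
# identifiers below are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    FactoryRepoUpdate,
    FactoryGitHubConfiguration,
)

subscription_id = "<subscription-id>"
client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

repo_update = FactoryRepoUpdate(
    factory_resource_id=(
        "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
        "/providers/Microsoft.DataFactory/factories/<factory-name>"
    ),
    repo_configuration=FactoryGitHubConfiguration(
        account_name="<github-account>",
        repository_name="<repo-name>",
        collaboration_branch="main",
        root_folder="/",
    ),
)

# The repo configuration is applied per Azure region ("location") of the factory.
client.factories.configure_factory_repo("westeurope", repo_update)
```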

Auto-Detecting Column Delimiters with Data Factory

Mark Kromer shows us a way of dynamically learning what the likely delimiter of a delimited file is:

Processing delimited text files in the data lake is one of the most popular uses of Azure Data Factory (ADF). To define the field delimiter, you set the column delimiter property in an ADF dataset.

The reality of data processing is that the delimiter can change often. ADF provides a facility to account for this data drift via parameterization. However, this assumes that you know that the delimiter is changing and what it will change to.

I’m going to briefly describe a sample of how to auto-detect a file delimiter using ADF Data Flows.

Click through for the demo.
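
Outside of ADF, the detection idea itself fits in a few lines of Python: sample the start of the file and pick whichever candidate delimiter looks most consistent. This is only an illustration of the logic, not Mark's Data Flow implementation, and the file name is made up.

```python
import csv

def detect_delimiter(path, candidates=",;|\t"):
    """Guess the column delimiter from a small sample of the file."""
    with open(path, newline="") as f:
        sample = f.read(64 * 1024)  # the first 64 KB is usually plenty
    try:
        # csv.Sniffer picks the delimiter with the most consistent counts per row.
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        # Fall back to a simple frequency count on the first line.
        first_line = sample.splitlines()[0] if sample else ""
        return max(candidates, key=first_line.count)

print(detect_delimiter("sales_2020.csv"))  # hypothetical file
```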

Azure AD Passthrough and Password Hash Authentication in SQL DB, DW, MI

Mirek Sztajno announces two new security pieces for Azure SQL Database, Azure Synapse Analytics, and Azure SQL Managed Instances:

We are announcing support for Azure AD pass-through and password hash authentication for Azure SQL DB (single database and database pools), Managed Instance, and Azure Synapse (formerly SQL DW).

Azure AD password hash authentication is the simplest way to enable authentication for on-premises Active Directory users in Azure AD. Users are synchronized with Azure AD and password validation occurs in the cloud using the same username and password that is used in on-premises environments. No additional infrastructure is required.

Azure AD pass-through authentication provides a password validation mechanism that validates users directly against on-premises Active Directory, outside the cloud. Pass-through authentication does not require ADFS or other third-party federation services.

Each of these authentication methods can be configured with Azure AD Connect, allowing you to provision users in the cloud.

Read on to see what this means for you.
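
From the client's point of view, both methods look the same: you pick an Azure AD authentication mode in the connection string, and whether the password is validated via hash sync or pass-through is a tenant-level Azure AD Connect setting. A hedged pyodbc sketch, with server, database, and user as placeholders:

```python
import pyodbc

# Connects with an Azure AD user name and password. Whether Azure AD validates
# the password via password hash sync or pass-through authentication is
# configured in Azure AD Connect and is invisible to this code.
conn_str = (
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=tcp:myserver.database.windows.net,1433;"  # placeholder server
    "Database=mydb;"                                  # placeholder database
    "Authentication=ActiveDirectoryPassword;"
    "UID=someone@contoso.com;"                        # synchronized AD user
    "PWD=<password>;"
    "Encrypt=yes;"
)

with pyodbc.connect(conn_str) as conn:
    row = conn.cursor().execute("SELECT SUSER_SNAME();").fetchone()
    print(row[0])
```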

Diving into Kubernetes: a Workshop

Chris Adkin has been busy:

I have not blogged for a while; it was my hope to produce part 5 in the series on creating a Kubernetes cluster for production-grade Big Data Clusters. However, there is a very good reason for this: I have been working on a one-day workshop to be delivered at SQL Bits in September. The material can be found here. Enjoy!

I’ve only looked at the module listings, but Chris does a great job putting long-form articles together, so I’ve already added it to my todos.

Removing Old Backups from Azure Blob Storage

Niko Neugebauer has some advice on how to clean up backups which live in Azure Blob Storage:

Continuing the topic of the Backups to Azure Blob Storage that I have kind of kicked off with the post Striping Backups to Azure Blob Storage, I want to touch on the important aspect of “keeping it clean” – thus deleting the old backups.
On a regular Windows Server this is a rather easy task, especially if you are using a standard maintenance solution such as Ola Hallengren’s Maintenance Solution or any other. You can also use regular SSMS maintenance plans (*cough* for whatever reason, unknown to me, that you might wish to *cough*), or you can easily set up a Windows Scheduler job with a command-line batch file, PowerShell, or whatever tool/script/language you like.

The situation is quite different with the Backup To URL functionality, which has been available for more than six years (good old SQL Server 2012 even got support for it in a Cumulative Update: SQL Server 2012 Service Pack 1 CU2, to be more precise).

Niko goes through five different methods you can use, so check it out.
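
As a taste of one approach: with the azure-storage-blob Python package you can list the blobs in a backup container and delete anything older than your retention window. This is a generic sketch (the container name, file extension, and 30-day retention are assumptions), not one of Niko's five methods verbatim.

```python
from datetime import datetime, timedelta, timezone
from azure.storage.blob import ContainerClient

# Placeholders: point at the container that holds the backup files.
container = ContainerClient.from_connection_string(
    conn_str="<storage-account-connection-string>",
    container_name="sqlbackups",
)

cutoff = datetime.now(timezone.utc) - timedelta(days=30)  # assumed retention

for blob in container.list_blobs():
    if blob.name.endswith(".bak") and blob.last_modified < cutoff:
        print(f"Deleting {blob.name} (last modified {blob.last_modified})")
        container.delete_blob(blob.name)
```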

Optimistic Concurrency in Cosmos DB

Hasan Savran takes us through optimistic concurrency with Cosmos DB:

To handle this problem, developers usually use a column like LastUpdateDt. You bring this column to the front end and post it back to the database with the updated model. If LastUpdateDt hasn’t changed, the update goes into the database. If LastUpdateDt has changed, that means somebody else updated the model and your code rejects the changes.

In this post, I will try to answer questions like “How do we do this in Azure Cosmos DB?” and “Do I need to do all that logic manually by using the Cosmos DB SDK?” I will use Cosmos DB’s REST API to demo how Cosmos DB handles optimistic concurrency control automatically. If you have experience developing any REST API, you might be familiar with headers like If-Match or If-None-Match. These HTTP headers control whether an update should be applied. You can also use them for caching, since they indicate whether an item has changed; you may be able to cache items until they do.

Hasan has a demo for us as well, so check it out.
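
If you use the Python SDK rather than raw REST calls, the same If-Match behavior is exposed through the document's _etag and a match condition on the replace call. A rough sketch, assuming azure-cosmos 4.x, with account, database, container, and item names as placeholders:

```python
from azure.core import MatchConditions
from azure.cosmos import CosmosClient
from azure.cosmos.exceptions import CosmosHttpResponseError

client = CosmosClient("<account-uri>", credential="<account-key>")  # placeholders
container = client.get_database_client("Sales").get_container_client("Orders")

doc = container.read_item(item="order-1", partition_key="order-1")
doc["status"] = "shipped"

try:
    # If-Match: only apply the update when the stored _etag still matches,
    # i.e. nobody else has modified the document since we read it.
    container.replace_item(
        item=doc["id"],
        body=doc,
        etag=doc["_etag"],
        match_condition=MatchConditions.IfNotModified,
    )
except CosmosHttpResponseError as e:
    if e.status_code == 412:  # precondition failed: someone updated it first
        print("Conflict detected; re-read the document and retry.")
    else:
        raise
```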

Transforming JSON to CSV with Azure Data Factory

Rayis Imayev shows how to use the Flatten task with Azure Data Factory wrangling data flows:

Last week I blogged about using Mapping Data Flows to flatten a sourcing JSON file into a flat CSV dataset:
Part 1: Transforming JSON to CSV with the help of the Flatten task in Azure Data Factory

Today I would like to explore the capabilities of the Wrangling Data Flows in ADF to flatten the very same sourcing JSON dataset.

Click through to see what’s different.
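
For a quick local point of comparison, the same flattening can be sketched in Python with pandas.json_normalize, which unrolls nested objects into columns before writing a CSV. This only illustrates the concept, not the Wrangling Data Flow itself, and the file names are placeholders.

```python
import json
import pandas as pd

# Hypothetical nested JSON source, standing in for the blog's sample file.
with open("source.json") as f:
    records = json.load(f)  # expects a list of nested objects

# json_normalize flattens nested objects into dotted column names,
# e.g. {"customer": {"name": "..."}} becomes a "customer.name" column.
flat = pd.json_normalize(records)

flat.to_csv("flattened.csv", index=False)
print(flat.columns.tolist())
```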

Deleting Old Build Definitions in Azure DevOps

Mark Broadbent solves a problem for us:

I have been experiencing a problem for quite a while now in my current environment, in that some of our old builds cannot be deleted. When you attempt to do so, it results in the following error:

One or more builds associated with the requested pipeline(s) are retained by a release. The pipeline(s) and builds will not be deleted.

Many of our pipelines have undergone a lot of change over time, to the degree that it is not even obvious anymore why (or indeed where) these builds are being prevented from being dropped. The only thing that is clear is that until they can be deleted, the old build definitions will remain.

Regardless of the reason why, Mark has the answer for how.
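
If you want to see programmatically which builds a definition is still holding on to, the Azure DevOps REST API can list them and, for builds pinned with keepForever, un-pin them. This is a generic sketch with the organization, project, definition ID, and personal access token as placeholders; builds retained by a release lease are the trickier case Mark's post actually addresses.

```python
import requests

ORG = "https://dev.azure.com/<organization>"  # placeholders
PROJECT = "<project>"
PAT = "<personal-access-token>"
auth = ("", PAT)  # basic auth with an empty user name and the PAT

# List builds for a given (old) definition.
builds = requests.get(
    f"{ORG}/{PROJECT}/_apis/build/builds",
    params={"definitions": "<definition-id>", "api-version": "6.0"},
    auth=auth,
).json()["value"]

for build in builds:
    if build.get("keepForever"):
        # Clear the "retain indefinitely" flag so the build can age out.
        requests.patch(
            f"{ORG}/{PROJECT}/_apis/build/builds/{build['id']}",
            params={"api-version": "6.0"},
            json={"keepForever": False},
            auth=auth,
        )
        print(f"Un-pinned build {build['id']}")
```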

Using Azure Key Vault with Azure Databricks

Jason Bonello shows how easy it is to integrate Azure Key Vault into Azure Databricks:

In Azure Key Vault we will be adding secrets that we will be calling through Azure Databricks within the notebooks. First and foremost, this is for security purposes. It will ensure usernames and passwords are not hardcoded within the notebook cells and offer some type of control over access in case it needs to be reverted later on (assuming it is controlled by a different administrator). In addition to this, it will offer a better way of maintaining a solution, since if a password ever needs to be changed, it will only be changed in the Azure Key Vault without the need to go through any notebooks or logic.

If you don’t use Key Vault, Databricks does include its own secrets storage, so there’s really no reason to keep credentials in plaintext.
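
Inside a notebook, reading a secret from a Key Vault-backed scope is a one-liner via dbutils. The scope, secret, server, and table names below are placeholders for whatever you configure.

```python
# Runs inside a Databricks notebook, where dbutils and spark are predefined.
# "kv-scope" must be a secret scope backed by Azure Key Vault, and
# "sql-password" a secret stored there; both names are placeholders.
jdbc_password = dbutils.secrets.get(scope="kv-scope", key="sql-password")

jdbc_url = (
    "jdbc:sqlserver://myserver.database.windows.net:1433;"  # placeholder server
    "database=mydb;user=etl_user;password=" + jdbc_password
)

df = (spark.read
      .format("jdbc")
      .option("url", jdbc_url)
      .option("dbtable", "dbo.SalesOrders")  # placeholder table
      .load())
```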
