
Category: Cloud

Migrating to Azure with SQL Server Management Studio

Magi Naumova walks us through some options for migrating on-prem instances to Azure, all of which are available in SQL Server Management Studio:

Cases of migrating databases to Azure become more common every day. Azure SQL Database is the flagship PaaS service Microsoft provides for hosting a relational database. But even though it is the same engine, there are still many features that are unsupported or have limited functionality in Azure SQL DB compared to on-premises SQL Server versions. For example, cross-database references are possible in on-premises SQL Server databases but are not supported in Azure SQL Database.

If we could check in advance and plan our migration based on those checks, it would save time and effort. This is what the new Migrate to Azure features in SSMS are built for.

Click through for the options, some of which are simply informational and some of which actually do the work.


Metadata Integrity Checks in ADF.ProcFwk

Paul Andrew has another update to the ADF metadata-driven processing framework:

With this release of the framework I wanted to take the opportunity to harden the database and add some more integrity (intelligence) to the metadata, things that go beyond the existing database PK/FK constraints. After all, this metadata drives everything that Azure Data Factory does/is about to do – so it needs to be correct. These new integrity checks take two main forms:

1. Establishing a minimum set of criteria within the metadata before the core Data Factory processing starts and creates an execution run.
2. Establishing a logical chain of pipeline dependencies across processing stages. Then providing a set of advisory checks for areas of conflict and/or improvement.

More details on both are included against the actual stored procedure in the database changes section below.

In addition to database hardening, I’ve added a few other bits to the solution, including a PowerShell script for ADF deployments and a Data Studio Notebook to make the developer experience of implementing this code project a little nicer.

Read on to see what’s in version 1.3. Check it out on GitHub as well.
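
To make the two kinds of checks concrete, here is a minimal Python sketch. It is not taken from ADF.ProcFwk, where the checks live in a stored procedure. The `Pipeline` record, its fields, and both function names are hypothetical, and the rules shown (at least one enabled pipeline; dependencies must point to an earlier stage) are assumptions about what such checks might look like.

```python
from dataclasses import dataclass, field


@dataclass
class Pipeline:
    # Hypothetical metadata record; the real framework keeps this in database tables.
    name: str
    stage: int
    enabled: bool = True
    depends_on: list = field(default_factory=list)


def check_minimum_criteria(pipelines):
    """Blocking checks: return problems that should stop a run from starting."""
    problems = []
    if not any(p.enabled for p in pipelines):
        problems.append("no enabled pipelines")
    return problems


def check_dependency_chain(pipelines):
    """Advisory checks: flag dependencies that do not point to an earlier stage."""
    by_name = {p.name: p for p in pipelines}
    warnings = []
    for p in pipelines:
        for upstream in p.depends_on:
            if upstream not in by_name:
                warnings.append(f"{p.name} depends on unknown pipeline {upstream}")
            elif by_name[upstream].stage >= p.stage:
                warnings.append(
                    f"{p.name} depends on {upstream}, which is not in an earlier stage"
                )
    return warnings
```

The split mirrors the two forms described in the excerpt: the first function returns blockers, the second returns advice that a developer can act on without halting a run.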


Hive + LLAP Now Faster with Elastic MapReduce 6

Suthan Phillips has a benchmark for Elastic MapReduce 5 versus 6:

To evaluate the performance benefits of running Hive with Amazon EMR release 6.0.0, we’re using 70 TPC-DS queries with a 3 TB Apache Parquet dataset on a six-node c4.8xlarge EMR cluster to compare the total runtime and geometric mean with results from EMR release 5.29.0.

The results show that the TPC-DS queries run twice as fast in Amazon EMR 6.0.0 (Hive 3.1.2) compared to Amazon EMR 5.29.0 (Hive 2.3.6) with the default Amazon EMR Hive configuration.

The following graph shows performance improvements measured as total runtime for 70 TPC-DS queries. Amazon EMR 6.0.0 has the better (lower) runtime.

Click through for the measures and a bit more info on LLAP.
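
Total runtime and geometric mean answer slightly different questions: the total is dominated by the slowest queries, while the geometric mean weights every query's relative change equally. A minimal Python sketch of both measures follows. The runtimes are made-up placeholders, not figures from the benchmark, and `summarize` is a hypothetical helper.

```python
from statistics import geometric_mean


def summarize(runtimes_seconds):
    """Return (total runtime, geometric mean) for a list of per-query runtimes."""
    return sum(runtimes_seconds), geometric_mean(runtimes_seconds)


# Placeholder per-query runtimes in seconds (illustrative only, not benchmark data).
old_release = [300.0, 10.0, 10.0]
new_release = [100.0, 10.0, 10.0]

old_total, old_geo = summarize(old_release)
new_total, new_geo = summarize(new_release)

print(f"total runtime speedup:  {old_total / new_total:.2f}x")
print(f"geometric mean speedup: {old_geo / new_geo:.2f}x")
```

In this toy case only the longest query improves, so the total-runtime speedup comes out larger than the geometric-mean speedup, which is one reason benchmarks tend to report both.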


Checking JSON Structure with ADF

Rayis Imayev takes us through the solution of a tricky problem in Azure Data Factory:

Within my “ForEach” container I have also placed a Stored Procedure task and set 4 data elements from my incoming data stream as values for corresponding parameters.

However, this approach will not work for all my incoming JSON events. It actually failed for the last one, since it didn’t have both “stop_time” and “last_update” data elements.

An easy way to fix this problem is to add the missing data elements with empty values for the last event record. However, when we don’t have control over incoming data, we need to adjust our data processing steps.

Read on to see how Rayis solves this problem.
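
One general way to cope with optional elements, sketched here in plain Python rather than in ADF, is to fill in a default for any missing key before the values are handed to the stored procedure. Only `stop_time` and `last_update` come from the excerpt; `event_id`, `start_time`, and the `normalize_event` helper are hypothetical, and this is not necessarily the approach Rayis takes.

```python
import json

# Elements the stored procedure expects. "stop_time" and "last_update" are named
# in the post; the other two are hypothetical stand-ins.
EXPECTED_KEYS = ("event_id", "start_time", "stop_time", "last_update")


def normalize_event(event):
    """Return a dict with every expected key, using None where the event lacks one."""
    return {key: event.get(key) for key in EXPECTED_KEYS}


# Two sample events; the second one is missing the optional elements.
raw = (
    '[{"event_id": 1, "start_time": "09:00", "stop_time": "09:30", "last_update": "09:31"},'
    ' {"event_id": 2, "start_time": "10:00"}]'
)
events = [normalize_event(e) for e in json.loads(raw)]
```

After normalization every event carries the same four keys, so a downstream step can bind parameters without checking which elements were present.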


Tying Azure Data Factory to Source Control

Eddy Djaja explains why you really want to tie Azure Data Factory to your source control:

Azure Data Factory (ADF) is Microsoft’s ETL, or more precisely ELT, tool in the cloud. For more information on ADF, see Microsoft’s introduction: https://docs.microsoft.com/en-us/azure/data-factory/introduction. Some have argued over whether ADF will replace or complement “on-premises” SSIS; that is uncertain, and only time will tell.

Unlike SSIS, ADF authoring does not use Visual Studio. Instead, it uses a web browser to create ADF components such as pipelines, activities, and datasets. The simplicity of authoring in ADF may leave novice developers confused about how ADF components are saved, stored, and published. When logging in to ADF for the first time after creating a data factory, authoring is in the ADF mode. How do we know?

Click through for the explanation and some resources on how to do it.


Auto-Detecting Column Delimiters with Data Factory

Mark Kromer shows us a way of dynamically learning what the likely delimiter of a delimited file is:

Processing delimited text files in the data lake is one of the most popular uses of Azure Data Factory (ADF). To define the field delimiter, you set the column delimiter property in an ADF dataset.

The reality of data processing is that delimiters can change often. ADF provides a facility to account for this data drift via parameterization. However, this assumes that you know that the delimiter is changing and what it will change to.

I’m going to briefly describe a sample of how to auto-detect a file delimiter using ADF Data Flows.

Click through for the demo.
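
The idea of delimiter detection can be sketched outside ADF as well. The Python below is an illustration under stated assumptions, not Mark's Data Flow: it counts each candidate delimiter per line and picks the one that appears a consistent, non-zero number of times on every sampled line. The candidate list and the `detect_delimiter` name are hypothetical.

```python
CANDIDATES = [",", ";", "|", "\t"]  # assumed set of plausible delimiters


def detect_delimiter(text, sample_lines=10):
    """Guess the delimiter from the first few non-empty lines; None if unclear."""
    lines = [line for line in text.splitlines() if line.strip()][:sample_lines]
    best, best_count = None, 0
    for delim in CANDIDATES:
        counts = {line.count(delim) for line in lines}
        # A good candidate splits every sampled line into the same number of fields.
        if len(counts) == 1:
            count = counts.pop()
            if count > best_count:
                best, best_count = delim, count
    return best
```

This heuristic ignores quoted fields that contain a delimiter character; Python's standard-library `csv.Sniffer` applies a more thorough heuristic if that matters for your files.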


Azure AD Passthrough and Password Hash Authentication in SQL DB, DW, MI

Mirek Sztajno announces two new security pieces for Azure SQL Database, Azure Synapse Analytics, and Azure SQL Managed Instances:

We are announcing support for Azure AD pass-through and password hash authentication for Azure SQL DB (single database and database pools), Managed Instance, and Azure Synapse (formerly SQL DW).

Azure AD password hash authentication is the simplest way to enable authentication for on-premises Active Directory users in Azure AD. Users are synchronized with Azure AD and password validation occurs in the cloud using the same username and password that is used in on-premises environments. No additional infrastructure is required.

Azure AD pass-through authentication provides a password validation mechanism that validates users directly with on-premises Active Directory, outside the cloud. Pass-through authentication does not require ADFS or other third-party federation services.

Each of these authentication methods can be configured by Azure AD Connect, allowing you to provision users in the cloud.

Read on to see what this means for you.


Diving into Kubernetes: a Workshop

Chris Adkin has been busy:

I have not blogged for a while; it was my hope to produce part 5 in the series on creating a Kubernetes cluster for production-grade Big Data Clusters. However, there is a very good reason for this: I have been working on a one-day workshop to be delivered at SQLBits in September. The material can be found here. Enjoy!

I’ve only looked at the module listings, but Chris does a great job putting long-form articles together, so I’ve already added it to my todos.
