Press "Enter" to skip to content

Category: Cloud

Storing Streaming Data in Azure Data Lake

Jesse Gorter takes us through writing streaming data from Event Hubs into Azure Data Lake Storage:

In my previous blog I showed how you can stream Twitter data to an Event Hub and on to a Power BI live dashboard. In this post, I am going to show you how to store this data for the long term. An Event Hub stores your events only temporarily; it does not keep them for later analysis. Say you want to analyze whether negative or positive tweets have an impact on your sales: you would need to store the tweets to get a historical view.

The question is where to store this data: directly in the data warehouse, or in a data lake? This really depends on the architecture you want to have. A data lake is often used to store the raw data historically. It is especially interesting because it allows you to store any kind of data, structured or unstructured, and it is quite cheap compared to Azure SQL Database or Azure SQL Data Warehouse. So for that reason, we are going to store it in a data lake.

Jesse walks us through data lake creation and data migration from Event Hubs into a Data Lake Storage container.
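As a rough companion sketch of the same idea in code (not necessarily how Jesse wires it up in the post), here is a minimal Python example that reads events from an Event Hub and lands them as files in ADLS Gen2 using the azure-eventhub and azure-storage-file-datalake SDKs. The connection strings, hub name, filesystem, and file paths are all placeholders.

# pip install azure-eventhub azure-storage-file-datalake
from azure.eventhub import EventHubConsumerClient
from azure.storage.filedatalake import DataLakeServiceClient

# Placeholder connection strings and names.
EVENTHUB_CONN = "Endpoint=sb://<namespace>.servicebus.windows.net/;..."
ADLS_CONN = "DefaultEndpointsProtocol=https;AccountName=<account>;..."

lake = DataLakeServiceClient.from_connection_string(ADLS_CONN)
filesystem = lake.get_file_system_client("raw")

buffer = []  # simplified: one shared buffer rather than one per partition

def on_event(partition_context, event):
    # Collect raw tweet payloads and flush them to the lake in small batches.
    buffer.append(event.body_as_str())
    if len(buffer) >= 100:
        data = ("\n".join(buffer) + "\n").encode("utf-8")
        file_client = filesystem.get_file_client(
            f"tweets/partition={partition_context.partition_id}/"
            f"batch-{event.sequence_number}.json")
        file_client.create_file()
        file_client.append_data(data, offset=0, length=len(data))
        file_client.flush_data(len(data))
        buffer.clear()

consumer = EventHubConsumerClient.from_connection_string(
    EVENTHUB_CONN, consumer_group="$Default", eventhub_name="tweets")

with consumer:
    consumer.receive(on_event=on_event, starting_position="-1")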

Options for SQL Server in Azure

Josh Darnell looks at the options for running SQL Server in Azure:

Things are always changing in the cloud, but as it stands in early 2020, these are the main options for SQL Server in Azure:

– SQL Virtual Machines
– Managed Instances
– Databases (this one really has a number of sub-options, like serverless, hyperscale, and elastic pools)

I’m going to touch on each of these options, why you might choose them, and some of my personal takeaways from the talk as well.

Read on for the summary of each. Each one fills a different set of needs and has a different set of available options.

Cloudera Data Platform in Azure Marketplace

Ram Venkatesh announces availability for Cloudera Data Platform in the Azure Marketplace:

Cloudera Data Platform (CDP) is now available on Microsoft Azure Marketplace – so joint customers can easily deploy the world’s first enterprise data cloud on Microsoft Azure.

Last week we announced the availability of Cloudera Data Platform (CDP) on Azure Marketplace. CDP is an integrated data platform that is easy to secure, manage, and deploy. With its availability on the Azure Marketplace, joint customers of Cloudera and Microsoft will be able to easily discover and provision CDP Public Cloud across all Azure regions. Additionally, by procuring CDP through the Azure Marketplace, these customers can leverage integrated billing; i.e., the cost of CDP will be part of a single Azure bill, making procurement simple and friction-free.

The new Cloudera has taken a cloud-first approach, to the point of being cloud-only. It's an interesting shift, coming from the merger of two on-prem companies.

Characterizing and Optimizing a Serverless Workload

Adrian Colyer reviews an interesting paper:

Today’s paper analyses serverless workloads on Azure (the characterisation of those workloads is interesting in its own right), where users want fast function start times (avoiding cold starts) and the cloud provider wants to minimise resources consumed (costs). With fine-grained, usage-based billing, resources used to keep function execution environments warm eat directly into margins. The authors demonstrate a policy combining keep-alive times for function runtimes with pre-warming that dominates the currently popular strategy of simply keeping a function execution environment around for a period of time after it has been used, in the hope that it will be re-used. Using this policy, users see far fewer cold starts, while the cloud provider uses fewer resources. It’s the difference between the red (state-of-the-practice) and green (this paper) policies in the chart in the post. Win-win!

Very interesting.
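The central idea lends itself to a quick illustration. The toy Python sketch below (class and parameter names are mine, and the actual policy in the paper is considerably more sophisticated, with histogram handling and forecasting for hard-to-predict functions) shows the gist: learn each function's idle-time distribution, pre-warm shortly before the next invocation is likely, and keep the environment alive only long enough to cover most observed gaps.

import numpy as np

class HybridWarmingPolicy:
    # Toy sketch: per-function idle-gap history drives two windows instead of
    # one fixed keep-alive timeout applied to every function.
    def __init__(self, head_pct=5, tail_pct=99, fallback_keepalive=600):
        self.idle_gaps = {}                # function id -> observed idle gaps (seconds)
        self.head_pct = head_pct           # pre-warm once this much idle time has passed
        self.tail_pct = tail_pct           # keep warm long enough to cover most gaps
        self.fallback_keepalive = fallback_keepalive

    def record_idle_gap(self, fn_id, gap_seconds):
        self.idle_gaps.setdefault(fn_id, []).append(gap_seconds)

    def windows(self, fn_id):
        gaps = self.idle_gaps.get(fn_id)
        if not gaps:
            # No history yet: behave like the state-of-the-practice fixed keep-alive.
            return 0.0, float(self.fallback_keepalive)
        prewarm_after = float(np.percentile(gaps, self.head_pct))
        keep_alive_until = float(np.percentile(gaps, self.tail_pct))
        return prewarm_after, keep_alive_until

policy = HybridWarmingPolicy()
for gap in [55, 60, 58, 62, 300]:          # mostly minute-ish gaps, one outlier
    policy.record_idle_gap("thumbnailer", gap)
print(policy.windows("thumbnailer"))       # roughly (55.6, 290.5)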

Connecting to Cosmos DB from the Gremlin Console

Hasan Savran shows how to connect to Cosmos DB’s Gremlin API via the Gremlin Console:

Graph databases have been popular lately. You can use Azure Cosmos DB as your graph database by selecting the Gremlin API. The Gremlin programming language is developed by Apache TinkerPop of the Apache Software Foundation. I will show you how to connect to the Cosmos DB Gremlin API from the TinkerPop Gremlin console.

You can download the latest version of the Gremlin console from here. The latest version was 3.4.6 when I wrote this post; I was able to connect to Cosmos DB using versions 3.4.3 and 3.4.6. You can run the console from Linux or Windows. I will focus on the Windows version here, but the Linux version should work the same way. You must have the Java SDK 8 to run this console; the latest version of the Java SDK does not work with it.

There are a couple of configuration steps, but nothing crazy.
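Hasan's post covers configuring the console itself; if you would rather hit the same endpoint from code, a minimal gremlinpython sketch looks roughly like the following. The account, database, graph, and key values are placeholders, and note that Cosmos DB's Gremlin API expects query strings rather than bytecode traversals.

# pip install gremlinpython
from gremlin_python.driver import client, serializer

# Placeholder endpoint, database/graph names, and key.
gremlin_client = client.Client(
    "wss://<your-account>.gremlin.cosmos.azure.com:443/", "g",
    username="/dbs/<your-database>/colls/<your-graph>",
    password="<your-primary-key>",
    message_serializer=serializer.GraphSONSerializersV2d0())

# Submit plain Gremlin query strings; Cosmos DB does not accept bytecode traversals.
count = gremlin_client.submit("g.V().count()").all().result()
print(count)
gremlin_client.close()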

Evaluating Regression Models in Azure ML

Dan Fitton continues a series on model evaluation with Azure Machine Learning:

The initial go-to metric for understanding a regression model is the R squared (or R2) value, also known as the coefficient of determination. R squared measures how well the model is fitted to the data – the goodness of fit. It indicates how much of the variation of y (the target) is explained by the variation in x (the features).

The measures are bog standard if you’ve worked with regressions before, and Dan does a good job explaining them.
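For readers who want the definition concretely, here is a small Python sketch of the coefficient of determination, computed directly from its definition (1 minus the ratio of residual to total sum of squares). The sample numbers are made up for illustration.

import numpy as np

def r_squared(y_true, y_pred):
    # R^2 = 1 - SS_res / SS_tot: the share of the target's variation explained by the model.
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual (unexplained) variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation around the mean
    return 1.0 - ss_res / ss_tot

# A good fit scores close to 1; predicting the mean everywhere scores 0.
print(r_squared([3.0, 5.0, 7.0, 9.0], [2.8, 5.3, 6.9, 9.2]))   # ~0.991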

Issue with PolyBase and Cosmos DB

I found an issue with connecting to Cosmos DB from PolyBase after installing SQL Server 2019 CU2:

After upgrading to SQL Server 2019 CU2, I noticed some issues when trying to connect to a Cosmos DB collection via PolyBase. Specifically, I started getting the following error message:

Msg 105082, Level 16, State 1, Line 35
105082;Generic ODBC error: [Microsoft][MongoDBODBC] (110) Error from MongoDB Client: Server at <<my Cosmos account name>>.documents.azure.com:10255 reports wire version 2, but this version of libmongoc requires at least 3 (MongoDB 3.0) (Error Code: 15) Additional error <2>: ErrorMsg: [Microsoft][MongoDBODBC] (110) Error from MongoDB Client: Server at <<my Cosmos account name>> .documents.azure.com:10255 reports wire version 2, but this version of libmongoc requires at least 3 (MongoDB 3.0) (Error Code: 15), SqlState: HY000, NativeError: 110 .

Read on for a couple of attempts at a solution and some more detail.

Cosmos DB Changes

Hasan Savran hits on a few changes to Cosmos DB:

Free Tier of Cosmos DB
A couple of big changes came from the Azure Cosmos DB team this week. The first one is the free tier of Cosmos DB. In the free tier, you get the first 400 Request Units per second and 5 GB of storage for free for the lifetime of the account. This is a great deal if you would like to use Azure Cosmos DB in your new project. Give it a try; it's free! All you need to do is select Apply for the Apply Free Tier Discount option.

This is a big one, but there are a few other interesting changes in there as well.

Transforming JSON to CSV with Azure Data Factory

Rayis Imayev shows how you can use the Flatten task in Azure Data Factory to convert JSON text to CSV:

What this new task does is help transform/transpose/flatten your JSON structure into a denormalized, flattened dataset that you can upload into a new or existing flat database table.

I like the analogy of the Transpose function in Excel, which rotates your vertical set of data pairs (name : value) into a table with column names and values for the corresponding objects. And when this vertical JSON structure contains several similar sets (an array), the Flatten transformation in ADF Mapping Data Flows does a really good job of transforming it into a table with several rows (records).

Click through for a demonstration.
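The Flatten transformation lives in ADF Mapping Data Flows, but the underlying idea (explode an array into rows while carrying parent attributes along) is easy to see in a few lines of Python. This illustrative sketch uses pandas.json_normalize on a made-up document rather than ADF itself.

import json
import pandas as pd

# Made-up nested JSON: one parent object whose "orders" array should become one row per element.
raw = json.loads("""
{
  "customer": {"id": 1, "name": "Contoso"},
  "orders": [
    {"orderId": 100, "amount": 25.0},
    {"orderId": 101, "amount": 40.0}
  ]
}
""")

# record_path names the array to explode; meta carries parent fields onto every output row.
flat = pd.json_normalize(raw, record_path="orders",
                         meta=[["customer", "id"], ["customer", "name"]])
flat.to_csv("orders.csv", index=False)
print(flat)      # two rows: orderId, amount, customer.id, customer.name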

ADF.PROCFWK 1.1 Released

Paul Andrew has a new name for the metadata-driven framework in Azure Data Factory:

Hi data friends! Version 1.1 of my ADF.procfwk is ready!

Following the great response I got to the version 1.0 blog series, I decided to push ahead and get version 1.1 out there as soon as possible. The main thing I’ve addressed with this release is the big problem of having hard-coded service principal credentials in the body of the Azure Function request that called our execution pipelines. Hopefully we can all agree this was/is not great for a number of reasons.

Read on for more details including a detailed changelog, and check out the GitHub repo.
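Paul's framework handles this in its own way (see the changelog), but as a general illustration of the pattern being fixed, here is a hedged Python sketch of pulling service principal credentials from Azure Key Vault at runtime instead of embedding them in a request body. The vault URL and secret names are hypothetical.

# pip install azure-identity azure-keyvault-secrets
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

# Hypothetical vault and secret names.
vault = SecretClient(
    vault_url="https://my-procfwk-vault.vault.azure.net/",
    credential=DefaultAzureCredential())

client_id = vault.get_secret("pipeline-spn-client-id").value
client_secret = vault.get_secret("pipeline-spn-client-secret").value

# Pass (or better, have the Azure Function itself look up) these values at runtime
# rather than hard-coding them in the request body that triggers execution pipelines.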
