Press "Enter" to skip to content

Category: Cloud

Filtering by Distance in Cosmos DB

Hasan Savran shows how we can write queries against spatial data in Cosmos DB:

We are ready to use spatial functions to retrieve data. I have the location of Hurricane Katrina. I want to find my customers who might be affected by Katrina. I want to use the Distance function for this. I have the locations of eye of Katrina in my database, first I will pick one of those locations then look for my customers close to that location. How close? You can control the distance with spatial function named distance, In the following example, I use 100 km which is approximately 62 miles.

Hasan has a demo for us, so check it out.

Comments closed

Using Azure Functions Inside Azure Data Factory

Rayis Imayev shows how you can call an Azure Function from inside your Azure Data Factory Pipeline:

Creating a data solution with Azure Data Factory (ADF) may look like a straightforward process: you have incoming datasets, business rules of how to connect and change them and a final destination environment to save this transformed data. Very often your data transformation may require more complex business logic that can only be developed externally (scripts, functions, web-services, databricks notebooks, etc.).

In this blog post, I will try to share my experience of using Azure Functions in my Data Factory workflows: my highs and lows of using them, my victories and struggles to make them work.

This includes a description of the options, a demo function, and additional notes for each technique.

Comments closed

Inserting Geospatial Data into Cosmos DB

Hasan Savran shows us how we can use the .NET SDK to insert geospatial data into Cosmos DB:

GeoSpatial Data can help you to answer many questions in your business If you know how to use Spatial data. Searching data by radius can bring you all kind of interesting data. For example, If you know the path of hurricane, you can make searches by using the path and find all your customers under that path. Then you can be proactive and do something about this upcoming problem for your business.

     Many databases support Spatial Data Types, I will cover how to store Spatial Data in Azure Cosmos DB in this post. I have an earlier post about how to import Spatial Data into Azure Cosmos DB by using Data Migration Tool. I will focus on how to store spatial data by using .NET SDK in this post. I used .NET SDK 3.8.0, you can get the latest SDK from here..

Click through for a demonstration.

Comments closed

CHECKDB Matters in the Cloud Too

Daniel Janik takes us through an ordeal related to CHECKDB on an Azure Managed Instance:

This is crazy! What now? Open at ticket with MSFT? This seemed the only choice and what was the root cause? Apparently in Azure Managed Instances, Microsoft will check databases for corruption and will take the database offline if detected.

When in this special offline state there’s no way to access the database and Microsoft must be contacted. You can’t set the DB in recovery mode or change it to ONLINE. Microsoft does “contact” someone to notify that the database was taken offline due to corruption but if you work at a larger company this notification may never reach the right people.

Read on to see what Daniel ended up doing and some tips on making the process smoother.

Comments closed

Clarifying Nomenclature around Azure Synapse Analytics

James Serra clears a few things up:

I see a lot of confusion among many people on what features are available today in Azure Synapse Analytics (formally called Azure SQL Data Warehouse) and what features are coming in the future. Below is a picture (click to zoom) that I describe below that hopefully clears things up:

I tend to just say “Azure Synapse Analytics SQL Pools” for the product formerly known as Azure SQL Data Warehouse and save “Azure Synapse Analytics” to include Spark + hyperscale (James’s v3).

Comments closed

Migrating to Azure with SQL Server Management Studio

Magi Naumova walks us through some options for migrating on-prem instances to Azure, all of which are available in SQL Server Management Studio:

The cases of migrating our database in Azure become more and more every day. Azure SQL Database is the flagship SaaS service Microsoft Provides for hosting a relational database. But no matter it is the same engine there are still many features not supported or with limited functionalities in Azure SQL DB comparing to on premises SQL Server versions. For example, all cross-database references are possible in on premises SQL Server databases but is not supported in Azure SQL Database.

If we could check in advance and plan our migration based on those checks it would be time and effort saving. This is what Migrate to Azure new SSMS features are built for.

Click through for the options, some of which are simply informational and some of which actually do the work.

Comments closed

Metadata Integrity Checks in ADF.ProcFwk

Paul Andrew has another update to the ADF metadata-driven processing framework:

With this release of the framework I wanted to take the opportunity to harden the database and add some more integrity (intelligence) to the metadata, things that go beyond the existing database PK/FK constraints. After all, this metadata drives everything that Azure Data Factory does/is about to do – so it needs to be correct. These new integrity checks take two main forms:

1. Establishing a minimum set of criteria within the metadata before the core Data Factory processing starts and creates an execution run.
2. Establishing a logical chain of pipeline dependencies across processing stages. Then providing a set of advisory checks for area’s of conflict and/or improvement.

More details on both are included against the actual stored procedure in the database changes section below.

In addition to database hardening, I’ve added a few other bits to the solution, including a PowerShell script for ADF deployments and a Data Studio Notebook to make the developer experience of implementing this code project a little nicer.

Read on to see what’s in version 1.3. Check it out on GitHub as well.

Comments closed

Hive + LLAP Now Faster with ElasticMapReduce 6

Suthan Phillips has a benchmark for ElasticMapReduce 5 versus 6:

To evaluate the performance benefits of running Hive with Amazon EMR release 6.0.0, we’re using 70 TCP-DS queries with a 3 TB Apache Parquet dataset on a six-node c4.8xlarge EMR cluster to compare the total runtime and geometric mean with results from EMR release 5.29.0.

The results show that the TPC-DS queries run twice as fast in Amazon EMR 6.0.0 (Hive 3.1.2) compared to Amazon EMR 5.29.0 (Hive 2.3.6) with the default Amazon EMR Hive configuration.

The following graph shows performance improvements measured as total runtime for 70 TPC-DS queries. Amazon EMR 6.0.0 has the better (lower) runtime.

Click through for the measures and a bit more info on LLAP.

Comments closed