Press "Enter" to skip to content

Category: Cloud

Comparing Integration Services and Azure Data Factory

Tim Mitchell compares SQL Server Integration Services to Azure Data Factory:

For the better part of 15 years, SQL Server Integration Services (SSIS) has been the go-to enterprise extract-transform-load (ETL) tool for shops running on Microsoft SQL Server. More recently, Microsoft added Azure Data Factory (ADF) to its stable of enterprise ETL tools. In this post, I’ll be comparing SSIS and Azure Data Factory to share how they are alike and how they differ. I’ll also review the strengths and shortcomings of each, including the architectures in which each of these is likely to do well.

Read on for Tim’s thoughts on the subject. Tim lays out his biases up-front but also gives you a good feel for where both products are in their lifecycles.

Comments closed

Case-Insensitive Searches in Cosmos DB

Hasan Savran gives us a few techniques for case-insensitive searching in Azure Cosmos DB:

Data gets saved as JSON in Azure Cosmos DB. JSON documents supports string, numeric and Boolean data types. Because of limited data types, you might end up picking string data type for most of your data. You may need to use Cosmos DB’s string functions to help you in many situations. For example; you might need to find names start with bunch of characters or you might need to find data ends like a word or bunch of characters. You may need to find all data that has address contains a street name. What do you do in these cases? Azure Cosmos DB has couple of system functions to help you, I will focus on the following functions in this post. These are the functions just got updated in June 2020.

Click through to learn a bit about those money-saving updates.

Comments closed

Getting Started with Redshift

Rahul Mehta has a primer on AWS Redshift:

AWS Redshift is a columnar data warehouse service on AWS cloud that can scale to petabytes of storage, and the infrastructure for hosting this warehouse is fully managed by AWS cloud. Redshift operates in a clustered model with a leader node, and multiple worked nodes, like any other clustered or distributed database models in general. It is based on Postgres, so it shares a lot of similarities with Postgres, including the query language, which is near identical to Structured Query Language (SQL). This Redshift supports creating almost all the major database objects like Databases, Tables, Views, and even Stored Procedures. In this article, we will explore how to create your first Redshift cluster on AWS and start operating it.

I’m not really the biggest fan of Redshift around, but Rahul does a good job walking us through the basics of the product.

Comments closed

Incremental Pipline Development with Azure Data Factory

Andy Leonard shows how you can incrementally develop Azure Data Factory pipelines:

A friend pinged me recently to ask about rolling back Azure Data Factory (ADF) pipeline versions. My response was a question: Are you using source control with ADF? That did not help the current situation.

I thought of the way I often build ADF pipelines and shared my methodology, which is relatively simple (it has to be simple for me to understand it!):

Click through for Andy’s approach.

Comments closed

Methods to Run Scheduled Tasks in Azure

Joey D’Antoni has a roundup of several techniques you can use to run scheduled tasks against an Azure SQL Database:

If you’ve worked with Microsoft SQL Server for any period of time, you are familiar with the SQL Server Agent. The Agent, which remains mostly unchanged since I started working with in 1999, is a fairly robust job scheduler that can also alert you in the event of job failures or system errors. I feel as though it’s a testament to the quality of the original architecture that the code hasn’t changed very much–it still meets the needs of about 90-95% of SQL Server workloads, based on an informal twitter discussion I had a few months ago. There are some cases where an enterprise scheduling tool is needed, but for maintaining most SQL Servers and executing basic ETL, the agent works fine. There’s one problem–the agent is only available in SQL Server and Azure SQL Managed Instance.

Read on to learn about those options.

Comments closed

Azure Elastic Jobs Now GA

Niko Neugebauer gives into Azure Elastic Jobs, now generally available:

They key feature that needs to be well understood and which points to the potential of the Elastic Job Agent is that you are in no way limited by your own Azure SQL Database, nor by the logical Azure SQL Server where this database is located (contrary to the MSDB Database on the SQL Server), nor will you be limted by the Azure Region, Azure Resource Group or even Azure Subscription – you can configure the Elastic Job that will be reaching out to potentially any Azure SQL Database (given the necessary settings & permissions are correctly configured).

Read the whole thing. Niko shares some interesting thoughts on how it works, how you can tie your one server to a whole host of SQL Databases, and a wish list on what should come next.

Comments closed

Refreshing Power BI Datasets in Azure Data Factory

Meagan Longoria shows us how to refresh a Power BI dataset using Azure Data Factory:

I recently needed to ensure that a Power BI imported dataset would be refreshed after populating data in my data mart. I was already using Azure Data Factory to populate the data mart, so the most efficient thing to do was to call a pipeline at the end of my data load process to refresh the Power BI dataset.

Power BI offers REST APIs to programmatically refresh your data. For Data Factory to use them, you need to register an app (service principal) in AAD and give it the appropriate permissions in Power BI and to an Azure key vault.

Click through for the solution.

Comments closed

A Brief Overview of Azure Synapse Analytics

Ginger Grant gives us the nickel tour of why Azure Synapse Analytics is interesting:

In the past few months, I have been examining Azure Synapse and what it can do.  When it was first released in November of 2019, the first functionality that was rolled out was an update of Azure SQL DW.  For this reason, many people think that Synapse is just an improved version of a cloud data warehouse.  Microsoft did improve SQL DW when it moved it to Synapse.  The biggest architectural design change is the separation of the code from the compute, a theme with many web projects, which allows the compute power to be increased when need dictates and scaled down when computing needs change.  Within Synapse, resources are allocated as Pools and you can define a sql pools to run data warehouse and later change the compute to a different resource.  You will still need to partition your DW as large datasets require partitioning to perform well.  Subsequently Microsoft Released the Azure Synapse Studio to be a container for a larger environment of tools and notebooks to interact with them.

But it’s more than that. Read on to see what else is available.

Comments closed

Migrating From Cosmos DB to SQL Server

Eitan Blumin builds an app:

The general idea is this:

The app executes a Cosmos DB query and collects a number of records into its “buffer”.

Once that “buffer” reaches a certain number of records (configurable), it’s time to “flush” it into the SQL Server. That could be either a database table receiving a Bulk Copy stream, or a stored procedure receiving a table valued parameter (again, configurable).

After the buffer is flushed, we have the option to execute a “merge” procedure. This is a stored procedure that would implement an “upsert” logic from the “staging” table and into the actual destination table.

Read on for more explanation and check out Eitan’s GitHub repo.

Comments closed