Press "Enter" to skip to content

Category: Cloud

Creating Triggers with Cosmos DB

Hasan Savran shows how you can create a trigger in Cosmos DB:

You have options if you need to use any type of trigger in Cosmos DB. There are two types of triggers in Cosmos DB. The first, which I will cover here, is the regular trigger, which can be executed before (Pre-Triggers) or after (Post-Triggers) an operation. This type of trigger is written in JavaScript, and you need to register it to a collection just like a stored procedure. The second type of trigger can be created by Azure Functions, and you can find more information about them in my older posts.

Pre-Triggers and Post-Triggers do not take any input parameters. Since Cosmos DB needs to do more work to execute triggers, you will end up with higher Request Units for your queries. They might be named triggers, but neither gets executed automatically with every operation; you need to call them programmatically if you want them to run. If a trigger throws an error for any reason, the transaction will roll back and the data will not be saved to the database.

Naturally, triggers are going to have a performance impact on your system regardless of the choice of data platform.
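To make this concrete, here is a minimal sketch of the pattern Hasan describes, using the azure-cosmos Python SDK: register a JavaScript pre-trigger on a container, then name it explicitly on the write that should fire it. The account URL, database, container, and trigger names are all placeholders, so treat this as a sketch of the shape of the API rather than production code.

```python
# Minimal sketch: register a JavaScript pre-trigger and invoke it explicitly.
# Account URL, key, database/container names, and trigger id are placeholders.
from azure.cosmos import CosmosClient
import azure.cosmos.documents as documents

client = CosmosClient("https://myaccount.documents.azure.com:443/", credential="<account-key>")
container = client.get_database_client("SalesDb").get_container_client("Orders")

# Trigger bodies are JavaScript and get registered on the container,
# just like stored procedures.
pre_trigger_js = """
function addTimestamp() {
    var request = getContext().getRequest();
    var item = request.getBody();
    if (!item.createdUtc) {
        item.createdUtc = new Date().toISOString();
    }
    request.setBody(item);
}
"""

container.scripts.create_trigger({
    "id": "trgAddTimestamp",
    "serverScript": pre_trigger_js,
    "triggerType": documents.TriggerType.Pre,
    "triggerOperation": documents.TriggerOperation.Create,
})

# The trigger never fires on its own; the operation has to name it explicitly.
container.create_item(
    body={"id": "1", "customer": "contoso"},
    pre_trigger_include="trgAddTimestamp",
)
```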

Running and Scheduling Azure SQL DB Elastic Jobs

Kate Smith continues a series on Azure SQL Database Elastic Jobs:

In previous posts, I have demonstrated how to create an Elastic Jobs Agent, set up credentials for Elastic Jobs, create a target group of servers/databases for the agent, and create and define an elastic job using both PowerShell and T-SQL.

In this post, I drill down into how to run an Elastic Job in an ad-hoc fashion and how to schedule a job to run regularly. I do this both for PowerShell and for T-SQL.

The PowerShell version is a one-liner and the T-SQL version looks a good bit like it does with SQL Agent jobs.
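If you want to drive it from client code instead, here is a rough sketch of both pieces in T-SQL against the jobs schema in the job database, sent over pyodbc. The server, job name, and schedule values are placeholders, and the jobs.sp_update_job schedule parameters are written from memory of the documentation, so verify them against Kate's post or the docs.

```python
# Sketch: run an elastic job ad hoc, then put it on a recurring schedule,
# by calling the jobs.* stored procedures in the job database.
# Connection string and job name are placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myjobserver.database.windows.net;DATABASE=jobdatabase;"
    "UID=jobuser;PWD=<password>",
    autocommit=True,
)
cur = conn.cursor()

# Ad-hoc run: start the job once, right now.
cur.execute("EXEC jobs.sp_start_job @job_name = ?;", "NightlyStatsRefresh")

# Recurring run: enable the job and have it repeat on a fixed interval.
cur.execute(
    """
    EXEC jobs.sp_update_job
        @job_name = ?,
        @enabled = 1,
        @schedule_interval_type = 'Hours',
        @schedule_interval_count = 24;
    """,
    "NightlyStatsRefresh",
)
```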

Building Metadata for an ADF Pipeline

Paul Andrew continues a series on Azure Data Factory and metadata-driven pipelines:

Welcome back friends to part 2 of this 4-part blog series. In this post we are going to deliver on some of the design points we covered in part 1 by building the database to house our processing framework metadata.

Let’s start with a nice new shiny Azure SQLDB database and schema. This can easily be scaled up as our calls from Data Factory increase and ultimately the solution we are using the framework for grows.

Soon we will get to see the Azure Data Factory power in action.
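Purely as an illustration of the idea (this is not Paul's actual schema; that is in the linked post), the metadata database for a framework like this can start as small as one table telling Data Factory which pipelines to execute and in what order:

```python
# Hypothetical example only: a tiny metadata table of the kind a processing
# framework might use to decide which ADF pipelines to run and in what order.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver.database.windows.net;DATABASE=MetadataDb;"
    "UID=frameworkadmin;PWD=<password>",
    autocommit=True,
)

conn.execute("CREATE SCHEMA metadata;")  # CREATE SCHEMA must be its own batch
conn.execute("""
CREATE TABLE metadata.PipelineProcess
(
    ProcessId    INT IDENTITY(1, 1) PRIMARY KEY,
    StageNumber  INT           NOT NULL,  -- execution order across the framework
    PipelineName NVARCHAR(200) NOT NULL,  -- ADF pipeline the framework will invoke
    Enabled      BIT           NOT NULL DEFAULT 1
);
""")
```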

Secure Azure Data Source Access from Databricks

Bhavin Kukadia, Abhinav Garg, and Michal Marusan show us the right way to access Azure data sources from Azure Databricks:

Enterprise Security is a core tenet of building software at both Databricks and Microsoft, and thus it’s considered as a first-class citizen in Azure Databricks. In the context of this blog, secure connectivity refers to ensuring that traffic from Azure Databricks to Azure data services remains on the Azure network backbone, with the inherent ability to whitelist Azure Databricks as an allowed source. As a security best practice, we recommend a couple of options which customers could use to establish such a data access mechanism to Azure Data services like Azure Blob Storage, Azure Data Lake Store Gen2, Azure Synapse Data Warehouse, Azure Cosmos DB, etc. Please read further for a discussion on Azure Private Link and Service Endpoints.

This is more about network configuration rather than things like “store your credentials and other secrets in Azure Key Vault,” which is also a good idea.
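The linked post is mostly about VNet injection, service endpoints, and Private Link, which you configure outside of notebook code. As a small complement on the secrets side, here is a hedged sketch of reading ADLS Gen2 from a Databricks notebook using a service principal whose secret sits in a Key Vault-backed secret scope; the storage account, tenant, scope, and key names are all placeholders.

```python
# Runs inside a Databricks notebook, where spark and dbutils are provided by the
# runtime. Storage account, tenant, secret scope, and key names are placeholders.
storage_account = "mydatalake"
tenant_id = "<tenant-guid>"

# Pull the service principal secret from a Key Vault-backed secret scope.
client_secret = dbutils.secrets.get(scope="kv-backed-scope", key="sp-client-secret")

spark.conf.set(f"fs.azure.account.auth.type.{storage_account}.dfs.core.windows.net", "OAuth")
spark.conf.set(
    f"fs.azure.account.oauth.provider.type.{storage_account}.dfs.core.windows.net",
    "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
)
spark.conf.set(f"fs.azure.account.oauth2.client.id.{storage_account}.dfs.core.windows.net", "<app-id>")
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{storage_account}.dfs.core.windows.net", client_secret)
spark.conf.set(
    f"fs.azure.account.oauth2.client.endpoint.{storage_account}.dfs.core.windows.net",
    f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
)

# With the storage firewall limited to the Databricks subnets (service endpoints)
# or a private endpoint, this read stays on the Azure backbone.
df = spark.read.parquet(f"abfss://raw@{storage_account}.dfs.core.windows.net/events/")
```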

Creating the Elastic Job Agent for Azure SQL Database

Kate Smith continues a series on Elastic Jobs in Azure SQL Database:

There is no way to create the Elastic Job Agent in T-SQL. I have already shown how to do this in PowerShell. To do this in the Azure Portal, go to Home, click the box that says “+ Create a Resource”, then search in the box for Elastic Job Agent. Select that, and then follow the steps in the portal to create the agent.

After creating the agent, Kate then shows how to set up credentials, target groups, and jobs.

Sort Keys and Join Types in Amazon Redshift

Derik Hammer takes us through query tuning a nasty job on Amazon Redshift:

My team built a process to load from a couple of base tables, in our Amazon Redshift enterprise data warehouse, into another table which would act as a data mart entity. The data was rolled up and it included some derived fields. The SQL query had some complexity to it.

This process ran daily and was being killed by our operations team after running for 22 hours.

I stepped in to assist with performance tuning and discovered that join choices, such as INNER vs. OUTER joins, have a big impact on whether Redshift can use its sort keys or not.

Click through for more details and what Derik ended up doing.
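As a rough illustration of the kind of check involved (hypothetical tables, not Derik's workload): when two tables share a distribution key and sort key on the join column, comparing EXPLAIN output for the inner and outer versions of the join shows whether Redshift chose a merge join that exploits the sort keys or fell back to a hash join.

```python
# Hypothetical tables and query: compare the plan Redshift produces for an
# inner join versus a left outer join on the same sort-keyed join column.
import psycopg2

conn = psycopg2.connect(
    host="my-cluster.xxxxxxxx.us-east-1.redshift.amazonaws.com",
    port=5439, dbname="dw", user="tuner", password="<password>",
)
cur = conn.cursor()

for join_type in ("JOIN", "LEFT JOIN"):
    cur.execute(f"""
        EXPLAIN
        SELECT o.order_id, o.order_total, c.segment
        FROM sales.orders o
        {join_type} sales.customers c ON c.customer_id = o.customer_id
        WHERE o.order_date >= '2020-01-01';
    """)
    print(join_type)
    for (plan_line,) in cur.fetchall():
        # Look for "Merge Join" (sort keys used) versus "Hash Join" in the plan.
        print("  ", plan_line)
```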

Loading Data into Delta Lake

Prakash Chockalingam takes us through auto-loading Delta Lake from various sources:

Auto Loader is an optimized file source that overcomes all the above limitations and provides a seamless way for data teams to load the raw data at low cost and latency with minimal DevOps effort. You just need to provide a source directory path and start a streaming job. The new structured streaming source, called “cloudFiles”, will automatically set up file notification services that subscribe to file events from the input directory and process new files as they arrive, with the option of also processing existing files in that directory.

This does look interesting.
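For reference, here is a minimal sketch of what that looks like in a Databricks notebook with PySpark (spark comes from the runtime; schema and paths are placeholders):

```python
# Minimal Auto Loader sketch: stream new files from a raw directory into a
# Delta table. Paths, schema, and file format are placeholders.
df = (
    spark.readStream
    .format("cloudFiles")                               # the Auto Loader source
    .option("cloudFiles.format", "json")                # format of the incoming raw files
    .option("cloudFiles.useNotifications", "true")      # set up file notification services
    .option("cloudFiles.includeExistingFiles", "true")  # also process files already there
    .schema("id string, amount double, ts timestamp")
    .load("/mnt/raw/events/")
)

(
    df.writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/delta/events/_checkpoint")
    .start("/mnt/delta/events")
)
```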

A Metadata-Driven Framework for ADF Pipelines

Paul Andrew has started a series on metadata-driven Azure Data Factory pipelines:

The concept of having a processing framework to manage our Data Platform solutions isn’t a new one. However, over time, changes in the technology we use mean the way we now deliver this orchestration has to change as well, especially in Azure. On that basis, and using my favourite Azure orchestration service, Azure Data Factory (ADF), I’ve created an alpha metadata-driven framework that could be used to execute all our platform processes. Furthermore, at various community events I’ve talked about bootstrapping solutions with Azure Data Factory, so now as a technical exercise I’ve rolled my own simple processing framework. Mainly, to understand how easily we can make it with the latest cloud tools and to fully exploit just how dynamic you can get a set of generational pipelines.

This first post lays out some of the architectural decisions and prep work needed for the series.

Creating an Elastic Jobs Agent

Kate Smith continues a series on elastic jobs in Azure SQL Database:

Having laid the conceptual groundwork for Elastic Jobs in two previous postings (1, 2), I am now going to create an elastic job and associated credentials using PowerShell. For this scenario, I have one or more databases with a table ‘T’ and statistics ‘tStats’. I want to enforce an update for these statistics every day. To do this, I need to check that my stats have been updated in the past day, and if not, update them. The T-SQL to update statistics on a table “T” with stats named “tStats” is simple:

Click through for the PowerShell script.
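The excerpt cuts off right before that statement, so click through for Kate's actual script. As a stand-in, here is a sketch of that sort of job step: refresh tStats only when it is more than a day old, then attach the command to an elastic job through the jobs stored procedures. The connection string, job, credential, and target group names are placeholders, not Kate's.

```python
# Sketch only: a job step command that refreshes stale statistics, registered
# against an elastic job. All names and the connection string are placeholders.
import pyodbc

step_command = """
IF ISNULL(STATS_DATE(OBJECT_ID('dbo.T'),
          (SELECT stats_id FROM sys.stats
           WHERE object_id = OBJECT_ID('dbo.T') AND name = 'tStats')), '19000101')
   < DATEADD(DAY, -1, SYSUTCDATETIME())
BEGIN
    UPDATE STATISTICS dbo.T (tStats);
END
"""

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myjobserver.database.windows.net;DATABASE=jobdatabase;"
    "UID=jobuser;PWD=<password>",
    autocommit=True,
)
cur = conn.cursor()

cur.execute("EXEC jobs.sp_add_job @job_name = ?, @description = ?;",
            "NightlyStatsRefresh", "Refresh tStats when it is stale")
cur.execute(
    """
    EXEC jobs.sp_add_jobstep
        @job_name = ?,
        @command = ?,
        @credential_name = ?,
        @target_group_name = ?;
    """,
    "NightlyStatsRefresh", step_command, "job_run_credential", "ProductionDatabases",
)
```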

Good Ideas for Designing Data Lakes

Prateek Shrivastava and Rangasayee Chandrasekaran share some advice on designing data lakes in the cloud:

Data generation and data collection across semi-structured and unstructured formats are both bursty and continuous. Inspecting, exploring and analyzing these datasets in their raw form is tedious, because the analytical engines scan the entire data set across multiple files. We recommend five ways to reduce data scanned and reduce query overheads –

Click through for the details.
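The five specific recommendations are behind the link. As a generic illustration of the "reduce data scanned" theme (not the article's list), converting raw files into partitioned, columnar storage is the classic first step, because engines can then prune partitions and read only the columns a query needs:

```python
# Generic illustration, not the article's specific advice: land raw JSON as
# partitioned Parquet so queries prune partitions and skip unneeded columns.
# Paths and column names are placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("curate-raw-zone").getOrCreate()

raw = spark.read.json("s3://my-lake/raw/clickstream/")

(
    raw.withColumn("event_date", F.to_date("event_ts"))
       .write.mode("append")
       .partitionBy("event_date")   # date-filtered queries skip other partitions
       .parquet("s3://my-lake/curated/clickstream/")
)

# A date-bounded query now touches only the matching partition and two columns.
daily = (
    spark.read.parquet("s3://my-lake/curated/clickstream/")
         .where(F.col("event_date") == "2020-03-01")
         .select("user_id", "page")
)
```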
