Press "Enter" to skip to content

Month: May 2021

Querying Delta Lake Files with T-SQL in Azure Synapse Analytics

Jovan Popovic shows off a new feature in public preview for Azure Synapse Analytics serverless SQL pools:

You can use Azure Synapse and Azure Databricks to prepare and modify your Delta Lake data sets placed in the Azure Data Lake storage. Once your data engineers have prepared the data, your data analysts can create reports using the tools such as Power BI.

Using the serverless query endpoint in Azure Synapse, you can create a relational layer on top of your Delta Lake files that directly references the location where Azure Synapse and Azure Databricks are used to modify data. This way, you can get the real-time analytics on top of the Delta Lake data set without any need to wait for a pipeline to copy and prepare data.

Read on to see how this works.

Leave a Comment

Understanding the STUFF() Function

Chad Callihan explains an important function:

I used to always see the STUFF function in passing when reading blogs and kind of move past it without understanding what it was doing. I would see it used and think it’s doing something with a bunch of stuff and kind of skip over it since it wasn’t entirely relevant to what else I was reading. When I read about what the STUFF function actually does, it made a lot more sense as a name. More than dealing with “a bunch of stuff” the STUFF function is used for stuffing a string into another string. Let’s check out a few examples of stuffing data.

Granted, 99% of its importance is in combination with FOR XML PATH() but that’s still important. And we get to see a few other use cases for it as well.

Leave a Comment

Restoring Databases from Blob Storage Files

Stuart Moore talks us through a (rare) gap in dbatools:

In the comments here I was asked about using Restore-DbaDatabase when all you have is blobs in an Azure Storage account. This can be done, involves a couple of non dbatools steps.

Restore-DbaDatabase doesn’t natively ‘talk’ Azure, nor do any of the other dbatools commands. The reason for this is that we didn’t want to force dbatools users to have to install the AzureRM/Az powershell modules just to use our module. We’ve gone to a lot of effort to make sure that dbatools is acceptable to Security Admins and that it has a small(ish) footprint, and adding those large modules as prerequisites would have broken that.

Read on for how you can get around that.

Leave a Comment

Securing Databricks on AWS

Andrew Weaver, et al, take us through security practices for running Databricks on AWS:

In this article, we will share a list of cloud security features and capabilities that an enterprise data team can use to harden their Databricks environment on AWS as per their risk profile and governance policy. For more information about how Databricks runs on Amazon Web Services (AWS), view the AWS web page and Databricks security on AWS page for more specific details on security and compliance.

Click through for that list.

Leave a Comment

Error Handling Patterns in Kafka

Gerardo Villeda gives a few options for handling errors in an Apache Kafka topic:

Apache Kafka® applications run in a distributed manner across multiple containers or machines. And in the world of distributed systems, what can go wrong often goes wrong. This blog post covers different ways to handle errors and retries in your event streaming applications. The nature of your process determines the patterns, and more importantly, your business requirements.

This blog provides a quick guide on some of those patterns and expands on a common and specific use case where events need to be retried following their original order. This blog post illustrates a scenario of an application that consumes events from one topic, transforms those events, and produces an output to a target topic, covering different approaches as they gradually increase in complexity.

Click through for the list. Each explanation is pretty short, but opens the door for further analysis.

Leave a Comment

Comparing Datasets in R

The folks at finnstats take us through a package to compare datasets in R:

How to find dataset differences in R, when the pieces of information are changing between datasets it’s a difficult task to identify the same.

Here we are going to discuss the daff package in R, daff package helps us to identify the differences and visualize them in a beautiful way.

Click through for the demonstration, including a video. H/T R-Bloggers

Leave a Comment