Cloud – Page 43 – Curated SQL

Bring Fabric to the Data Lakehouse

Published 2023-06-29 by Kevin Feasel

Ust Oldfield ties together Databricks and Microsoft Fabric:

We’ve built countless Lakehouses for our customers and influenced the design of many more. With the advent of Fabric, many organisations with existing lakehouse implementations in Azure are wondering what changes Fabric will herald for them. Do they continue with their existing lakehouse implementation and design, or do they migrate entirely to Fabric?

For many, the answer will be to continue as-is. They’ve invested a lot of time and money in establishing a Lakehouse – to migrate now to a slightly different technology stack would be a very costly exercise! There also isn’t a need to migrate from a lakehouse implementation in Databricks to one in Fabric as there aren’t concrete benefits to be realised.

For those using Power BI as their semantic and reporting layers, as well as using Databricks SQL or Synapse Serverless as the serving layer, Fabric provides a perfect opportunity to rationalise the architecture and to bring about substantial performance gains through the Direct Lake connectivity and V-Order compression in Fabric.

Read on to see what Ust means, using a couple of architecture diagrams along the way.

Preventing Accidental Azure Changes with Resource Locks

Published 2023-06-28 by Kevin Feasel

Khushbu Ghandi puts a padlock on it:

Resource locks are just locks that we can associate to different scopes in Azure allowing us to override permissions at that resource scope and down. When we talk about the scope of the resource lock, we can lock subscriptions, we can lock resource groups and individual resources, and the lock restrictions that we have based off the type of lock we select will apply to all users and roles that have access to that resource. Also, it’s worth noting that locks are inherited by child resources. So, if we apply a lock on a subscription, it is inherited by all the resource groups that have been created under that subscription along with the resources that will be created under the resource groups.

Resource locks come with their own considerations, and Khushbu dives into those. This is a concept I like more in theory than in practice, save for pretty stable systems where you keep things running 24/7.
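
To make the inheritance rule concrete, here is a minimal sketch in plain Python. It models scopes as tuples and assumes that the most restrictive lock found on a scope or any of its ancestors wins. The scope names, the `effective_lock` helper, and the tuple representation are all hypothetical; this is a toy model for intuition and does not call any Azure API.

```python
from enum import IntEnum
from typing import Dict, Optional, Tuple

# A scope is a path from subscription down to resource (hypothetical names),
# e.g. ("sub-1",), ("sub-1", "rg-data"), ("sub-1", "rg-data", "sql-1").
Scope = Tuple[str, ...]


class LockLevel(IntEnum):
    # Ordered so that a larger value is more restrictive.
    CAN_NOT_DELETE = 1
    READ_ONLY = 2


def effective_lock(locks: Dict[Scope, LockLevel], scope: Scope) -> Optional[LockLevel]:
    """Return the most restrictive lock on `scope` or any ancestor, if any."""
    found = [
        locks[scope[:depth]]
        for depth in range(1, len(scope) + 1)
        if scope[:depth] in locks
    ]
    return max(found) if found else None


def can_delete(locks: Dict[Scope, LockLevel], scope: Scope) -> bool:
    """Either lock level blocks deletion in this model."""
    return effective_lock(locks, scope) is None


def can_modify(locks: Dict[Scope, LockLevel], scope: Scope) -> bool:
    """Only a read-only lock blocks modification in this model."""
    return effective_lock(locks, scope) != LockLevel.READ_ONLY
```

With a delete lock on the subscription and a read-only lock on one resource group, every resource in the subscription is protected from deletion, but only resources in that one group are also protected from modification.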

Cache Recommendations for Azure Data Explorer

Published 2023-06-27 by Kevin Feasel

Guy Reginiano notes an update:

A new generation of cache recommendations for Azure Data Explorer is now available in the Azure portal! This update introduces significant improvements, including enhanced logic, additional statistics for end users, an improved user interface, and a streamlined process for reviewing and applying recommendations. In this blog post, we will explore the new features and benefits offered by this latest update.

Read on to see where you can find these cache recommendations, as well as the types of recommendations you’re liable to receive.

A Complex Example of ADF Pipeline Return Value

Published 2023-06-27 by Kevin Feasel

Andy Leonard goes beyond the simple example:

In this post, I demonstrate one way to create a child pipeline that returns the SubscriptionId for a data factory. I then call the child pipeline from a parent package.

To build this demonstration, please follow the instructions that follow.

This is definitely more complicated than Andy’s simple example, but there are plenty of screenshots to take you through the process.

Recursive File Deletion in S3

Published 2023-06-26 by Kevin Feasel

The Big Data in Real World team runs rm -rf:

In this post we will see how to recursively delete files/objects, folders and bucket from S3.

I made the joke before reading the article, and it turns out that I was pretty close to spot-on. Read on to see how you can do this via the AWS CLI.
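
The linked post works through the AWS CLI. For anyone scripting the same thing, here is a rough Python equivalent, offered as a sketch and not as the article's method. It assumes a boto3-style S3 client and uses its `list_objects_v2` paginator and `delete_objects` call; the `delete_prefix` name is my own.

```python
from typing import Any


def delete_prefix(client: Any, bucket: str, prefix: str) -> int:
    """Delete every object under `prefix`; return how many keys were submitted.

    `client` is assumed to behave like boto3.client("s3").
    """
    paginator = client.get_paginator("list_objects_v2")
    deleted = 0
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # A page carries no "Contents" entry when nothing matches the prefix.
        keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if not keys:
            continue
        # A listing page holds at most 1,000 keys, which is also the most
        # that a single delete_objects request accepts.
        client.delete_objects(Bucket=bucket, Delete={"Objects": keys})
        deleted += len(keys)
    return deleted
```

With boto3 installed and credentials configured, a call might look like `delete_prefix(boto3.client("s3"), "my-bucket", "logs/2023/")`, where the bucket and prefix are placeholders. On a versioned bucket this only adds delete markers, so older object versions remain.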

Trying the Azure OpenAI Playground

Published 2023-06-26 by Kevin Feasel

Obaro Alordiah gives us a primer:

The Azure OpenAI Service has been a trending topic in the tech world this year as it combines the power of OpenAI’s advanced generative AI models with the comprehensive suite of services available on the Azure cloud. It has given developers the opportunity to create and embed high performing AI models into the Azure environment to deliver more efficient, insightful & innovative solutions. In this blog, we will take a high level look at some of the key features within the Azure OpenAI playground and how we can get the best out of it.

Generative AI via OpenAI is an area in which Microsoft is putting an inordinate amount of focus.

A Simple Example of ADF Pipeline Return Value

Published 2023-06-26 by Kevin Feasel

Andy Leonard starts easy:

I want to develop an Azure Data Factory (ADF) design pattern for calling focused, unit-of-work, function-y ADF pipelines that perform focused tasks. Some of these “worker pipelines” will need to return values to the calling pipeline.

In this example, I started by reading Mark Kromer’s (excellent) article titled You can now customize the return value from your pipeline! I then crafted the simple example shown in this post to make sure I understood the principles involved before using pipeline return value (preview) functionality in more robust ADF patterns.

Follow the steps I outline below to build a simple example for an ADF pipeline that returns a value!

Click through to follow those steps.

Choosing a Load Balancing Option in Azure

Published 2023-06-23 by Kevin Feasel

Santosh Hari looks at the options:

Azure docs have a great page on the various load balancing options in Azure that even has an awesome flowchart summing up the choices. However, not being from a networking background, combined with Microsoft’s “special” naming, combined with some sort of memory issue recalling these names from memory meant that even if I had to rely on rote memory when in conversations with customers, I would often mix up the names. For instance, confuse traffic manager and load balancer. So, I decided to understand some of the basics behind cloud load balancers to help become a more interesting conversationalist in this topic: “well actually, you should be using an app gateway there, John”.

This often isn’t in the database administrator’s purview, but Santosh does a good job of explaining the concepts and, if you’re hosted in Azure, it is good to know what’s sitting in front of your database.

Read and Write Data with PySpark

Published 2023-06-22 by Kevin Feasel

Dustin Vannoy has two of the three R’s down:

Every Spark pipeline involves reading data from a data source or table. For data engineers we usually end the pipelines by writing the transformed data. In this tutorial we walk through some of the most common format and cloud storage locations for reading and writing with Spark. We’ll save some of the advanced Delta Lake capabilities for another tutorial.

Click through to see how to read from and write to CSV, JSON, and Parquet formats. Dustin has examples of working with Azure Blob Storage, S3, and Google Cloud Storage, and even some database examples with JDBC.

Running SqlBulkCopy in Parallel from PowerShell

Published 2023-06-22 by Kevin Feasel

Jose Manuel Jurado Diaz has a script for us:

Today, we encountered an interesting service request of attempting to reduce the load times for 100,000 records from a table with 97 varchar(320) fields in an Azure SQL HyperScale database. Following, I would like to share my lessons learned here.

The idea is to split in different concurrent process the execution of multiples SqlBulkCopy. In this case, we are going to split this process in 5 processes running in parallel inserting 20,000 rows, let’s try to know the total size.

Read on for the script, as well as a rough idea of how long it takes to insert into an Azure SQL DB Hyperscale instance.
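
The split-and-run-concurrently idea is easy to sketch outside of PowerShell. The Python below is an illustration under my own assumptions and is not Jose's script: `split_ranges`, `load_in_parallel`, and `fake_bulk_load` are hypothetical names, and the loader is a stand-in that writes nothing. It cuts 100,000 rows into five ranges of 20,000 and hands each range to its own worker.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def split_ranges(total_rows: int, workers: int) -> List[Tuple[int, int]]:
    """Return half-open (start, end) row ranges, as even as possible."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    base, extra = divmod(total_rows, workers)
    ranges = []
    start = 0
    for i in range(workers):
        # The first `extra` workers take one additional row each.
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def load_in_parallel(
    total_rows: int,
    workers: int,
    load_chunk: Callable[[int, int], int],
) -> List[int]:
    """Run `load_chunk(start, end)` once per range, all ranges concurrently."""
    ranges = split_ranges(total_rows, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda r: load_chunk(*r), ranges))


def fake_bulk_load(start: int, end: int) -> int:
    """Hypothetical stand-in for one bulk insert; returns the rows 'written'."""
    return end - start


if __name__ == "__main__":
    written = load_in_parallel(100_000, 5, fake_bulk_load)
    print(written, sum(written))
```

In a real load, each worker would open its own connection and bulk insert only its range. Whether five workers is the right number depends on the target database's log and IO limits, which is what the timing in the linked post speaks to.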

Category: Cloud