Month: June 2023

A Review of Fabric Lakehouse

Teo Lachev talks lakehouses:

The Microsoft’s Lakehouse definition is less ambitious and exclusive. “Microsoft Fabric Lakehouse is a data architecture platform for storing, managing, and analyzing structured and unstructured data in a single location. It is a flexible and scalable solution that allows organizations to handle large volumes of data using a variety of tools and frameworks to process and analyze that data. It integrates with other data management and analytics tools to provide a comprehensive solution for data engineering and analytics”. In other words, a lakehouse is whatever you want it to be if you want something better than a data lake.

Read on for Teo’s classic The Good, The Bad, and The Ugly format.

Comments closed

Loading Multiple Audit Log Files in Azure SQL DB

Jose Manuel Jurado Diaz can’t stop at one:

In Azure SQL Database, the auditing feature enables you to track and monitor database activities, providing valuable insights into the actions performed on your database. One of the key components of auditing is the audit log files, which store the recorded data.

However, when dealing with a large number of audit log files stored in a blob storage container, loading them into Azure SQL Database can be a challenging task.

This article explores a workaround using the sys.fn_get_audit_file function to load multiple audit log files without being able to define a pattern such as *.xel.

Note that, even though the example is for Azure SQL Database, the function is built into SQL Server, SQL Managed Instance, and Synapse dedicated SQL pools as well and works the same way.
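
As a rough illustration of reading several audit files with one statement, here is a minimal Python sketch that generates the T-SQL. It is not the workaround from the linked article: it assumes you already have a list of blob URLs, and the URLs and the helper name `build_audit_query` are hypothetical. The only real API it relies on is `sys.fn_get_audit_file` with its three arguments.

```python
def build_audit_query(blob_urls):
    """Return one T-SQL statement that reads every listed audit file.

    Each URL becomes its own call to sys.fn_get_audit_file, and the
    result sets are stacked with UNION ALL.
    """
    if not blob_urls:
        raise ValueError("at least one blob URL is required")

    selects = []
    for url in blob_urls:
        # Double any single quotes so the URL is a valid T-SQL string literal.
        literal = url.replace("'", "''")
        selects.append(
            f"SELECT * FROM sys.fn_get_audit_file(N'{literal}', DEFAULT, DEFAULT)"
        )
    return "\nUNION ALL\n".join(selects) + ";"


# Hypothetical blob URLs, for illustration only.
urls = [
    "https://example.blob.core.windows.net/sqldbauditlogs/srv/db/file_one.xel",
    "https://example.blob.core.windows.net/sqldbauditlogs/srv/db/file_two.xel",
]

query = build_audit_query(urls)
print(query)
```

The generated statement could then be run through whichever SQL client you already use, ideally writing the rows into a staging table instead of reading them back into Python.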


A Simple Example of ADF Pipeline Return Value

Andy Leonard starts easy:

I want to develop an Azure Data Factory (ADF) design pattern for calling focused, unit-of-work, function-y ADF pipelines that perform focused tasks. Some of these “worker pipelines” will need to return values to the calling pipeline.

In this example, I started by reading Mark Kromer’s (excellent) article titled You can now customize the return value from your pipeline! I then crafted the simple example shown in this post to make sure I understood the principles involved before using pipeline return value (preview) functionality in more robust ADF patterns.

Follow the steps I outline below to build a simple example for an ADF pipeline that returns a value!

Click through to follow those steps.


Trying the sample() Function in R

Steven Sanderson gathers a sample:

Sampling is a fundamental technique in data analysis and statistical modeling. It allows us to draw meaningful insights and make inferences about a larger population based on a representative subset. In the world of R programming, the sample() function stands as a versatile tool that enables us to create random samples efficiently. In this post, we will explore the sample() function and its various applications through a series of plain English examples.

Click through for those examples.
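
For readers who want a feel for the idea before clicking through, here is a small sketch of the same kinds of draws using Python's standard `random` module. It is an analogue, not R code and not one of Steven's examples; the population, seed, and weights are made up for illustration.

```python
import random

rng = random.Random(42)  # fixed seed so reruns give the same draws

population = list(range(1, 11))  # the integers 1..10, like 1:10 in R

# Without replacement: comparable to sample(1:10, 5) in R.
without_replacement = rng.sample(population, k=5)

# With replacement: comparable to sample(1:10, 5, replace = TRUE).
with_replacement = rng.choices(population, k=5)

# Weighted, with replacement: comparable to supplying prob = ... in R.
# Here the value 1 is ten times as likely as any other value.
weights = [10 if x == 1 else 1 for x in population]
weighted = rng.choices(population, weights=weights, k=5)

print(without_replacement)
print(with_replacement)
print(weighted)
```

The split between `sample` (no repeats) and `choices` (repeats allowed) mirrors the `replace` argument in R's `sample()`.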


Data Visualization Technology Landscape

Andy Kirk has a catalog:

My long-running catalogue of Data Visualisation Resources has for many years been the most-popular, most-visited, and most-referenced content on my website. For the last couple of years, though, it has been a little stagnant with my limited time preventing the frequent updates it needed.

Having recently completed the migration of my website to a new host and undertaken a wide-spread redesign and restructure, it felt an opportune moment to roll up my sleeves and belatedly spend some time pruning the catalogue of out-dated references and introduce all the new ones that I’d encountered, and bookmarked, but not yet added.

Click through for that, as well as the Chartmaker Directory, which shows which visuals are available in which products and offers examples of them in action.


Choosing a Load Balancing Option in Azure

Santosh Hari looks at the options:

Azure docs have a great page on the various load balancing options in Azure that even has an awesome flowchart summing up the choices. However, not being from a networking background, combined with Microsoft’s “special” naming, combined with some sort of memory issue recalling these names from memory meant that even if I had to rely on rote memory when in conversations with customers, I would often mix up the names. For instance, confuse traffic manager and load balancer. So, I decided to understand some of the basics behind cloud load balancers to help become a more interesting conversationalist in this topic: “well actually, you should be using an app gateway there, John”.

This often isn’t in the database administrator’s purview, but Santosh does a good job of explaining the concepts and, if you’re hosted in Azure, it is good to know what’s sitting in front of your database.
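
To keep the four main service names straight, the two top-level questions (global or regional, HTTP(S) or not) can be sketched as a tiny lookup. This is a deliberate simplification and not Santosh's code or the full Azure flowchart, which has more branches; the function name and parameters are hypothetical.

```python
def choose_azure_load_balancer(global_scope: bool, http_traffic: bool) -> str:
    """Map two yes/no questions to the usual Azure load-balancing service.

    global_scope: True if traffic must be distributed across regions.
    http_traffic: True if the workload is HTTP(S) and wants layer-7 features.
    """
    if global_scope and http_traffic:
        return "Azure Front Door"
    if global_scope:
        return "Traffic Manager"  # DNS-based routing across regions
    if http_traffic:
        return "Application Gateway"  # regional layer-7 load balancer
    return "Azure Load Balancer"  # regional layer-4 load balancer


# A regional web application behind a layer-7 balancer.
print(choose_azure_load_balancer(global_scope=False, http_traffic=True))
```

Treat the result as a starting point for the conversation; real choices also depend on details the sketch ignores, such as TLS offload, private endpoints, and cost.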


Execution Plans of Graph Tables in SQL Server

Hugo Kornelis looks at the execution plan:

Welcome to part twenty-one of the plansplaining series, where we will continue our look at execution plans for graph queries. In the previous post, we looked at the internal structure of node and edge tables, and discovered that they have a few hidden columns. Now let’s look how those columns are used in graph queries.

Read on for the example and a deeper dive into how graph tables actually work.
