Press "Enter" to skip to content

Month: July 2024

Performance Testing Microsoft Fabric Dataflow Gen2

Reitse Eskens hammers away:

In my previous blogs, I’ve been hammering Fabric with data from some different angles: either with the Copy dataflows, notebooks, pipelines, Data Warehouse SQL scripts, or in Power BI.
This time, I’m going to make the Dataflow Gen2 work for its money.

Reitse tries the normal mechanism for Dataflows Gen2, but then also tries out a preview feature for fast copy and sees a marked difference.


RIP Stretch DB

Debbi Lyons calls it:

Ever since Microsoft introduced SQL Server Stretch Database in 2016, our guiding principles for such hybrid data storage solutions have always been affordability, security, and native Azure integration. Customers have indicated that they want to reduce maintenance and storage costs for on-premises data, with options to scale up or down as needed, greater peace of mind from advanced security features such as Always Encrypted and row-level security, and they seek to unlock value from warm and cold data stretched to the cloud using Microsoft Azure analytics services.     

During recent years, Azure has undergone significant evolution, marked by groundbreaking innovations like Microsoft Fabric and Azure Data Lake Storage. As we continue this journey, it remains imperative to keep evolving our approach on hybrid data storage, ensuring optimal empowerment for our SQL Server customers in leveraging the best from Azure.

This is not surprising at all, considering that the premise of Stretch DB was that you could off-load old and less-important data from your local SQL Server instances and expensive local disk into Azure, querying it when you need that data. The problem was, you couldn’t use cheap storage and pay a few cents per gigabyte of data per month. Instead, you were effectively spinning up Azure Synapse Analytics and paying a marked premium for your least important data. The price alone made this an untenable idea, but there were other holes in the plan as well that doomed it as a product.


DAX Variables and Iterators

Nikola Ilic takes us through two major concepts in DAX:

“DAX is simple, but not easy!” – famously said Alberto Ferrari, when asked what best describes the Data Analysis Expressions language. And that’s probably the most precise definition of DAX. It may look very easy at first glance, but understanding the nuances and how DAX really works requires a lot of time and “try and fail” cases.

Obviously, this article is not a deep-dive into DAX internals and will not go into these nuances, but it will (hopefully) help you to get a better understanding of a few very important concepts that will make your DAX journey more pleasant and assist you in preparing for the DP-600 exam.

Click through for Nikola’s explanation of each.


Tips for Choosing a Classifier

I’ve wrapped up yet another series:

In this video, I wrap up the series on classification and provide some quick-and-dirty tips on when to use each of the classification algorithms we have discussed.

This was a series I really enjoyed. I’ve had a talk on the topic for a few years, but getting the opportunity to dig in deeper and spend a few hours on the topic was nice. It also helped me fill in some gaps in my understanding and fix a few long-standing bugs in my demo code, so it’s got that going for it as well.
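
If you want to run a similar quick-and-dirty comparison on your own data, a scikit-learn sketch along these lines works. To be clear, this is an illustrative snippet rather than the demo code from the series, and the dataset and model choices are just placeholders:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder dataset -- swap in your own features and labels.
X, y = load_breast_cancer(return_X_y=True)

# Scale inside a pipeline so the distance- and gradient-based models behave sensibly.
candidates = {
    "logistic regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "k-nearest neighbors": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "naive Bayes": GaussianNB(),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=42),
}

for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name:20s} mean accuracy {scores.mean():.3f} (+/- {scores.std():.3f})")
```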


Parallel Download in Oracle Object Storage

Brendan Tierney continues a series on Oracle Object Storage:

In previous posts, I’ve given example Python code (and functions) for processing files into and out of OCI Object and Bucket Storage. One of these previous posts includes code and a demonstration of uploading files to an OCI Bucket using the multiprocessing package in Python.

Building upon these previous examples, the code below will download a Bucket using parallel processing. Like my last example, this code is based on the example code I gave in an earlier post on functions within a Jupyter Notebook.

Click through for the code.
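
For a rough sense of the shape of that approach (this is a sketch of mine, not Brendan’s code), listing a bucket’s objects and fanning the downloads out over a multiprocessing pool might look something like this, assuming a standard OCI config file and placeholder bucket and directory names:

```python
import os
from multiprocessing import Pool

import oci

BUCKET_NAME = "my-bucket"   # placeholder: substitute your bucket name
TARGET_DIR = "downloads"    # local directory to hold the files


def download_object(object_name):
    # Each worker builds its own client, since SDK clients are not picklable.
    config = oci.config.from_file()  # reads ~/.oci/config by default
    client = oci.object_storage.ObjectStorageClient(config)
    namespace = client.get_namespace().data

    response = client.get_object(namespace, BUCKET_NAME, object_name)
    local_path = os.path.join(TARGET_DIR, object_name.replace("/", "_"))
    with open(local_path, "wb") as f:
        for chunk in response.data.raw.stream(1024 * 1024, decode_content=False):
            f.write(chunk)
    return object_name


if __name__ == "__main__":
    os.makedirs(TARGET_DIR, exist_ok=True)

    config = oci.config.from_file()
    client = oci.object_storage.ObjectStorageClient(config)
    namespace = client.get_namespace().data

    # list_objects returns the first page (up to 1,000 objects), which is fine for a demo.
    objects = [o.name for o in client.list_objects(namespace, BUCKET_NAME).data.objects]

    with Pool(processes=4) as pool:
        for name in pool.map(download_object, objects):
            print(f"Downloaded {name}")
```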


Export Azure SQL DB to Blob Storage

Josephine Bush runs an import-export business and wants a database to “fall off a truck”:

After a data migration, we needed to decommission the old Azure SQL DBs, but we wanted to keep a copy in case we needed anything later. Enter exporting an Azure SQL DB to storage!

Click through for an example of how it works. Given that we’re getting bacpac files out, I wonder what it would look like with a really large database.
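
For reference, one way to kick that export off programmatically is through the azure-mgmt-sql package. This is a hedged sketch rather than Josephine’s method; the ExportDatabaseDefinition fields and all of the names below are placeholders worth checking against the current SDK documentation:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.sql import SqlManagementClient
from azure.mgmt.sql.models import ExportDatabaseDefinition

# Placeholders -- substitute your own subscription, server, and storage details.
subscription_id = "00000000-0000-0000-0000-000000000000"
resource_group = "my-resource-group"
server_name = "my-sql-server"
database_name = "my-database"

client = SqlManagementClient(DefaultAzureCredential(), subscription_id)

# begin_export writes a .bacpac to the blob container you point it at.
poller = client.databases.begin_export(
    resource_group,
    server_name,
    database_name,
    ExportDatabaseDefinition(
        storage_key_type="StorageAccessKey",
        storage_key="<storage-account-key>",
        storage_uri="https://mystorage.blob.core.windows.net/backups/my-database.bacpac",
        administrator_login="sqladmin",
        administrator_login_password="<password>",
    ),
)
result = poller.result()  # blocks until the export finishes
print(result.status)
```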


Lakehouse Table Partitioning in Microsoft Fabric

Gilbert Quevauvilliers performs a split:

When loading data, it is always important to load the data with performance and scalability in mind.

For lakehouse tables to return queries quickly and to scale, it is essential to load your lakehouse tables with partitions.

What I am going to show you in my blog post today is how to load data into a Lakehouse table where the table will be automatically partitioned by Year/Month/Day.

Click through for the example.
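
For a flavor of the pattern (a minimal sketch with made-up column and table names, not Gilbert’s exact code), deriving Year/Month/Day columns in PySpark and writing the Delta table partitioned by them looks roughly like this:

```python
from pyspark.sql import functions as F

# Assumes a DataFrame `df` with an OrderDate column; all names here are placeholders.
df_partitioned = (
    df.withColumn("Year", F.year("OrderDate"))
      .withColumn("Month", F.month("OrderDate"))
      .withColumn("Day", F.dayofmonth("OrderDate"))
)

# partitionBy lays the Delta table out in Year/Month/Day folders,
# so queries filtering on those columns can prune partitions.
(
    df_partitioned.write
    .format("delta")
    .mode("overwrite")
    .partitionBy("Year", "Month", "Day")
    .saveAsTable("Sales_Partitioned")
)
```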


Suspend and Resume Microsoft Fabric Capacity

Olivier Van Steenlandt saves some cash:

With only a limited budget for exploring and testing new tools, I had to figure out how to use my budget efficiently. Therefore, before making any decisions, I looked at the Microsoft Fabric pricing and possibilities.

If you want to take a look at the Microsoft Fabric pricing models, you can find an overview via the following link: Microsoft Fabric – Pricing | Microsoft Azure

To avoid any surprises and to be as cost-effective as possible, I created an easy Python script that I can use to pause and start my Microsoft Fabric capacity, or, better said, resume and suspend it.

I highly recommend this for any organization that does not need 24/7 uptime for Fabric capacity. If you run your system 12 hours a day instead of 24, it takes your F64 capacity from $8k a month to $4k.
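
For a sense of what such a script involves, suspending or resuming a capacity boils down to a POST against the capacity resource in Azure Resource Manager. This is a generic sketch, not Olivier’s code, and the Microsoft.Fabric resource path and api-version are assumptions to verify against the ARM REST documentation:

```python
import requests
from azure.identity import DefaultAzureCredential

# Placeholders -- fill in your own subscription, resource group, and capacity name.
SUBSCRIPTION_ID = "00000000-0000-0000-0000-000000000000"
RESOURCE_GROUP = "my-resource-group"
CAPACITY_NAME = "myfabriccapacity"
API_VERSION = "2023-11-01"  # assumed api-version; check the Microsoft.Fabric REST docs

BASE_URL = (
    f"https://management.azure.com/subscriptions/{SUBSCRIPTION_ID}"
    f"/resourceGroups/{RESOURCE_GROUP}/providers/Microsoft.Fabric"
    f"/capacities/{CAPACITY_NAME}"
)


def set_capacity_state(action: str) -> None:
    """Suspend or resume the capacity; action is 'suspend' or 'resume'."""
    token = DefaultAzureCredential().get_token("https://management.azure.com/.default").token
    response = requests.post(
        f"{BASE_URL}/{action}",
        params={"api-version": API_VERSION},
        headers={"Authorization": f"Bearer {token}"},
    )
    response.raise_for_status()
    print(f"{action} accepted: HTTP {response.status_code}")


if __name__ == "__main__":
    set_capacity_state("suspend")  # or set_capacity_state("resume")
```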


Bit Column Order and Data Length

Brent Ozar performs an experiment:

At PGConf.dev, where Postgres developers get together and strategize the work they wanna do for the next version, I attended a session where Matthias van de Meent talked about changing the way Postgres stores columns. As of right now (Postgres 17), columns are aligned on 8-byte intervals, so if you create a table with alternating columns:

Read on to see an example, and then Brent performs a test to see how SQL Server handles this scenario. The comments also mention that at least older versions of Oracle behaved like Postgres.
