2021-09-22 – Curated SQL

Pure Storage FlashArray Snapshot Torture Test

Look, I’m not here to fight your religious war about how snapshots should not be called backups. I’m just gonna call them fast-as-fast restores(*) and be done with it. Because let’s be honest, with Pure Storage there’s absolutely nothing faster than a storage snapshot to recover a volume. Or volume(s). You get the idea. It’s about how fast you recover, every time.
Yes, I do understand that there are a million considerations for something to be called a “backup”. We’ll get to those little by little – don’t expect a thorough post on that debate right now. Today I want to focus on one question: Are Pure Storage FlashArray snapshots stable and trustworthy enough that I can take them without pausing I/O against my database? Can I trust that the database will come online every time from a snapshot?

Read on for the answer. For additional fun, read the whole article with your mental voice sounding like Argenis.

CI/CD with Databricks Notebooks and Azure DevOps

Published 2021-09-22 by Kevin Feasel

Michael Shtelma and Piotr Majer get us started on an MLOps journey:

This is the first part of a two-part series of blog posts that show how to configure and build end-to-end MLOps solutions on Databricks with notebooks and the Repos API. This post presents a notebook-based CI/CD framework on Databricks. The pipeline integrates with the Microsoft Azure DevOps ecosystem for the Continuous Integration (CI) part and the Repos API for the Continuous Delivery (CD) part. In the second post, we’ll show how to leverage the Repos API functionality to implement a full CI/CD lifecycle on Databricks and extend it to a full-blown MLOps solution.

Click through for the article and a link to code. You can also see the pipeline YAML (and Python code it calls) in the repo.
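
The CD half of that framework leans on the Repos API. As a rough sketch only (not the authors' pipeline code, which lives in their repo), a release step might ask Databricks to switch a workspace repo to a given branch with `PATCH /api/2.0/repos/{repo_id}`. The workspace URL, token, repo id, and branch name below are hypothetical placeholders, and the helper only builds the request rather than sending it.

```python
import json
import urllib.request


def build_repo_update_request(host: str, token: str, repo_id: int, branch: str) -> urllib.request.Request:
    """Build (but do not send) a Repos API call that checks out `branch`
    in the Databricks repo identified by `repo_id`."""
    url = f"{host.rstrip('/')}/api/2.0/repos/{repo_id}"
    body = json.dumps({"branch": branch}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={
            # Personal access token auth; the token value is a placeholder.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # Hypothetical workspace URL, token, and repo id: substitute your own.
    req = build_repo_update_request(
        "https://example.cloud.databricks.com/", "dapi-placeholder", 42, "release"
    )
    print(req.get_method(), req.full_url)
    # To actually send it from a pipeline step: urllib.request.urlopen(req)
```

Keeping request construction separate from sending makes the step easy to unit test inside the CI stage before it ever touches a real workspace.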

Ensemble Classification in Azure Machine Learning

Published 2021-09-22 by Kevin Feasel

Dinesh Asanka reminds me not to use the designer for tough Azure ML problems:

Let us see how we can extend the standard classification to Ensemble Classifiers in Azure Machine Learning. Before we discuss the details of this configuration, you can view or download the experiment from Ensemble Classification.
The following figure shows the complex layout of the Ensemble Classifiers in Azure Machine Learning.

Dinesh is not kidding about that complexity. This is definitely a use case for the Azure ML SDK.
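
Stripped of the designer's boxes and arrows, the core idea of one common ensemble scheme, hard majority voting, fits in a few lines. This is a plain-Python illustration under our own assumptions, not Dinesh's experiment and not the Azure ML SDK; the three threshold rules are hypothetical stand-ins for trained models.

```python
from collections import Counter
from typing import Callable, Sequence

# A "classifier" here is just a function from a feature vector to a label.
Classifier = Callable[[Sequence[float]], str]


def majority_vote(classifiers: Sequence[Classifier], features: Sequence[float]) -> str:
    """Hard-voting ensemble: every base classifier casts one vote and the
    most common label wins. Ties go to the label that was voted for first."""
    if not classifiers:
        raise ValueError("need at least one classifier")
    votes = Counter(clf(features) for clf in classifiers)
    # most_common orders equal counts by first appearance.
    label, _count = votes.most_common(1)[0]
    return label


# Hypothetical base models: threshold rules standing in for trained classifiers.
def rule_a(x: Sequence[float]) -> str:
    return "yes" if x[0] > 0.5 else "no"


def rule_b(x: Sequence[float]) -> str:
    return "yes" if x[1] > 0.5 else "no"


def rule_c(x: Sequence[float]) -> str:
    return "yes" if x[0] + x[1] > 1.2 else "no"


ENSEMBLE = [rule_a, rule_b, rule_c]

if __name__ == "__main__":
    print(majority_vote(ENSEMBLE, [0.9, 0.2]))  # prints "no" (votes: yes, no, no)
    print(majority_vote(ENSEMBLE, [0.7, 0.6]))  # prints "yes" (votes: yes, yes, yes)
```

An odd number of voters avoids ties for binary labels; other schemes (averaged probabilities, stacking) swap out only the combining step.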

Spark Performance Improvements in Azure Synapse

Published 2021-09-22 by Kevin Feasel

Balaji Sankaran shows improvements Microsoft has made over open-source Apache Spark 3 in Azure Synapse Analytics:

Azure Synapse Analytics is continually focused on delivering a highly performant and scalable platform for supporting Spark workloads. We are focused on improving the query performance for the typical workload patterns that we see with our customers. By combining the latest open-source updates in Apache Spark with our team’s focus on performance updates, we have made significant performance gains in standard TPC-DS benchmarking tests.

I expect it will never be as fast as what Databricks can do, but getting a 2x performance improvement over the open source version of Spark is nothing to sneeze at.

Showing Different Contacts per Dashboard in Power BI Apps

Published 2021-09-22 by Kevin Feasel

Gilbert Quevauvilliers wants to spread the blame around:

I recently got a request asking if it was possible to have a different contact person for each report or dashboard within a single Power BI App.
The good news is yes, this can certainly be done, and I will show you below how I got it working.

Read on to see what the normal situation looks like, and then how to set it per report.

Finding the Binding: I/O or CPU as the Constraint

Published 2021-09-22 by Kevin Feasel

Erik Darling lays down a lesson for us:

When you’re looking for queries to tune, it’s important to understand which part is causing the slowdown.
That’s why actual execution plans are so valuable in newer versions of SQL Server and SSMS. Getting to see operator timing and wait stats for a query can tell you a lot about what kind of problem you’re facing.
Let’s take a look at some examples.

Let’s, shall we?
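
Erik reads these numbers in SSMS, but in newer versions the same query-level figures also live in the actual plan's showplan XML, in the `QueryTimeStats` and `WaitStats` elements. The sketch below is our own crude heuristic, not Erik's method: the 80% and 50% thresholds are arbitrary assumptions, and the plan fragment is hand-written and trimmed for illustration.

```python
import xml.etree.ElementTree as ET

# XML namespace used by SQL Server showplan documents.
NS = {"sp": "http://schemas.microsoft.com/sqlserver/2004/07/showplan"}


def summarize_plan(plan_xml: str) -> dict:
    """Extract query CPU time, elapsed time, and wait totals (all in ms)
    from an *actual* execution plan."""
    root = ET.fromstring(plan_xml)
    stats = root.find(".//sp:QueryTimeStats", NS)
    if stats is None:
        raise ValueError("no QueryTimeStats: is this an actual plan?")
    waits = {
        wait.get("WaitType"): int(wait.get("WaitTimeMs"))
        for wait in root.findall(".//sp:WaitStats/sp:Wait", NS)
    }
    return {
        "cpu_ms": int(stats.get("CpuTime")),
        "elapsed_ms": int(stats.get("ElapsedTime")),
        "waits": waits,
    }


def guess_bottleneck(summary: dict) -> str:
    """Crude heuristic with arbitrary thresholds (an assumption):
    - CPU time close to (or above) elapsed time: mostly burning CPU;
    - otherwise, if PAGEIOLATCH_* waits explain at least half of the
      non-CPU time, call it I/O-bound."""
    cpu, elapsed = summary["cpu_ms"], summary["elapsed_ms"]
    if elapsed == 0:
        return "too fast to tell"
    if cpu >= 0.8 * elapsed:
        return "CPU"
    io_wait = sum(ms for wait_type, ms in summary["waits"].items()
                  if wait_type.startswith("PAGEIOLATCH"))
    if io_wait >= 0.5 * (elapsed - cpu):
        return "I/O"
    return "other waits"


# Hand-written, trimmed-down plan fragment for illustration (not real output).
EXAMPLE_PLAN = """<ShowPlanXML xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">
  <BatchSequence><Batch><Statements><StmtSimple><QueryPlan>
    <WaitStats><Wait WaitType="PAGEIOLATCH_SH" WaitTimeMs="4200" WaitCount="900"/></WaitStats>
    <QueryTimeStats CpuTime="500" ElapsedTime="5000"/>
  </QueryPlan></StmtSimple></Statements></Batch></BatchSequence>
</ShowPlanXML>"""

if __name__ == "__main__":
    print(guess_bottleneck(summarize_plan(EXAMPLE_PLAN)))  # prints "I/O"
```

Parallel plans can report CPU time above elapsed time, which this heuristic simply treats as CPU-bound; operator-level timings in the plan remain the better tool for finding *which* part is slow.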

Interactive .NET Notebooks in Visual Studio Code

Published 2021-09-22 by Kevin Feasel

Deborah Melkin tries out .NET Interactive Notebooks in Visual Studio Code:

These days, we tend to think of Azure Data Studio when we database developers talk about notebooks, specifically SQL Notebooks. But what Rob used for his demos is a new feature within VS Code called .NET Interactive Notebooks. It was developed in combination with the Azure Data Studio team and it has support for SQL. But the cool thing that intrigued me was that a notebook could support multiple kernels, unlike Azure Data Studio. Knowing how much we love our SQL and PowerShell, and this being a feature that many of us want to see in SQL Notebooks, I decided to try and set this up and poke around.

Click through for Deb’s experiences. And I’ll also point out that .NET Interactive Notebooks supports the best .NET language (and the one which most naturally fits the ethos of notebooks), F#.
