Author: Kevin Feasel

DevOps for Databricks

Anna Wykes starts off with bad news:

In this blog series I explore a variety of options available for DevOps for Databricks. This blog will focus on working with the Databricks REST API & Python. Why, you ask? Well, a large percentage of Databricks/Spark users are Python coders. In fact, in 2021 it was reported that 45% of Databricks users use Python as their language of choice. This is a stark contrast to 2013, in which 92% of users were Scala coders:

What is wrong with the world today?

Semi-seriously, though, do read Anna’s post, as it covers a variety of things you can do with the Databricks REST API, including cluster management and monitoring. I might be jumping the gun a bit, but I am a big fan of Gerhard Brueckl’s PowerShell module for Databricks for this kind of work.
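
For a rough idea of what cluster monitoring through the REST API looks like from Python, here is a minimal sketch using only the standard library. It is not taken from Anna’s post. The function names are hypothetical, the workspace URL and token are placeholders you would supply, and it assumes the Clusters API 2.0 `clusters/list` endpoint with personal access token authentication.

```python
import json
import urllib.request


def build_cluster_list_request(workspace_url, token):
    """Build (but do not send) a GET request for the clusters/list endpoint."""
    url = workspace_url.rstrip("/") + "/api/2.0/clusters/list"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def summarize_clusters(payload):
    """Reduce the JSON response to (cluster_name, state) pairs."""
    return [(c["cluster_name"], c["state"]) for c in payload.get("clusters", [])]


def list_clusters(workspace_url, token):
    """Call the workspace and return a summary. Requires network access."""
    request = build_cluster_list_request(workspace_url, token)
    with urllib.request.urlopen(request) as response:
        return summarize_clusters(json.load(response))
```

Splitting request construction from the network call keeps the pieces easy to check without a live workspace. In practice you would read the token from a secret store rather than hard-coding it.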

Deploying SQL Server to Azure Container Instance via ARM

Rajendra Gupta builds an ARM template:

The Azure Resource Manager (ARM) template is a JavaScript Object Notation (JSON) file for deploying Azure resources automatically. You can use a declarative syntax to specify the resources and their configurations. Usually, if you need to deploy Azure resources, it might be a tiring experience of navigating through different services and their configurations. With the ARM templates, you no longer need to click and navigate around the portal. For example, you can configure the template for Azure VM or Azure SQL Database deployment.

Click through for a step-by-step walkthrough. I will say, though, that I tend to heavily revise the ARM templates the Azure Portal creates. They tend to turn everything into a parameter, to the point where you get inundated with context-free decisions.
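
To illustrate what a pared-down template might look like, here is a minimal sketch that builds one in Python and emits it as JSON. It is not Rajendra’s template. The resource names, sizing, API version, and image tag are assumptions chosen for illustration, so check them against the current container group schema before deploying. The only parameter left is the one value that really should be decided at deployment time: the SA password.

```python
import json


def build_template():
    """Return a small ARM template for SQL Server on Azure Container Instances."""
    return {
        "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
        "contentVersion": "1.0.0.0",
        # One parameter only; everything else is a fixed, reviewed value.
        "parameters": {"saPassword": {"type": "securestring"}},
        "resources": [
            {
                "type": "Microsoft.ContainerInstance/containerGroups",
                "apiVersion": "2021-09-01",  # assumed version
                "name": "sql-aci-demo",  # hypothetical name
                "location": "[resourceGroup().location]",
                "properties": {
                    "osType": "Linux",
                    "containers": [
                        {
                            "name": "sqlserver",
                            "properties": {
                                "image": "mcr.microsoft.com/mssql/server:2019-latest",
                                "ports": [{"port": 1433}],
                                "environmentVariables": [
                                    {"name": "ACCEPT_EULA", "value": "Y"},
                                    {
                                        "name": "SA_PASSWORD",
                                        "secureValue": "[parameters('saPassword')]",
                                    },
                                ],
                                # Assumed sizing for a small test instance.
                                "resources": {"requests": {"cpu": 2, "memoryInGB": 4}},
                            },
                        }
                    ],
                    "ipAddress": {
                        "type": "Public",
                        "ports": [{"protocol": "TCP", "port": 1433}],
                    },
                },
            }
        ],
    }


def template_json():
    """Serialize the template for saving to a .json file."""
    return json.dumps(build_template(), indent=2)
```

Hard-coding values like the port and OS type means the person deploying the template answers one question instead of a dozen.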

Dedicated SQL Pool Index, Distribution, and Partition Guidance

I have a write-up on the specific value of distributions, indexes, and partitions in Azure Synapse Analytics dedicated SQL pools:

Not too long ago, I ended up taking the DP-203 certification exam for sundry reasons. On that exam, they ask a lot about Azure Synapse Analytics, including indexing, distribution, and partitioning strategies. Because these can be a bit different from on-premises SQL Server, I wanted to cover what options are available and when you might choose them. Let’s start with distributions, as that’s the biggest change in thought process.

Read on for the guidance.

Drawing a Christmas Tree with KQL

Guy Reginiano has a task:

KQL isn’t just super-powerful, it’s also fun!
See how you can draw a tree using KQL and learn some of the functions and operators available.
Inspired by https://lnkd.in/eCgFzBTw. Feel free to design and share your own trees!

I kind of want to make this a Hello World type of exercise, ranking languages by their Christmas Tree Generation Capability Score, or CTGC. Maybe I’ll shorten it to TGC to make it a TLA.

Marking Replication Transactions Complete

Andrea Allred spams the “burn it down” button:

Replication is not my favorite, it is kind of far from my favorite. No further than that. Little further.

When it breaks, it can cause havoc and it always seems to break at the worst time. Recently we noticed that our logfile was massive (like 3 times the size of the database) and that was making many of the other processes painful. We didn’t know how long the log hadn’t been clearing so we got to burn it all (kind of).

The first thing I did was tell replication that we were done with all the transactions that had been committed.

I’d say about 40-50% of the pain of replication is in how difficult it is to troubleshoot. Transactional replication is an order of magnitude easier than merge replication, too, especially on systems of non-trivial size and scale. The single most common question I get is “When will this row be replicated to the other side?” I can’t answer that with merge replication. The second-most common question is, “Why are things slower right now than before?” Can’t answer that either…

Use TOP instead of SET ROWCOUNT

Jared Poche explains why the TOP clause is superior to using SET ROWCOUNT:

I was presenting on how to use the TOP clause to break down large operations into short, fast, bite-sized operations. The mechanics are things I learned from writing processes that do garbage collection, backfill new columns, and anonymize PII data on existing tables. I’ve just posted the slides and example scripts here if you are interested.

ARE THEY THE SAME?

The question was whether the SET ROWCOUNT command would work just the same, and the answer is sometimes yes but largely no.

Read on to see what Jared means.
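
As a rough picture of the batching pattern, here is a minimal sketch of a driver loop in Python that repeats a `DELETE TOP (n)` statement until nothing is left to delete. It is not from Jared’s scripts. The table `dbo.EventLog`, the column `EventDate`, and the function name are hypothetical, and the code assumes a DB-API connection (pyodbc, for example) that uses `?` parameter markers.

```python
def delete_in_batches(connection, cutoff, batch_size=5000, max_batches=1000):
    """Delete old rows in small batches and return the total rows deleted."""
    batch_size = int(batch_size)  # cast before placing it in the SQL text
    sql = (
        f"DELETE TOP ({batch_size}) FROM dbo.EventLog "  # hypothetical table
        "WHERE EventDate < ?;"
    )
    cursor = connection.cursor()
    total = 0
    for _ in range(max_batches):
        cursor.execute(sql, (cutoff,))
        deleted = cursor.rowcount
        if deleted <= 0:
            break
        total += deleted
        # Commit each batch so locks and log usage stay small.
        connection.commit()
        if deleted < batch_size:
            break  # a short batch means no qualifying rows remain
    return total
```

The `max_batches` cap is a safety valve so a mistake in the predicate cannot loop forever. Because TOP sits inside the statement itself, it limits only that statement, whereas SET ROWCOUNT is a session setting that stays in effect until it is reset.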

Filtering with DAX for Paginated Reports

Adam Aspin takes us through an important topic for paginated report developers:

In the previous article of this short series, you learned the fundamentals of creating datasets using DAX to populate paginated reports delivered using the Power BI Premium service. The next step is to appreciate the practicalities – and subtleties – of how data can be filtered using DAX for paginated report output.

As most, if not all, report developers come from an SQL background, it may seem overkill to devote an entire article to filtering data. However, DAX is very unlike SQL as far as filtering output data is concerned. Something as simple as classic OR logic needs to be handled differently from the techniques you may be used to – either as a SQL or as a Power BI developer. To ensure that you can deliver the report data that you need to populate paginated reports, take a detailed look at how to filter data in DAX datasets using the core SUMMARIZECOLUMNS() function.

Read the whole thing.
