2019-12-17 – Curated SQL

Schiphol Takeoff: Low-Code Automated Deployment

To give a bit more insight into why we built Schiphol Takeoff, it’s good to take a look at an example use case. This use case ties a number of components together:
– Data arrives in a (near) real-time stream on an Azure Eventhub.
– A Spark job running on Databricks consumes this data from Eventhub, processes the data, and outputs predictions.
– A REST API is running on Azure Kubernetes Service, which exposes the predictions made by the Spark job.
Conceptually, this is not a very complex setup. However, there are quite a few components involved:
– Azure Eventhub
– Azure Databricks
– Azure Kubernetes Service
Each of these individually has some form of automation, but there is no unified way of coordinating and orchestrating deployment of the code to all at the same time. If, for example, you were to change the name of the consumer group for Azure Eventhub, you could script that. However, you’d also need to manually update your Spark job running on Databricks to ensure it could still consume the data.

This looks pretty nice. I’ll need to dive into it some more.
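
To make the coordination problem concrete, here is a minimal Python sketch of the general idea of driving every component from one shared set of settings, so that renaming the consumer group happens in exactly one place. This is not Schiphol Takeoff's API: `StreamSettings`, both helper functions, the configuration keys, and the sample names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamSettings:
    """Hypothetical settings shared by every component of the pipeline."""
    eventhub_name: str
    consumer_group: str


def eventhub_deploy_args(settings: StreamSettings) -> dict:
    # Values a (hypothetical) provisioning script would use for the consumer group.
    return {
        "eventhub": settings.eventhub_name,
        "consumer_group": settings.consumer_group,
    }


def spark_job_conf(settings: StreamSettings) -> dict:
    # Values the Spark job would read at start-up; the keys are illustrative only.
    return {
        "app.eventhub.name": settings.eventhub_name,
        "app.eventhub.consumerGroup": settings.consumer_group,
    }


# Renaming the consumer group here changes both outputs together.
settings = StreamSettings(eventhub_name="flights", consumer_group="predictions-v2")
print(eventhub_deploy_args(settings))
print(spark_job_conf(settings))
```

Because both outputs derive from the same object, the provisioning step and the Spark job cannot drift apart on the consumer group name.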

Comments closed

New Features in Kafka 2.4

Published 2019-12-17 by Kevin Feasel

Manikumar Reddy announces new features in Apache Kafka 2.4:

KIP-392: Allow consumers to fetch from closest replica
Historically, consumers were only allowed to fetch from leaders. In multi-datacenter deployments, this often means that consumers are forced to incur expensive cross-datacenter network costs in order to fetch from the leader. With KIP-392, Kafka now supports reading from follower replicas. This gives the broker the ability to redirect consumers to nearby replicas in order to save costs.

It’s not the biggest release of Kafka ever, but there are some really nice updates here.
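
As a rough illustration of the follower-fetching idea, the toy Python model below picks a replica for a consumer based on rack (or datacenter) labels. It is a sketch of the concept only, not Kafka's implementation: the `Replica` class, `pick_replica`, and the sample rack names are assumptions. In Kafka itself, as far as I know, the behavior is driven by the consumer's `client.rack` setting together with a broker-side `replica.selector.class`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Replica:
    broker_id: int
    rack: str
    is_leader: bool
    in_sync: bool = True


def pick_replica(replicas: List[Replica], client_rack: Optional[str]) -> Replica:
    """Prefer an in-sync replica in the consumer's rack; otherwise use the leader."""
    leader = next(r for r in replicas if r.is_leader)
    if client_rack is None:
        # No rack information: fall back to the historical behavior.
        return leader
    local = [r for r in replicas if r.in_sync and r.rack == client_rack]
    if not local or leader in local:
        return leader
    # Deterministic choice among nearby followers (lowest broker id).
    return min(local, key=lambda r: r.broker_id)


replicas = [
    Replica(broker_id=1, rack="eu-west", is_leader=True),
    Replica(broker_id=2, rack="us-east", is_leader=False),
    Replica(broker_id=3, rack="us-east", is_leader=False, in_sync=False),
]
print(pick_replica(replicas, "us-east").broker_id)   # nearby in-sync follower
print(pick_replica(replicas, "ap-south").broker_id)  # no nearby replica, so the leader
```

The fallback to the leader matters: a consumer with no nearby in-sync replica should behave exactly as it did before the feature existed.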

Testing Power BI Report Performance in the Browser

Published 2019-12-17 by Kevin Feasel

Chris Webb continues a series on testing Power BI report performance in a browser. Part 2 walks us through some of the mechanics of the process:

Before you publish your report, in Power BI Desktop add a blank page with no visuals on to it. It doesn’t need to be the page that is opened when the report opens and you will be able to delete it later. Why do this? When you’re testing how long it takes for your report page to render, you’re probably doing so because you want to improve performance. Some things in the report page that influence performance you have the power to change, such as the design of the dataset, the DAX in the measures, the number and type of visuals on a page; some things will always happen when a report runs and you have to accept that overhead. Testing how long a blank page takes to render will give you an idea of how long this latter category of “things that always happen” takes, and you can subtract this time from the time your chosen report page takes to run.

Part 3 is a demonstration of the process:

…so you go ahead and publish. You view the report after publishing and it still seems fast. Then the complaints start coming in: the report is slow!?! It seems to be users who are viewing the report on their phone who are having the most problems. So, following the instructions in my last post, you open up Chrome DevTools and run an audit using a simulated slow 4G connection:

That’s an important part of testing. We normally develop inside a fast network, but our users may be on rather slow networks.
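
The blank-page baseline from Part 2 comes down to a subtraction. The Python sketch below shows the arithmetic under a few assumptions of mine: the timings are made-up numbers in milliseconds, and taking the median of several runs is my choice for damping noise, not something the posts prescribe.

```python
from statistics import median


def adjusted_render_time(page_runs_ms, blank_runs_ms):
    """Median page render time minus the median blank-page (fixed overhead) time."""
    return median(page_runs_ms) - median(blank_runs_ms)


# Hypothetical timings from three runs of each page, in milliseconds.
page_runs = [5200, 4900, 5100]
blank_runs = [1900, 2100, 2000]

print(adjusted_render_time(page_runs, blank_runs))  # 3100
```

The result approximates the portion of render time you can actually influence through dataset design, DAX, and visuals.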

Copy Reports with Shared Data Sets Between Workspaces

Published 2019-12-17 by Kevin Feasel

Gilbert Quevauvilliers ran into a cross-environment issue:

I was working on some documentation for a customer and I came across a very quick and easy way to create a copy of a report which also creates a connection to the shared dataset that I could then copy to another “New Workspace”.
Before I found this gem, I had to manually do this via PowerShell, which worked really well, but I had to do a whole lot of extra work to find the GUIDs, then test it and make sure it works. This new method makes it simple and quick. It is a WIN-WIN.
You can follow along as I show you how to do it below.

Click through for the demonstration.

When FOR JSON PATH Isn’t Enough

Published 2019-12-17 by Kevin Feasel

Dave Mason walks us through some options when working with JSON data in SQL Server:

In both situations, we need to know something about the JSON schema to query it in a meaningful way: in the first example, column names and types are hard-coded; in the second example, column names are hard-coded as path parameter values for the JSON_VALUE function. Even though JSON data is self-describing, SQL Server doesn’t have a way to infer schema. (I would be quite happy to be wrong about this–please add a comment if you know something I don’t!) About the time I came to this realization, I commented on Twitter that JSON might be fool’s gold. You don’t need to know schema to store JSON data in SQL Server. But you do if you want to query it. “It’s pay me now or pay me later.”

It’s schema on read or schema on write. I’m not sure there is ever a truly schema-free scenario in a business application.
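
To show what schema on read looks like when it is pushed into application code, here is a small Python sketch that scans a JSON array of objects and reports the keys and value types it finds. It is only an illustration of the idea, not T-SQL and not Dave Mason's approach; `infer_schema` and the sample document are hypothetical.

```python
import json
from collections import defaultdict


def infer_schema(json_text):
    """Map each key in a JSON array of objects to the sorted Python type names seen."""
    rows = json.loads(json_text)
    schema = defaultdict(set)
    for row in rows:
        for key, value in row.items():
            schema[key].add(type(value).__name__)
    return {key: sorted(types) for key, types in schema.items()}


doc = '[{"id": 1, "name": "Ada"}, {"id": 2, "name": null, "score": 9.5}]'
print(infer_schema(doc))
# {'id': ['int'], 'name': ['NoneType', 'str'], 'score': ['float']}
```

Even here the schema does not come for free: the reader still has to scan the data and decide what to do about keys that are missing or that change type between rows.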

The Benefits of DAX Variables

Published 2019-12-17 by Kevin Feasel

Reza Rad explains why you should use DAX variables if you’re repeating calculations:

We have two main parts in the expression above: A and B. Each of those is doing a calculation. Now, with the markings above, reading the expression is much simpler. The whole expression means this:
=IF(A>B, A, B)
All the above expression is saying is that if A is bigger than B, then return A, otherwise B. Now it is much simpler to read because we split the repetitive parts into sections. That is exactly what the DAX variable is for.

Readability is not the only benefit, however. Reza has more.
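
The same pattern can be sketched outside DAX. The Python analogy below (not DAX, and making no claim about how the DAX engine evaluates or optimizes expressions) counts how often each sub-calculation runs when it is repeated inline versus stored in a variable first. `calc_a`, `calc_b`, and their return values are hypothetical stand-ins for the A and B parts.

```python
calls = {"a": 0, "b": 0}


def calc_a():
    calls["a"] += 1
    return 120  # stand-in for calculation A


def calc_b():
    calls["b"] += 1
    return 95  # stand-in for calculation B


def without_variables():
    # Mirrors IF(A > B, A, B) with A and B written out each time.
    return calc_a() if calc_a() > calc_b() else calc_b()


def with_variables():
    # Compute each part once, then reuse the named results.
    a = calc_a()
    b = calc_b()
    return a if a > b else b


def count_calls(fn):
    """Run fn with fresh counters and return (result, call counts)."""
    calls["a"] = calls["b"] = 0
    return fn(), dict(calls)


print(count_calls(without_variables))  # (120, {'a': 2, 'b': 1})
print(count_calls(with_variables))     # (120, {'a': 1, 'b': 1})
```

Both versions return the same answer; the variable version simply names each part once and reuses it.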

Why Disabling the Clustered Index is a Bad Idea

Published 2019-12-17 by Kevin Feasel

Kenneth Fisher has an experiment in mind:

You are probably already aware that you can disable an index. This can be handy when you have a large load and the load + re-enabling the indexes (you have to completely rebuild them) is faster than leaving the indexes in place. I’ve had pretty limited occasions where this has helped but it can be a handy trick at times. That said, this is only true for non-clustered indexes. What happens when you disable the clustered index?

Nothing good, that’s what.

Choosing the Right Azure VM Type for SQL Server

Published 2019-12-17 by Kevin Feasel

Glenn Berry walks us through Azure virtual machine classes and picks out good ones for running SQL Server:

For high performance OLTP SQL Server workloads, the memory optimized type of Azure VMs is usually the best choice. According to Microsoft, “Memory optimized VM sizes offer a high memory-to-CPU ratio that are great for relational database servers.” This gives you lower core counts, with more memory, which is usually what you want for SQL Server, to minimize your license costs and still have good performance.
You can go even further down this path with Constrained vCPU capable VM sizes, where you can constrain the VM vCPU count (to one half or one quarter of the original VM size) to reduce the cost of SQL Server licensing, while maintaining the same memory, storage, and I/O bandwidth as a non-constrained VM. These constrained Azure VMs have a suffix in the name that indicates the number of active vCPUs in the VM.

Click through for the comparison.
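
The licensing effect of constrained vCPU sizes is simple arithmetic, sketched below in Python. The one-half and one-quarter fractions come from the quoted text; the function names, the 32-vCPU example, and the per-core price are hypothetical placeholders, not real Azure or SQL Server pricing.

```python
from fractions import Fraction

# Fractions of the original vCPU count mentioned in the quoted text.
ALLOWED_FRACTIONS = (Fraction(1, 2), Fraction(1, 4))


def constrained_vcpus(original_vcpus: int, fraction: Fraction) -> int:
    """Active vCPU count after constraining the VM to the given fraction."""
    if fraction not in ALLOWED_FRACTIONS:
        raise ValueError("fraction must be 1/2 or 1/4")
    active = original_vcpus * fraction
    if active.denominator != 1:
        raise ValueError("constraint does not yield a whole number of vCPUs")
    return int(active)


def license_cost(active_vcpus: int, price_per_core: float) -> float:
    # Per-core licensing: cost scales with active vCPUs, not with memory.
    return active_vcpus * price_per_core


PRICE_PER_CORE = 100.0  # hypothetical unit price, for illustration only

for fraction in ALLOWED_FRACTIONS:
    active = constrained_vcpus(32, fraction)
    print(active, license_cost(active, PRICE_PER_CORE))
```

With these placeholder numbers, constraining a 32-vCPU VM to 16 or 8 active vCPUs halves or quarters the licensing figure while, per the quote, memory, storage, and I/O bandwidth stay the same.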

Day: December 17, 2019

Schiphol Takeoff: Low-Code Automated Deployment
