2020-01-21 – Curated SQL

Streams and Tables in Apache Kafka

Published 2020-01-21 by Kevin Feasel

Michael Noll wraps up a series on Apache Kafka. First up is the fundamentals of Kafka Streams:

A table is a, well, table in the ordinary technical sense of the word, and we have already talked a bit about tables before (a table in Kafka is today more like an RDBMS materialized view than an RDBMS table, because it relies on a change being made elsewhere rather than being directly updatable itself). Seen through the lens of event streaming however, a table is also an aggregated stream. This is a reference to the stream-table duality we discussed in part 1.

In the conclusion, Michael covers a few advanced topics:

Streams and tables are always fault tolerant because their data is stored reliably and durably in Kafka. This should be relatively easy to understand for streams by now as they map to Kafka topics in a straightforward manner. If something breaks while processing a stream, then we just need to re-read the underlying topic again.
For tables, it is more complex because they must maintain additional information—their state—to allow for stateful processing such as joins and aggregations like COUNT() or SUM(). To achieve this while also ensuring high processing performance, tables (through their state stores) are materialized on local disk within a Kafka Streams application instance or a ksqlDB server. But machines and containers can be lost, along with any locally stored data. How can we make tables fault tolerant, too?

This was a nice series.

Comments closed

Using Koalas on Azure Databricks

Published 2020-01-21 by Kevin Feasel

Ginger Grant shows how you can install the koalas library on an Azure Databricks cluster:

Unfortunately if you are using an ML workspace, this will not work and you will get the error message org.apache.spark.SparkException: Library utilities are not available on Databricks Runtime for Machine Learning. The Koalas github documentation says “In the future, we will package Koalas out-of-the-box in both the regular Databricks Runtime and Databricks Runtime for Machine Learning”. What this means is if you want to use it now
Most of the time I want to install on the whole cluster as I segment libraries by cluster. This way if I want those libraries I just connect to the cluster that has them. Now the easiest way to install a library is to open up a running Databricks cluster (start it if it is not running) then go to the Libraries tab at the top of the screen.

Click through for a demo of what you need to do.

Comments closed

Reporting on Power BI Report Usage

Published 2020-01-21 by Kevin Feasel

Gilbert Quevauvilliers looks at the outputs of Get-PowerBIActivityEvent and visualizes them for us:

I have been working on implementing changes with the new PowerShell script Get-PowerBIActivityEvent from the Power BI team.
One of the awesome things is that I noticed that there are some additional fields in the data extract. This then allowed me to be able to provide additional reporting insights. For example, which reports, and dashboards are consumed via an App!

If you’re in a Power BI-heavy shop, this could be quite useful information.

Comments closed

Bug with UnmatchedIndexes and Filtered Indexes

Published 2020-01-21 by Kevin Feasel

Taiob Ali points out a bug in SQL Server Management Studio:

SQL Server Management Studio (SSMS) showplan root node properties have two attributes to indicate when a filtered index can not be used during a parameterized query. One is ‘UnmatchedIndexes’, and the other one is under ‘Warnings’ attribute. Due to a bug in SSMS, the attributes are also showing positive results with filtered index and non-parameterized queries.
Let’s look at an example. I am using the AdventureWorks database, which you download from here.

If you want this fixed, vote up this Feedback item.

Comments closed

Windows Server Core Commands

Published 2020-01-21 by Kevin Feasel

Kenneth Fisher takes us through some helpful commands when working with Windows Server (particularly Core edition):

I don’t know how many of you are working with Windows Core these days but personally I think it’s a pretty cool concept. You aren’t supposed to be logging into your servers all that often so why have the extra overhead of Windows? Windows Core removes all of that and comes back with something that looks and feels remarkably like DOS from when I started with computers. That said, most of us, myself included, aren’t used to just having DOS commands any more so here are some helpful tips. I should point out that there is actually more than just DOS commands available and I’ll go over a bit of that here as well.

One other command I find quite helpful is sconfig, which pops up the Server Configuration app. That’s how you can, for example install Windows updates if you don’t have WSUS.

Comments closed

User-Defined Performance Counters in Perfmon

Published 2020-01-21 by Kevin Feasel

Michael J Swart shows how you can create a Perfmon counter which tracks a user-defined value in SQL Server:

Use this to monitor something that’s not already exposed as a performance counter. Like the progress of a custom task or whatever. If you can write a quick query, you can expose it to a counter that can be plotted by Performance Monitor.

This might track queue length, number of items processed, or whatever other time series measure you need but don’t have a UI to display.

Comments closed

Using ACLs to Secure Azure Data Lake Data

Published 2020-01-21 by Kevin Feasel

Matthew Roche takes us through access control lists (ACLs) in Azure Data Lake Storage Gen2 and how they apply to Power BI:

Earlier this week I received a question from a customer on how to get Power BI to work with data in ADLSg2 that is secured using ACLs. I didn’t know the answer, but I knew who would know, and I looped in Ben Sack from the dataflows team.Ben answered the customer’s questions and unblocked their efforts, and he said that I could turn them into a blog post. Thank you, Ben!

Read on for the answer.

Comments closed

Fun with SET Options

Published 2020-01-21 by Kevin Feasel

Dan Guzman takes us through different SET options in T-SQL and where you can go wrong:

ANSI_PADDING OFF has also been deprecated for quite some time and the SQL Server documentation specifically calls out “ANSI_PADDING should always be set to on.” In summary, a column-level ANSI_PADDING OFF setting causes nullable fixed-length char(n) and binary(n) columns to behave like variable-length varchar(n) and varbinary(n) columns. Furthermore, SQL Server automatically trims trailing blank characters from character data and leading binary zeros from binary data and stores the values as variable length instead of storing the provided value as-is during inserts and updates. Varchar(n)/varbinary(n) columns with ANSI_PADDING OFF are similarly trimmed. Note that it is the persisted ANSI_NULLS column meta-data setting that determines the storage and trimming behavior, not the current session ANSI_PADDING setting. The session ANSI_PADDING must still be ON when using features that require proper settings.

Some of these will pop up in occasional errors, like if you’re using filtered indexes or indexed views.

Comments closed

M	T	W	T	F	S	S
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30	31

Day: January 21, 2020