Kafka Schema Registry Tips

Kevin Feasel

2019-05-30

Hadoop

Yeva Byzek shares 17 tips for managing your Kafka Schema Registry:

Mistake #5: Configuring different names for the schemas topic in different Schema Registry instances

There is a commit log with all the schema information, which gets written to a Kafka topic. All Schema Registry instances should be configured to use the same schemas topic, whose name is set by the configuration parameter kafkastore.topic. This topic is the schema’s source of truth, and the primary instances read the schemas from this topic. The name of this topic defaults to _schemas, but sometimes customers choose to rename it. This has to be the same for all Schema Registry instances, otherwise it may result in different schemas with the same ID.

Read on for sixteen more.
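As a rough illustration of the kafkastore.topic requirement quoted above, here is a minimal Python sketch that compares the effective schemas topic across several Schema Registry properties files. The instance names and file contents are hypothetical; the only details taken from the excerpt are the kafkastore.topic parameter and its _schemas default.

```python
DEFAULT_TOPIC = "_schemas"  # default named in the excerpt


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def schemas_topics(configs):
    """Map each instance name to its effective kafkastore.topic."""
    return {
        name: parse_properties(text).get("kafkastore.topic", DEFAULT_TOPIC)
        for name, text in configs.items()
    }


def topics_consistent(configs):
    """True when every instance resolves to the same schemas topic."""
    return len(set(schemas_topics(configs).values())) <= 1


# Hypothetical properties file contents for three instances.
configs = {
    "registry-1": "listeners=http://0.0.0.0:8081\nkafkastore.topic=_schemas\n",
    "registry-2": "listeners=http://0.0.0.0:8081\n",  # falls back to the default
    "registry-3": "kafkastore.topic=_schemas_prod\n",  # the renamed outlier
}

print(schemas_topics(configs))
print(topics_consistent(configs))
```

A check like this could run in a deployment pipeline before any instance starts, since the mismatch is much cheaper to catch in configuration than after conflicting schema IDs have been written.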

Removing Serial Correlation

Vincent Granville has an easy trick for removing serial correlation from a data set:

Here is a simple trick that can solve a lot of problems.

You cannot trust a linear or logistic regression performed on data if the error terms (residuals) are auto-correlated. There are different approaches to de-correlate the observations, but they usually involve introducing a new matrix to take care of the resulting bias. See for instance here.

Click through for the alternative.
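Before trying any fix, it helps to confirm that the residuals really are serially correlated. The sketch below is not Granville's trick; it is a plain-Python version of two standard diagnostics, the Durbin-Watson statistic and the lag-1 autocorrelation, run on made-up residual series.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    divided by the sum of squared residuals. Values near 2 suggest no
    first-order autocorrelation; values near 0 suggest positive
    autocorrelation and values near 4 suggest negative autocorrelation."""
    num = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    den = sum(e * e for e in residuals)
    return num / den


def lag1_autocorrelation(residuals):
    """Sample autocorrelation of the residuals at lag 1."""
    n = len(residuals)
    mean = sum(residuals) / n
    centered = [e - mean for e in residuals]
    num = sum(centered[i] * centered[i - 1] for i in range(1, n))
    den = sum(c * c for c in centered)
    return num / den


# Made-up residuals: one series drifts steadily, the other flips sign.
trending = [-3, -2, -1, 1, 2, 3]
alternating = [1, -1, 1, -1, 1, -1]

print(durbin_watson(trending), lag1_autocorrelation(trending))
print(durbin_watson(alternating), lag1_autocorrelation(alternating))
```

The drifting series yields a statistic well below 2 and the sign-flipping series one well above it, which is the pattern the regression warning in the excerpt is about.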

SQL Server Containers: MCR vs Docker Hub

Randolph West notes that Microsoft is no longer updating SQL Server images on Docker Hub:

In October 2018, Microsoft announced a change to the source of their Docker containers. You should be using the new Microsoft Container Registry (MCR) as the source for official Docker container images for Microsoft products.

While existing container images in the Docker Hub are not affected, you may not get updated images unless you switch.

On the Docker Hub website, there are instructions on hitting the MCR:

Start a mssql-server instance using the latest update for SQL Server 2017

docker run -e 'ACCEPT_EULA=Y' -e 'SA_PASSWORD=yourStrong(!)Password' -p 1433:1433 -d mcr.microsoft.com/mssql/server:2017-latest

T-SQL Tuesday 114 Roundup

Matthew McGiffen wraps up another T-SQL Tuesday:

Here’s my round-up for this month’s T-SQL Tuesday.

Thanks to everyone who contributed last week. It was great reading your posts and seeing the different ways you interpreted the puzzle theme.

We had real-life problems, we had SQL coding questions, we had puzzles, we had solutions, we had games, and we had the imaginarium.

Click through for thirteen blog posts.

Populating a Data Vault Model with Azure Data Factory

Rayis Imayev gives us an example of ELT into a Data Vault model using Azure Data Factory:

To make a full transition from the existing DW model to an alternative Data Vault, I removed all Surrogate Keys and other attributes that are only necessary to support the Kimball data warehouse methodology. I also needed to add the necessary Hash keys to all my Hub, Link and Satellite tables. The target environment for my Data Vault would be a SQL Azure database, and I decided to use the built-in crc32 function of the Mapping Data Flow to calculate hash keys (HK) of my business data sourcing keys and composite hash keys of satellite table attributes (HDIFF).

Data Vault is somewhere on my list of things to learn. It’s not at the top of the list, but that’s not a slight against it.
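To make the HK and HDIFF idea from the excerpt concrete, here is a small sketch using Python's zlib.crc32 as a stand-in for the Mapping Data Flow crc32 function. The normalization rules (trimming, upper-casing, the "|" delimiter, sorting attributes by column name) and the sample values are assumptions for illustration, not the author's pipeline, and the numeric results are not guaranteed to match what Azure Data Factory produces.

```python
import zlib


def hash_key(*business_keys, delimiter="|"):
    """Hash key (HK) over one or more business keys.

    Keys are trimmed and upper-cased first so that cosmetic differences
    in the source do not produce different hub keys (an assumed rule)."""
    normalized = delimiter.join(str(k).strip().upper() for k in business_keys)
    return zlib.crc32(normalized.encode("utf-8"))


def hash_diff(attributes, delimiter="|"):
    """Composite hash (HDIFF) over a satellite row's descriptive attributes.

    Columns are ordered by name so the result does not depend on dict order."""
    ordered = [
        "" if attributes[col] is None else str(attributes[col])
        for col in sorted(attributes)
    ]
    return zlib.crc32(delimiter.join(ordered).encode("utf-8"))


# Hypothetical customer rows: same business key, one changed attribute.
old_row = {"city": "Seattle", "tier": "A"}
new_row = {"city": "Seattle", "tier": "B"}

print(hash_key("cust-001") == hash_key(" CUST-001 "))  # same hub key
print(hash_diff(old_row) != hash_diff(new_row))        # attribute change detected
```

CRC32 yields only a 32-bit value, so collisions become more likely as row counts grow; a wider hash is worth considering if that matters for the model.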

Moving a Power BI Data Model to Tabular

Ginger Grant provides some tips on migrating from a Power BI data model to an Analysis Services Tabular model:

Unless you are upgrading to Analysis Services on SQL Server 2019, chances are you are going to have to review your DAX code and make some modifications, as DAX on the other versions of SQL Server is not the same as in Power BI. I was upgrading to AS on SQL Server 2016, and there were some commands that I had to manually edit out of the JSON file. If you have any new DAX commands, take them out of your Power BI model so that you will not have to manually edit the JSON file to remove them when the new commands are flagged as errors. Make sure your Power BI model does not include commands such as SELECTEDVALUE and GENERATESERIES, or any of the automatically generated date hierarchies. After your Power BI Desktop file is clean, leave it running, as you are going to need it running for the next step.

Click through for more details.
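One way to find the offending commands before editing anything is to scan the model's DAX expressions for them. The sketch below is a hypothetical helper, not part of the linked post: it takes measure names and expressions as a plain dictionary (made-up examples here) and flags the two functions named in the excerpt.

```python
import re

# Functions called out in the excerpt; extend this tuple as needed.
FLAGGED = ("SELECTEDVALUE", "GENERATESERIES")


def find_flagged_functions(expressions, flagged=FLAGGED):
    """Return {measure name: [flagged functions used]} for expressions
    that call any of the flagged DAX functions (case-insensitive)."""
    pattern = re.compile(r"\b(" + "|".join(flagged) + r")\s*\(", re.IGNORECASE)
    hits = {}
    for name, expr in expressions.items():
        found = sorted({m.group(1).upper() for m in pattern.finditer(expr)})
        if found:
            hits[name] = found
    return hits


# Made-up measures standing in for what a model export might contain.
measures = {
    "Selected Year": "SELECTEDVALUE('Date'[Year])",
    "Total Sales": "SUM(Sales[Amount])",
    "Buckets": "GENERATESERIES(1, 10, 1)",
}

print(find_flagged_functions(measures))
```

The match requires an opening parenthesis after the name, so a column or table that merely contains one of these words is not flagged.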

Auditing Azure Analysis Services

Kasper de Jonge shows how you can audit an Azure Analysis Services cube:

So the question was: how can I see who connected to my AS Azure database and what queries were sent? Initially I thought of ways I used to do this in the on-premises world: capture profiler traces or XEvents by writing code and then store them somewhere for processing. It looks like I was not alone in this; even the AS team itself had ways to capture XEvents and store them: https://azure.microsoft.com/en-us/blog/using-xevents-with-azure-analysis-services/

But it turns out it is much smoother, simpler and more elegant to leverage Azure's own products. In this case we will be using Azure Log Analytics. It is already documented in the official documentation here.

Click through for a demo.

Diagnosing Why SYSUTCDATETIME is Faster than SYSDATETIME

Joe Obbish is on a quest:

On my machine the code takes about 11.6 seconds to execute. Replacing SYSDATETIME() with SYSUTCDATETIME() makes the code take only 4.3 seconds to execute. Why is SYSUTCDATETIME() so much faster than SYSDATETIME()?

There’s an interesting answer to the question, so read on.
