Press "Enter" to skip to content

Day: March 28, 2022

Replacing Zookeeper in Kafka

Guozhang Wang explains the decision-making behind a major change in Apache Kafka:

Why replace ZooKeeper with an internal log for Apache Kafka® metadata management? This post explores the rationale behind the replacement, examines why a quorum-based consensus protocol like Raft was utilized and altered to become KRaft, and describes the new Quorum Controller built on top of KRaft protocols.

Click through for the reasoning, which includes a considerably faster shutdown in large environments..

Comments closed

Data Visualization in Python

Mehreen Saeed uses a few data visualization libraries in Python:

Data visualization is an important aspect of all AI and machine learning applications. You can gain key insights of your data through different graphical representations. In this tutorial, we’ll talk about a few options for data visualization in Python. We’ll use the MNIST dataset and the Tensorflow library for number crunching and data manipulation. To illustrate various methods for creating different types of graphs, we’ll use the Python’s graphing libraries namely matplotlib, Seaborn and Bokeh.

Bokeh results can look really nice, although it does feel like it requires a lot more developer time and effort to get it right. Click through for examples of each of the three libraries.

Comments closed

Azure ML Well-Architected Framework Review

Ben Brauer has good news:

Microsoft offers prescriptive guidance called the Well-Architected Framework that optimizes workloads implemented and deployed on Azure. This guidance has been generalized for most workloads and creates a basis for reliable and secure applications that are cost optimized.

We have begun to build on this base content set to include more precise guidance for specific workload types, such as machine learning, data services and analytics, IoT, SAP, mission critical apps, and web apps. Machine Learning was the first branch from the base content, which came into fruition in the Fall of 2021.

In case you have never used the Azure Well-Architected Review assessment tool, it’s really useful. It can take hours (or days) to go through the review but if you take it seriously and have the right people in the room giving answers, you’ll get concrete guidance on how to optimize your Azure-based solutions.

Comments closed

Using the Power BI Scanner

Gilbert Quevauvilliers sets scanners to On:

As mentioned in my previous blog post this is part 1 of the series where I am going to show you how to use the Power BI Scanner to get the App workspace data. I am also going to mention that the Power BI Scanner from PowerBI.Tips and Tommy Puglia (Twitter) has a wealth of other awesome information for your Power BI tenant.

Fortunately I do not have to go through all the steps on setting up and getting the Power BI Scanner data, you can do it by following the blog post already created with some amazing details here: Using the Power BI Scanner API to Manage Tenant’s Entire Metadata

Check out that article but Gilbert also has some nice tips.

Comments closed

Creating an Azure Redis Cache

Arun Sirpal continues a series on Azure Redis:

Remember – basic should never be used for production. Also, if you need dedicated service then you will not want C0 because this is based on shared infrastructure. Redis can get expensive but could be cost – effective especially if you design to use a multi app approach per cache.

I select P1 – Premium with 6GB cache just to talk a couple things through.

As a note, 6GB of cache is a lot in most environments. That’s because your average cached element size in Redis should be measured in single-digit or double-digit bytes, not kilobytes. You’re typically caching individual values, not entire documents, so if you average 64 bytes per cached key-value combo, you can get somewhere around 90 million values in cache at a time. The database call savings add up quickly, considering a really simplistic estimation: if the average number of queries before expiration for a cached item is 3, a single “cycle” of caching saves you about 270 million database calls. That can allow you to downscale your relational databases considerably, saving a lot of money in the process. There’s a lot of hand-waving I’m doing in the math and a lot of complexity I’m wiping away, but both of those tend on average to make the cache more effective, not less.

Comments closed

Thoughts on CI/CD in the Serverless SQL Pool

Kevin Chant answers a question:

I got asked if I thought we were going backwards with CI/CD for serverless SQL Pools. Mostly due to the fact that we had to include defensive logic in the SQL scripts.

My answer was that we did this purely because of the limitations of the SchemaVersions table. Because the DBOps PowerShell module usually creates this table in the target location if it does not exist. Which it then uses it to log which scripts have already run.

However, you cannot easily create and update this table in a serverless SQL Pool.

Read the whole thing. This sounds like a case in which having a central schema versioning location rather than using the destination system could be advantageous. Of course, now you have a dependency on that central schema versioning location, so life is full of trade-offs.

Comments closed

Time Zones and Extended Events

Tomas Zika answers a question:

I’ve helped answer another question that appeared on the SQL Server Slack:

Are timestamps in XE event files you view in SSMS local or server time?

To test this, I need a server in a different timezone than the client (SSMS). I find the quickest and most easy tool for that to be containers – more specifically, Docker.

Click through for the answer, as well as a few Docker-related incidentals.

Comments closed