
Day: January 11, 2022

Streaming Data to Event Hubs via Kafka Connect and Debezium

Niels Berglund starts off a two-part sub-series within a series:

This post is the first of two looking at whether and how we can stream data to Event Hubs from Debezium. Initially I had planned only one post covering this, but it turned out that the post would be too long, so I split it in two.

It started with the post, How to Use Kafka Client with Azure Event Hubs. In that post, I looked at how the Kafka client can publish messages not only to Apache Kafka but also to Azure Event Hubs. In the post, I said something like:

An interesting point here is that it is not only your Kafka applications that can publish to Event Hubs but any application that uses Kafka Client 1.0+, like Kafka Connect connectors!

Click through for the first part of this pairing.
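As a quick illustration of that point (my sketch, not something from Niels's post), a plain Kafka producer can publish to Event Hubs just by changing its connection configuration: point bootstrap.servers at the namespace's Kafka endpoint on port 9093 and authenticate over SASL PLAIN with the namespace connection string. The namespace, event hub name, and connection string below are placeholders, and the Kafka endpoint requires a Standard-tier or higher Event Hubs namespace.

    from confluent_kafka import Producer

    # Placeholder namespace; the password is the namespace-level connection string
    # from Shared access policies in the Azure portal.
    producer = Producer({
        "bootstrap.servers": "my-eh-namespace.servicebus.windows.net:9093",
        "security.protocol": "SASL_SSL",
        "sasl.mechanisms": "PLAIN",
        "sasl.username": "$ConnectionString",
        "sasl.password": "Endpoint=sb://my-eh-namespace.servicebus.windows.net/;SharedAccessKeyName=...;SharedAccessKey=...",
    })

    # The event hub plays the role of the Kafka topic.
    producer.produce("my-event-hub", key="1", value='{"greeting": "hello from a Kafka client"}')
    producer.flush()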


Trying Automated ML in Azure ML

I continue a series on low-code machine learning with Azure ML:

Automated Machine Learning (AutoML) provides two distinct benefits. The first benefit is the one that AutoML providers tend to tout: you don’t need (much) machine learning experience to use them. According to the marketing, AutoML does all of the work and you sit back and enjoy the fruits of its labor.

I am nowhere near sold on this use case for AutoML. Yes, you can get answers in a few clicks, but to get good answers, you need a lot more knowledge of data processing and statistics than they let on. Feeding in garbage data will get you mediocre results.

Click through for the second benefit, which I think applies much better, as well as a step-by-step demonstration of how AutoML works.
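The demonstration in the post goes through the Azure ML studio UI, but the same kind of run can be submitted from code with the v1 azureml Python SDK. Here is a rough sketch under assumed names; the dataset, label column, and compute cluster are placeholders, not anything from the post.

    from azureml.core import Workspace, Dataset, Experiment
    from azureml.train.automl import AutoMLConfig

    ws = Workspace.from_config()  # reads the config.json downloaded from the workspace

    # Placeholder dataset name
    training_data = Dataset.get_by_name(ws, name="my-training-data")

    automl_config = AutoMLConfig(
        task="classification",
        training_data=training_data,
        label_column_name="Label",          # placeholder label column
        primary_metric="AUC_weighted",
        compute_target="cpu-cluster",       # placeholder compute cluster name
        n_cross_validations=5,
    )

    run = Experiment(ws, "automl-demo").submit(automl_config, show_output=True)
    best_run, fitted_model = run.get_output()  # the winning model once the sweep finishes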


Build a Sandbox for Testing PolyBase and Hadoop

Fernando Sibaja Araya has a step-by-step guide to building a Hadoop sandbox for testing PolyBase on SQL Server:

This guide will take you step by step through deploying a Hadoop sandbox into Azure. You will then connect to the sandbox through SSH and tunnel all the required ports to your machine, so you can access all the endpoints needed to execute Hadoop queries from PolyBase.

We will be deploying Hortonworks Data Platform Sandbox 2.6.4. This will be one VM running in Azure, and within this VM a Docker container will run all of the HDP services.

Click through for the full set of instructions. I’m a little overjoyed that my blog snuck into the set of links and resources at the end.
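The SSH tunneling is the piece that tends to trip people up. Purely as an illustration (not Fernando's exact method), the sshtunnel Python package can forward the sandbox's endpoints to localhost; the VM address, credentials, and the two ports shown are placeholders, and the guide lists the full set of ports PolyBase actually needs.

    from sshtunnel import SSHTunnelForwarder

    # Placeholder VM address and credentials; 8020 (HDFS NameNode) and 10000 (Hive)
    # are only two of the ports the guide forwards.
    ports = [8020, 10000]

    tunnel = SSHTunnelForwarder(
        ("my-hdp-sandbox.eastus.cloudapp.azure.com", 22),
        ssh_username="azureuser",
        ssh_password="...",
        remote_bind_addresses=[("localhost", p) for p in ports],
        local_bind_addresses=[("127.0.0.1", p) for p in ports],
    )

    tunnel.start()
    # The forwarded endpoints stay reachable on localhost until tunnel.stop()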


Using Azure DevOps to Deploy Python Functions to Azure Function Apps

Rayis Imayev has a trick question for us:

Can I create a CI/CD pipeline to deploy a Python function to an Azure Function App using a Windows self-hosted Azure DevOps agent?

My short answer to this question is Yes and No. Yes, you can use a Windows self-hosted Azure DevOps agent to deploy a Python function to a Linux-based Azure Function App; and no, you can't use a Windows self-hosted Azure DevOps agent to build the Python code, since that requires collecting/compiling/building all of the Python-dependent libraries on a Linux OS platform.

Click through for the full answer.


Preconceived Notions: “Databases Are Easy”

Rob Farley takes us back to school:

At university I studied Computer Science, which felt like it was mostly about algorithms and paradigms. It covered how to approach particular kinds of problems, what languages suited what problems and why, and how to model things. The answer to a lot of things was "C", whether it was a multiple-choice question or the question about which language would be used to solve something.

I skipped the database subject. Everyone said it was overly basic, easy marks, and not particularly interesting. I wasn’t interested in it. Not when there were subjects like Machine Learning where we’d implement genetic algorithms in LISP to find ways to battle other algorithms in solving the prisoner’s dilemma. Or the subject where we’d create creatures (in software) that would follow each other in a flocking motion around a cityscape. Everything I heard about databases was that they were largely of no importance.

In fairness, university database classes tend to fall into one of two categories: either mathematical forays into set theory or fluffy, school-of-business-friendly “Today we’re going to learn how to write the word SELECT. Next time, we’ll learn how to write the word FROM” types of courses, at least from what I’ve experienced.


Secondary and Tertiary Data Mesh Interfaces in Azure

Paul Andrew continues a series on implementing data mesh with Azure:

When thinking about our node edges in part 2, I also made a statement about a primary set of node interfaces. In my initial drawings I alluded to this, then captured what I've called the PaaS Plane, suggesting the Azure Resource type used.

Building on this understanding I want to cover off the remaining edge use cases by exploring the other interface types we will typically need for the nodes of our data mesh architecture.

This has been a rather informative series on a topic I knew very little about coming in.


Finding Buffer Pool Distribution by Table

Guy Glantser has a script to track buffer pool size by table:

Sometimes we need to troubleshoot memory pressure issues in SQL Server or in Azure SQL. One of the things that can help in these cases is to view the contents of the buffer pool.

I wrote a simple script that displays the contents of the buffer pool in terms of tables in the current database. For each table, it presents the total table space and the space consumed by the table in the buffer pool. The script is based mainly on the sys.dm_os_buffer_descriptors dynamic management view.

Click through for the script.
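To give a flavor of what such a script looks at (this is a rough sketch of the idea, not Guy's script), each row in sys.dm_os_buffer_descriptors represents one 8 KB page currently in the buffer pool, and joining through sys.allocation_units and sys.partitions lets you group those pages by table. Here it's wrapped in pyodbc with a placeholder connection string, though the query itself is ordinary T-SQL.

    import pyodbc

    # Placeholder connection string -- point it at your own instance and database.
    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;DATABASE=MyDatabase;Trusted_Connection=yes;"
    )

    # One row per 8 KB page in the buffer pool, grouped back to the owning table.
    query = """
    SELECT OBJECT_NAME(p.object_id)  AS table_name,
           COUNT(*)                  AS buffer_pages,
           COUNT(*) * 8 / 1024.0     AS buffer_mb
    FROM sys.dm_os_buffer_descriptors AS bd
        JOIN sys.allocation_units AS au
            ON bd.allocation_unit_id = au.allocation_unit_id
        JOIN sys.partitions AS p
            ON (au.type IN (1, 3) AND au.container_id = p.hobt_id)
            OR (au.type = 2 AND au.container_id = p.partition_id)
    WHERE bd.database_id = DB_ID()
      AND OBJECTPROPERTY(p.object_id, 'IsUserTable') = 1
    GROUP BY p.object_id
    ORDER BY buffer_pages DESC;
    """

    for row in conn.execute(query):
        print(f"{row.table_name}: {row.buffer_pages} pages ({row.buffer_mb:.1f} MB)")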


Thoughts on Page Life Expectancy

Chad Callihan gives us a few high-level thoughts on page life expectancy and the buffer pool:

The buffer pool in SQL Server is an area in memory for caching data. Once data is read from disk, it can be kept in the buffer pool, and SQL Server can check there for the data the next time it's needed. If requested data is not found in the buffer pool, a hard page fault can occur, meaning the data needs to be retrieved from disk. It's also possible that the data is found somewhere else still in memory, which is called a soft page fault.

Click through for Chad’s thoughts on what a good page life expectancy looks like. My minor addition is that the number isn’t as important as the shape of the curve: if you have a fairly stable PLE above some arbitrary threshold (well above 300 seconds!), you’re probably in good shape. If your PLE sawtooths, your server’s RAM Pez dispenser needs refilled.
