Press "Enter" to skip to content

Curated SQL Posts

Multi-Parent Hierarchies in Power BI

Imke Feldmann has another way to solve the multi-parent problem in Power BI:

If you have a parent-child hierarchy with multiple parents, my function will return a table in which the children with multiple parents still reside in different rows.

Due to this, the table cannot be connected directly to the FactTable, as NodeKey is not unique. The solution is to create a DimNode table that contains only the unique values from the NodeKeys, use it as a bridge between the two tables, and implement a bidirectional filter to the Nodes table.

Read on for the complete answer and to grab a copy of the PBIX file.
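If you want to see the bridge-table idea outside of Power BI, here is a minimal pandas sketch of the same shape (the NodeKey column name comes from the post; the data and the pandas rendering are my own invention):

```python
import pandas as pd

# Toy flattened hierarchy: node C sits under two parents, so NodeKey
# is not unique here (invented data, not from the post).
nodes = pd.DataFrame({
    "NodeKey":   ["A", "B", "C", "C"],
    "ParentKey": [None, "A", "A", "B"],
})
facts = pd.DataFrame({
    "NodeKey": ["B", "C", "C"],
    "Amount":  [100, 40, 60],
})

# DimNode: only the distinct NodeKeys, so it can act as a bridge with a
# one-to-many relationship to both the Nodes table and the FactTable.
dim_node = nodes[["NodeKey"]].drop_duplicates().reset_index(drop=True)

# Filtering through the bridge: select a node in DimNode and resolve rows on
# both sides (the bidirectional filter does this step for you in Power BI).
selected = dim_node[dim_node["NodeKey"] == "C"]
print(nodes.merge(selected, on="NodeKey"))  # both parent rows for C
print(facts.merge(selected, on="NodeKey"))  # the matching fact rows
```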


A Stored Procedure to Check for Agent Job Completion

Brian Hansen has a stored procedure which can help you synchronize those asynchronous SQL Agent job calls:

This is a stored procedure that I have found useful in a number of circumstances. It came into being because there are times that I need to run a job via T-SQL and wait for it to complete before continuing the script. The problem is that sp_start_job does just what it says: it starts the job and then immediately continues. There is no option to wait for it to finish.

I needed some way to pause script execution pending the completion (hopefully successfully) of the job.

One important note: this procedure uses the undocumented xp_sqlagent_enum_jobs system procedure. While it has been around for ages, it is unsupported. I get why that may bother some, but this procedure is the only way I know of to reliably determine the current run status of a job.

Read on to learn more about the procedure and grab a copy.
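For a feel of the polling pattern, here is a minimal sketch in Python. Note the hedge: it checks the documented msdb.dbo.sysjobactivity table rather than xp_sqlagent_enum_jobs, so it approximates the idea rather than porting Brian's procedure, and the connection string and job name are placeholders.

```python
import time
import pyodbc

# Placeholder connection string and job name.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=myserver;"
    "DATABASE=msdb;Trusted_Connection=yes;", autocommit=True
)
job_name = "NightlyETL"

cur = conn.cursor()
cur.execute("EXEC msdb.dbo.sp_start_job @job_name = ?", job_name)

# Poll the most recent activity row for this job until it has a stop time.
poll_sql = """
SELECT TOP (1) ja.stop_execution_date
FROM msdb.dbo.sysjobactivity AS ja
JOIN msdb.dbo.sysjobs AS j ON j.job_id = ja.job_id
WHERE j.name = ?
ORDER BY ja.start_execution_date DESC;
"""
while True:
    row = cur.execute(poll_sql, job_name).fetchone()
    if row and row[0] is not None:  # stop_execution_date set => job finished
        break
    time.sleep(5)                   # wait before checking again
print("Job finished.")
```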


Power BI Column Concatenation

Alexander Arvidsson shows how we can concatenate columns in Power BI using DAX:

Finally we can tackle the last hurdle – the column that shows both the current number of certifications and the requested goal. Had this been Excel it would have been dead easy – we just create a new cell that concatenates two other cells.

Then we copy the formula to all the rows. Easy. But this is not Excel. The “goal” part is simple – that’s just another column. The trick is to count all the other rows in the table with the same key. Let’s add the key column to the table so we see what we’re working with. “CompKey” is the concatenated key we created in the previous blog post. “Number of certs” is a count of the rows in the table, and because of row context it gets evaluated per key.

Read on for the solution.
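The per-key counting trick translates outside DAX as well. Here is a loose pandas analogue (CompKey and the column names come from the post; the data is invented), where the count is evaluated per key and then concatenated with the goal:

```python
import pandas as pd

# Invented example: one row per earned certification, keyed by CompKey.
certs = pd.DataFrame({
    "CompKey": ["AzureDBA", "AzureDBA", "AzureDBA", "PowerBI"],
    "Goal":    [5, 5, 5, 3],
})

# Like the row-context count in DAX: for each row, count all rows sharing its key.
certs["Number of certs"] = certs.groupby("CompKey")["CompKey"].transform("count")

# The concatenated display column: current count "of" the goal.
certs["Progress"] = (certs["Number of certs"].astype(str)
                     + " of " + certs["Goal"].astype(str))
print(certs.drop_duplicates("CompKey"))
```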


Azure Data Studio October 2019 Release

Alan Yu announces the October 2019 release of Azure Data Studio:

While fixing a bug involving copying rows and columns from the results grid, we ended up creating an innovative copy/paste experience with the results grid.

Typically, when you select arbitrary cells individually, a grid application like Excel will prevent you from pasting that selection. However, we’ve implemented a logical pattern that supports this type of pasting. It works by ignoring columns and rows that are not selected, and nulling columns and rows that were not included in the copy parameters.

I really like that copy-paste functionality. Time to go update…
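As I read the description, the pasting logic works something like this sketch (my own reconstruction in Python, not Azure Data Studio's actual implementation):

```python
# Cells the user copied, as (row, col) -> value (invented sample selection).
copied = {(0, 0): "a", (0, 2): "c", (2, 0): "g"}

# Rows and columns with no selected cell at all are dropped entirely
# (row 1 and column 1 here).
rows = sorted({r for r, _ in copied})
cols = sorted({c for _, c in copied})

# Cells inside the surviving grid that were not copied become None (nulled).
grid = [[copied.get((r, c)) for c in cols] for r in rows]
print(grid)  # [['a', 'c'], ['g', None]]
```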


Multiple Hypothesis Testing with R

Roland Stevenson shows how we can perform multiple hypothesis tests on data, as well as potential issues:

Both results show that evaluating two tests on the same family of data will lead to a ~10% chance that a researcher will claim a “significant” result if they look for either test to reject the null. Any claim that there is a maximum 5% false positive rate would be mistaken. As an exercise, verify that doing the same on \(m=4\) tests will lead to a ~18% chance!

A bad testing platform would be one that claims a maximum 5% false positive rate when any one of multiple tests on the same family of data show significance at the 5% level. Clearly, if a researcher is going to claim that the FWER is no more than \(\alpha\), then they must control for the FWER and carefully consider how individual tests reject the null.

This is worth taking some time to read carefully. H/T R-Bloggers
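The arithmetic behind those percentages is the family-wise error rate for \(m\) independent tests at level \(\alpha\): \(\text{FWER} = 1 - (1 - \alpha)^m\). A quick check in Python:

```python
def fwer(m, alpha=0.05):
    """Chance of at least one false positive across m independent tests."""
    return 1 - (1 - alpha) ** m

print(fwer(2))  # ~0.0975 -> the ~10% figure for two tests
print(fwer(4))  # ~0.1855 -> the ~18% figure for four tests
```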


Migrating Databricks Workspaces

Gerhard Brueckl has made DatabricksPS better:

I do not know what the problem was, and I did not have time to investigate; instead, I needed to come up with a proper solution in time. So I had a look at what needs to be done for a manual export. Basically, there are five types of content within a Databricks workspace:

– Workspace items (notebooks and folders)
– Clusters
– Jobs
– Secrets
– Security (users and groups)

For all of these, Databricks provides an appropriate REST API to manage them, including exports and imports. This was fantastic news for me, as I knew I could use my existing PowerShell module DatabricksPS to do all of this without having to reinvent the wheel.

I’ve used DatabricksPS and really like it for cases where I’d have to loop with the Databricks REST API—for example, when uploading files.
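If you're curious what those REST calls look like without the PowerShell wrapper, here is a minimal Python sketch against the documented Workspace API (the workspace URL, token, and notebook path are placeholders):

```python
import base64
import requests

HOST = "https://adb-1234567890123456.7.azuredatabricks.net"   # placeholder URL
HEADERS = {"Authorization": "Bearer <personal-access-token>"}  # placeholder token

# List the items directly under /Shared in the workspace.
resp = requests.get(f"{HOST}/api/2.0/workspace/list",
                    headers=HEADERS, params={"path": "/Shared"})
resp.raise_for_status()
for obj in resp.json().get("objects", []):
    print(obj["object_type"], obj["path"])

# Export a single notebook in SOURCE format; content comes back base64-encoded.
resp = requests.get(f"{HOST}/api/2.0/workspace/export",
                    headers=HEADERS,
                    params={"path": "/Shared/my-notebook", "format": "SOURCE"})
resp.raise_for_status()
print(base64.b64decode(resp.json()["content"]).decode("utf-8"))
```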


Backing Up Cosmos DB

Josh Smith takes us through backing up Cosmos DB yourself:

Unfortunately, if you are restricting access to your Cosmos DB service based on IP address (a reasonable security measure), then Data Factory won’t work as of this writing: Azure Data Factory doesn’t operate like a trusted Azure service, and it presents an IP address from somewhere in the data center where it is spun up. Thankfully, they are working on this. In the meantime, the next best thing is to use the Cosmos DB migration tool (scripts below) to dump the contents to a location where they can be retained as long as needed. Be aware that, in addition to the RU cost of returning the data, if you bring these backups out of the data center where the Cosmos DB lives, you’ll also incur egress charges on the data.

Having a plan for this kind of thing is important, even if you normally rely on service-provided automated backups.
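As one way to script such a dump yourself, here is a hedged sketch using the azure-cosmos Python SDK rather than the migration tool (the endpoint, key, and database/container names are placeholders); as Josh notes, reading everything out costs RUs, and moving the file out of the region costs egress:

```python
import json
from azure.cosmos import CosmosClient  # pip install azure-cosmos

# Placeholder connection details.
client = CosmosClient("https://myaccount.documents.azure.com:443/",
                      credential="<account-key>")
container = client.get_database_client("mydb").get_container_client("mycontainer")

# Stream every document out of the container and write it to a local JSON file.
items = list(container.read_all_items())
with open("cosmos-backup.json", "w") as f:
    json.dump(items, f, default=str)  # default=str guards against odd field types

print(f"Backed up {len(items)} documents.")
```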


Building an Azure Usage Report with Powershell

June Castillote shows us how we can use Powershell to get usage data from Azure for our subscriptions:

In the section above, it would be common for the command to return many thousands of objects, especially for long date ranges. To prevent overwhelming the API, the Get-UsageAggregates command only returns a maximum of 1,000 results. If you’ve saved the $usageData variable as covered in the previous section, you can confirm this by running $usageData.UsageAggregations.count.

What if there are more than 1000 results? You’re going to have to do a little more work.

Knowing how much you’re spending is critical in an OpEx world like Azure or AWS.
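The extra work June alludes to is continuation paging: ask for the next batch until the service stops handing back a token. Sketched generically in Python against a hypothetical paged endpoint (Get-UsageAggregates does the equivalent on the PowerShell side):

```python
import requests

def get_all_pages(url, headers):
    """Follow a continuation token until the service stops returning one
    (hypothetical endpoint shape; batch size capped server-side at 1,000)."""
    results, token = [], None
    while True:
        params = {"continuationToken": token} if token else {}
        resp = requests.get(url, headers=headers, params=params)
        resp.raise_for_status()
        page = resp.json()
        results.extend(page.get("value", []))
        token = page.get("continuationToken")  # absent/None on the last page
        if not token:
            return results
```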


Generating Anonymous Data

Daniel Hutmacher has a nice web API to generate fake customer data:

I’ve been working on a little gadget for a while now, and today I finally got around to completing it and so now I’ve published it for everyone to try out. It’s a web API (wait, wait, don’t go away – it’s for database people!) that creates a randomized list of names, addresses, etc.

In this post, I’ll show you how easy it is to use this service to anonymize a development or test database so you don’t have all that personally identifiable information floating around.

Read the whole thing and check out his service. Also, Daniel was the one who spurred me on to update the theme here to get rid of some problems, so you can thank him for that too.
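The consumption pattern for a service like this is straightforward; here is a hedged Python sketch in which the endpoint URL and response shape are hypothetical placeholders, not Daniel's actual API:

```python
import requests

# Hypothetical endpoint and response shape; see Daniel's post for the real API.
FAKE_PEOPLE_URL = "https://example.com/api/people"

resp = requests.get(FAKE_PEOPLE_URL, params={"count": 3})
resp.raise_for_status()

# Assumed shape: a JSON array of objects with "name" and "address" fields.
for person in resp.json():
    print(person["name"], "/", person["address"])
    # In practice you would pair each fake row with a real customer key and
    # UPDATE the dev/test database to overwrite the personally identifiable columns.
```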
