
September 17, 2021

Where Kafka Connect Fits

Shivani Sarthi explains the value of Kafka Connect:

Kafka Connect is not just a free, open-source component of Apache Kafka; it also works as a centralised data hub for simple data integration between databases, key-value stores, etc. The fundamental components include:

– Connectors

– Tasks

– Workers

– Converters

– Transforms

– Dead Letter Queue

Moreover, it is a framework to stream data into and out of Apache Kafka. In addition, the Confluent Platform comes with many built-in connectors, used for streaming data to and from different data sources.

Click through for information on each component.


Analog and Digital Clocks in R

Tomaz Kastrun reminds me of xclock:

It is all about measuring time using useless clocks. The script takes the system time and displays a given clock in a rather “static” way. You can choose between an analog, a small digital, or a big digital clock. And when playing with the time, you can also learn something new.

Click through to see how to make an analog clock plot in R, and then try it again with a digital clock.


Creating a Distributed Availability Group in Azure via Terraform

Sandeep Arora has some scripts for us:

To create a distributed availability group, you need two availability groups (AG), each with its own listener, which you then combine. In this case, one availability group is on-premises and the other needs to be created in Microsoft Azure. This example doesn’t cover all of the details, like creating an extended network setup between the on-premises network and Azure or joining Azure Active Directory Domain Services to an on-premises forest; instead, it highlights the key requirements for setting up the availability group in Azure and then configuring the distributed AG between the on-premises availability group (represented as AOAG-1) and the Azure availability group (represented as AOAG-2).

Click through for the preparations you need in place and a set of scripts to do the work.
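
Whatever tooling drives the deployment, the distributed AG itself ultimately comes down to a pair of T-SQL statements. Here is a minimal sketch using the AG names from the excerpt; the distributed AG name and listener URLs are placeholders, and this is illustrative rather than Sandeep's exact script:

-- Run on the primary replica of the on-premises AG (AOAG-1), the global primary.
-- [DAG-1] and the listener URLs below are placeholders.
CREATE AVAILABILITY GROUP [DAG-1]
   WITH (DISTRIBUTED)
   AVAILABILITY GROUP ON
      'AOAG-1' WITH
      (
         LISTENER_URL = 'tcp://aoag1-listener.contoso.local:5022',
         AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
         FAILOVER_MODE = MANUAL,
         SEEDING_MODE = AUTOMATIC
      ),
      'AOAG-2' WITH
      (
         LISTENER_URL = 'tcp://aoag2-listener.azure.contoso.local:5022',
         AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
         FAILOVER_MODE = MANUAL,
         SEEDING_MODE = AUTOMATIC
      );
GO

-- Then run on the primary replica of the Azure AG (AOAG-2), the forwarder,
-- listing the same two AGs with the same options.
ALTER AVAILABILITY GROUP [DAG-1]
   JOIN
   AVAILABILITY GROUP ON
      'AOAG-1' WITH
      (
         LISTENER_URL = 'tcp://aoag1-listener.contoso.local:5022',
         AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
         FAILOVER_MODE = MANUAL,
         SEEDING_MODE = AUTOMATIC
      ),
      'AOAG-2' WITH
      (
         LISTENER_URL = 'tcp://aoag2-listener.azure.contoso.local:5022',
         AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
         FAILOVER_MODE = MANUAL,
         SEEDING_MODE = AUTOMATIC
      );
GO

Asynchronous commit with manual failover is the standard choice here, since distributed availability groups only support manual failover between the two AGs.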


Slot Machine Company Data Breach

Jonathan Greig reports on a data breach:

Nevada Restaurant Services (NRS), the owner of popular slot machine parlor chain Dotty’s, has disclosed a data breach that exposed a significant amount of personal and financial information. 

In a statement, the company confirmed that “certain customers” were affected by the breach and explained that the information includes Social Security numbers, driver’s license numbers or state ID numbers, passport numbers, financial account and routing numbers, health insurance information, treatment information, biometric data, medical records, taxpayer identification numbers and credit card numbers and expiration dates.

I don’t normally link to data breaches too often because if I did, this site would be renamed to Curated Data Breaches given how frequently they occur. But what I want to know is, why in the world does a slot machine company have passport numbers, health insurance information, and medical records? What are they doing with that information? Slot machines are pretty simple: put a quarter in, watch the screen light up and the speakers make a bunch of happy noises, repeat until you run out of quarters. Unless there’s some sort of business arrangement where they put slot machines in Nevada hospitals…

Also, the fact that credit card numbers and expiration dates were lost makes me wonder if they were actually PCI compliant.


Azure Monitor Logs in Azure Data Studio

Julie Koesmarno has a new extension for us:

The Azure Monitor Logs extension in Azure Data Studio is now available in preview. The extension is supported in the Azure Data Studio August 2021 release, v1.32.0.

Administrators can enable platform logging and metrics for their Azure services, such as Azure SQL, and set the destination to a Log Analytics workspace. By installing the native Azure Monitor Logs extension in Azure Data Studio, users can connect to, browse, and query against a Log Analytics workspace. Data professionals who are using Azure SQL, Azure PostgreSQL, or Azure Data Explorer in Azure Data Studio can access the data in the Log Analytics workspace for diagnosis or auditing in that same development environment. This native Azure Monitor Logs extension also allows Azure service customers to author notebooks with the Log Analytics kernel, all equipped with IntelliSense.

Click through for examples of how to use this.


Data Personas and Data Platform Rights

Craig Porteous wants us thinking about roles and permissions:

There are a great number of factors that contribute to an organisation’s data maturity, both technical and non-technical. The non-technical factors often have the biggest impact, however: how open to change the business’s upper management is, how much data is embraced by department and team leaders, and the training and support provided to utilise new technologies. All of these factors set the expectation and appetite for change within the business much more than the rollout of a new product or technology.

Data Personas are one such area that contributes greatly towards data maturity, as they define responsibility and access beyond the roles and job titles of team members. Individual team members may fit multiple personas or none at all. There are five core Data Personas that need to be established within an organisation for effective data governance and management, with some additional personas on the periphery that map more closely to specific technical roles. The number of personas will vary depending on the maturity of the organisation’s data platform and its use of data, but the core personas are relevant to all organisations.

Click through for an example set of personas and what kinds of rights they would need, broken down in a matrix of environment and data layer.


Deleting Duplicate Records

Chad Callihan shows one of the best ways to remove duplicate records from a table:

Have you ever needed to delete duplicate records from a table? If there is only one duplicate in a table, then simply running something like DELETE FROM Table WHERE ID = @DupRecord will do the trick. What happens when the problem is found after multiple records are duplicated? Will tracking them all down be more time-consuming? Here are a few different options for quickly clearing out duplicate records.

There’s the best way, and then there are the other ways.
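
For reference, the pattern I would reach for first is ROW_NUMBER() in a common table expression. This is a sketch against a hypothetical dbo.Customers table, where the PARTITION BY list is whatever set of columns defines a duplicate in your schema:

-- Number each row within its group of duplicates, lowest ID first.
WITH Dupes AS
(
    SELECT
        rn = ROW_NUMBER() OVER (
            PARTITION BY FirstName, LastName, Email  -- columns that define a duplicate
            ORDER BY ID                              -- the row to keep sorts first
        )
    FROM dbo.Customers
)
-- Deleting from the CTE deletes the underlying rows in dbo.Customers.
DELETE FROM Dupes
WHERE rn > 1;

One statement handles any number of duplicates per group, with no need to track down individual IDs.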
