2021-05-27 – Curated SQL

Securing Databricks on AWS

Published 2021-05-27 by Kevin Feasel

Andrew Weaver, et al, take us through security practices for running Databricks on AWS:

In this article, we will share a list of cloud security features and capabilities that an enterprise data team can use to harden their Databricks environment on AWS as per their risk profile and governance policy. For more information about how Databricks runs on Amazon Web Services (AWS), view the AWS web page and Databricks security on AWS page for more specific details on security and compliance.

Click through for that list.


Error Handling Patterns in Kafka

Published 2021-05-27 by Kevin Feasel

Gerardo Villeda gives a few options for handling errors in an Apache Kafka topic:

Apache Kafka® applications run in a distributed manner across multiple containers or machines. And in the world of distributed systems, what can go wrong often goes wrong. This blog post covers different ways to handle errors and retries in your event streaming applications. The patterns you choose depend on the nature of your process and, more importantly, on your business requirements.
This blog provides a quick guide to some of those patterns and expands on a common and specific use case in which events need to be retried in their original order. This blog post illustrates a scenario of an application that consumes events from one topic, transforms those events, and produces an output to a target topic, covering different approaches as they gradually increase in complexity.

Click through for the list. Each explanation is pretty short, but opens the door for further analysis.
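To make the "retry in original order" idea concrete, here is a minimal in-memory sketch. It is not code from Villeda's post and does not use a Kafka client: a Python list stands in for the target topic, and a per-key queue stands in for a retry topic. The names `OrderedRetryRouter` and `enrich` are hypothetical. The idea it illustrates is that once one event for a key fails, every later event with that key is parked behind it, so events for a given key are never emitted out of order, while events for other keys keep flowing.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field
from typing import Callable, DefaultDict, List, Tuple

Event = Tuple[str, str]  # (key, value)


@dataclass
class OrderedRetryRouter:
    """Hypothetical stand-in for a consume-transform-produce loop.

    `output` plays the role of the target topic; `retry` plays the role of a
    retry topic, kept here as one in-memory FIFO queue per key.
    """
    transform: Callable[[str], str]
    output: List[Event] = field(default_factory=list)
    retry: DefaultDict[str, deque] = field(
        default_factory=lambda: defaultdict(deque))

    def process(self, key: str, value: str) -> None:
        # If earlier events for this key are still waiting, queue behind them
        # instead of processing now; this preserves per-key order.
        if self.retry[key]:
            self.retry[key].append(value)
            return
        try:
            self.output.append((key, self.transform(value)))
        except Exception:
            # First failure for this key: park the event for a later retry.
            self.retry[key].append(value)

    def retry_pending(self) -> None:
        # Drain each key's queue front to back, stopping at the first event
        # that still fails so later events for that key stay behind it.
        for key, queue in self.retry.items():
            while queue:
                try:
                    result = self.transform(queue[0])
                except Exception:
                    break
                self.output.append((key, result))
                queue.popleft()


# Simulated transient failure: values starting with "b" depend on a
# downstream service that is "down" until available["ok"] becomes True.
available = {"ok": False}


def enrich(value: str) -> str:
    if value.startswith("b") and not available["ok"]:
        raise RuntimeError("downstream unavailable")
    return value.upper()


router = OrderedRetryRouter(transform=enrich)
for key, value in [("k1", "a1"), ("k2", "b1"), ("k2", "a2"), ("k1", "a3")]:
    router.process(key, value)
# k1 events flow through; "b1" fails, and "a2" for k2 is parked behind it.

available["ok"] = True
router.retry_pending()
```

After the retry pass, `router.output` is `[("k1", "A1"), ("k1", "A3"), ("k2", "B1"), ("k2", "A2")]`: k2's events come out in their original order even though `"a2"` on its own would have succeeded earlier. In a real deployment the set of keys with pending retries would have to survive restarts and the parked events would live in an actual retry topic; this sketch deliberately ignores both concerns.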


Comparing Datasets in R

Published 2021-05-27 by Kevin Feasel

The folks at finnstats take us through a package to compare datasets in R:

How do you find differences between datasets in R? When information changes from one dataset to another, identifying exactly what changed is a difficult task.
Here we are going to discuss the daff package in R, which helps us identify the differences and visualize them in a beautiful way.

Click through for the demonstration, including a video. H/T R-Bloggers
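daff itself is an R package, and its API is not reproduced here. As a language-neutral illustration of what "finding dataset differences" means, here is a small Python sketch of a keyed row diff. It classifies rows as added, removed, or changed, which is the kind of summary a diff tool like daff visualizes. The function `diff_rows` and the sample data are hypothetical, and the sketch assumes each row has a unique key column.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def diff_rows(old: List[Row], new: List[Row], key: str) -> Dict[str, list]:
    """Compare two datasets row by row, matching rows on a unique key column."""
    old_by_key = {row[key]: row for row in old}
    new_by_key = {row[key]: row for row in new}

    added = [row for k, row in new_by_key.items() if k not in old_by_key]
    removed = [row for k, row in old_by_key.items() if k not in new_by_key]

    changed = []
    for k, old_row in old_by_key.items():
        if k not in new_by_key:
            continue
        new_row = new_by_key[k]
        # Union of column names, keeping first-seen order for stable output.
        columns = dict.fromkeys([*old_row, *new_row])
        deltas = {
            col: (old_row.get(col), new_row.get(col))
            for col in columns
            if old_row.get(col) != new_row.get(col)
        }
        if deltas:
            changed.append({key: k, "changes": deltas})

    return {"added": added, "removed": removed, "changed": changed}


old = [
    {"id": 1, "name": "Ann", "score": 90},
    {"id": 2, "name": "Bob", "score": 85},
    {"id": 3, "name": "Cy", "score": 70},
]
new = [
    {"id": 1, "name": "Ann", "score": 92},
    {"id": 2, "name": "Bob", "score": 85},
    {"id": 4, "name": "Di", "score": 88},
]
result = diff_rows(old, new, key="id")
```

Here `result` reports row 4 as added, row 3 as removed, and row 1 as changed with `score` going from 90 to 92. Matching on a key rather than on row position means reordered rows are not reported as changes; a tool like daff also handles cases this sketch ignores, such as added or removed columns rendered side by side.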


Anomaly Detection in Power BI

Published 2021-05-27 by Kevin Feasel

Patrick LeBlanc is looking for oddities:

Looking for ANOMALIES in Power BI? Justyna Lucznik walks Patrick through the new anomaly detection in Power BI and gives a SNEAK PEEK at what’s coming!

Click through for the video and demonstration.


Handling Disaster Recovery

Published 2021-05-27 by Kevin Feasel

Randolph West has a disaster recovery plan:

I’ve had several occasions where hard drives have failed and attempts to recover data from these wonders of mechanical engineering have been mostly fruitless. I’ve experienced two profound examples of data loss, in both cases losing years of email and contact details for people I met online.
This is all to say that I care deeply about data loss, and I take it personally when I’m asked to engage with potential customers to recover data in SQL Server.
This post is a high-level overview of how I tackle data recovery, whether personally or for professional consulting reasons.

Click through for the steps.

