Press "Enter" to skip to content

Day: May 27, 2021

Securing Databricks on AWS

Andrew Weaver, et al, take us through security practices for running Databricks on AWS:

In this article, we will share a list of cloud security features and capabilities that an enterprise data team can use to harden their Databricks environment on AWS as per their risk profile and governance policy. For more information about how Databricks runs on Amazon Web Services (AWS), view the AWS web page and Databricks security on AWS page for more specific details on security and compliance.

Click through for that list.

Comments closed

Error Handling Patterns in Kafka

Gerardo Villeda gives a few options for handling errors in an Apache Kafka topic:

Apache Kafka® applications run in a distributed manner across multiple containers or machines. And in the world of distributed systems, what can go wrong often goes wrong. This blog post covers different ways to handle errors and retries in your event streaming applications. The nature of your process determines the patterns, and more importantly, your business requirements.

This blog provides a quick guide on some of those patterns and expands on a common and specific use case where events need to be retried following their original order. This blog post illustrates a scenario of an application that consumes events from one topic, transforms those events, and produces an output to a target topic, covering different approaches as they gradually increase in complexity.

Click through for the list. Each explanation is pretty short, but opens the door for further analysis.

Comments closed

Comparing Datasets in R

The folks at finnstats take us through a package to compare datasets in R:

How to find dataset differences in R, when the pieces of information are changing between datasets it’s a difficult task to identify the same.

Here we are going to discuss the daff package in R, daff package helps us to identify the differences and visualize them in a beautiful way.

Click through for the demonstration, including a video. H/T R-Bloggers

Comments closed

Handling Disaster Recovery

Randolph West has a disaster recovery plan:

I’ve had several occasions where hard drives have failed and attempts to recover data from these wonders of mechanical engineering have been mostly fruitless. I’ve experienced profound examples of data loss, in both cases losing years of email and contact details for people I met online.

This is all to say that I care deeply about data loss, and I take it personally when I’m asked to engage with potential customers to recover data in SQL Server.

This post is a high-level overview of how I tackle data recovery, whether personally or for professional consulting reasons.

Click through for the steps.

Comments closed