Day: April 10, 2020

Is Kafka a Database?

Kai Wähner asks a question I hadn’t thought about:

Can and should Apache Kafka replace a database? How long can and should I store data in Kafka? How can I query and process data in Kafka? These are common questions that come up more and more. Short answers like “Yes” or “It depends” are not good enough for you? Then this read is for you! This blog post explains the idea behind databases and different features like storage, queries, and transactions to evaluate when Kafka is a good fit and when it is not.

This is an interesting review of the Kafka ecosystem, and it shows that Apache Kafka really does blur the line around what a database is.
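For a concrete sense of what "querying data in Kafka" can look like, here is a minimal ksqlDB sketch; the topic, stream, and column names are hypothetical, and it assumes a ksqlDB server sitting in front of the cluster:

```sql
-- Expose a JSON-encoded Kafka topic as a queryable stream
CREATE STREAM pageviews (user_id VARCHAR, url VARCHAR)
  WITH (KAFKA_TOPIC = 'pageviews', VALUE_FORMAT = 'JSON');

-- Continuously materialize an aggregate table from the stream
CREATE TABLE pageview_counts AS
  SELECT user_id, COUNT(*) AS views
  FROM pageviews
  GROUP BY user_id
  EMIT CHANGES;

-- Pull query: read the current state for a single key, database-style
SELECT user_id, views FROM pageview_counts WHERE user_id = 'alice';
```

Long (or infinite) topic retention plus this kind of SQL over topics is exactly where the line starts to blur, even if it does not make Kafka a general-purpose database.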

Database Administration in Cloudera Data Platform

Gokul Kamaraj and Liliana Kadar walk through tools for the DBA in Cloudera Data Platform:

You can use Cloudera Manager to automate the process of upgrading the operational database in your Cloudera Data Platform-Data Center (CDP-DC). Upgrades are provided through releases or maintenance patches. Cloudera Manager installs the releases and/or patches and manages the configuration as well as the restart process.

If you are using CDP on a public cloud such as Amazon AWS, you have to create a new Data hub cluster to upgrade to the new versions of various components. For more information about creating a new operational database Data hub cluster, see Getting Started with Operational Database on CDP.

Cloudera’s offering is a cluster-based offering; upgrades and patches all span multiple nodes (servers), and installation, configuration, and reboot are all automated, including rolling reboots where applicable.

Click through for a walkthrough of other tools for Hadoop DBAs.

Understanding Area Graphs

Mike Cisneros takes us through the proper usage of area graphs:

Area graphs can be effective for:

– Showing the rise and fall of various data series over time
– Conveying total amounts over time as well as some sub-categorical breakdowns (but only to a point)
– Emphasizing a part-to-whole relationship over time when one part is very large, or changes from being very large to very small
– Showing change over time in individual panels of a small multiple chart

Area graphs are not the ideal choice for:

– Data sets on scales that do not have a meaningful relationship to zero
– Showing several volatile data sets over time
– Showing fine differences in values

In this post, we’ll talk about how an area graph works, and some of the challenges to keep in mind when you are considering creating one.

Click through for a detailed analysis. I will rarely use area graphs, but in the right use case, they can add a strong visual dynamic to a report.

Metadata Integrity Checks in ADF.ProcFwk

Paul Andrew has another update to the ADF metadata-driven processing framework:

With this release of the framework I wanted to take the opportunity to harden the database and add some more integrity (intelligence) to the metadata, things that go beyond the existing database PK/FK constraints. After all, this metadata drives everything that Azure Data Factory does/is about to do – so it needs to be correct. These new integrity checks take two main forms:

1. Establishing a minimum set of criteria within the metadata before the core Data Factory processing starts and creates an execution run.
2. Establishing a logical chain of pipeline dependencies across processing stages, then providing a set of advisory checks for areas of conflict and/or improvement.

More details on both are included against the actual stored procedure in the database changes section below.

In addition to database hardening, I’ve added a few other bits to the solution, including a PowerShell script for ADF deployments and a Data Studio Notebook to make the developer experience of implementing this code project a little nicer.

Read on to see what’s in version 1.3. Check it out on GitHub as well.
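For a flavour of what a pre-execution integrity check can look like, here is an illustrative T-SQL sketch; the table and column names are made up for the example and are not the framework's actual schema:

```sql
-- Illustrative only: fail fast if any enabled processing stage has no
-- enabled pipelines attached to it before an execution run is created.
IF EXISTS
(
    SELECT 1
    FROM dbo.ProcessingStages AS s
    WHERE s.Enabled = 1
      AND NOT EXISTS
      (
          SELECT 1
          FROM dbo.Pipelines AS p
          WHERE p.StageId = s.StageId
            AND p.Enabled = 1
      )
)
BEGIN
    RAISERROR('Metadata integrity check failed: an enabled stage has no enabled pipelines.', 16, 1);
    RETURN;
END;
```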

Power BI: The Key Didn’t Match Any Rows in the Table

Chris Webb troubleshoots an issue:

One of the most common errors you’ll see when working with Power Query in Power BI or Excel is this:

Expression.Error: The key didn’t match any rows in the table

It can occur with almost any data source and in a wide variety of different circumstances, and for new users of Power Query it can be very confusing. In this post I’ll explain what the error message means and when you’re likely to encounter it using a simple example.

TL;DR You’re probably getting this error because your Power Query query is trying to connect to a table or worksheet or something in your data source that has been deleted or renamed.

Read on to understand exactly what it means and how you can fix your code if you get this error.

MERGE in Many Languages

Lukas Eder takes a look at the MERGE statement in SQL:

A few dialects support MERGE. Among the ones that jOOQ 3.13 supports, there are at least:

– Db2
– Derby
– Firebird
– H2
– HSQLDB
– Oracle
– SQL Server
– Sybase SQL Anywhere
– Teradata
– Vertica

For once, regrettably, this list does not include PostgreSQL. But even the dialects in this list do not all agree on what MERGE really is. The SQL standard specifies 3 features, each one optional:

– F312 MERGE statement
– F313 Enhanced MERGE statement
– F314 MERGE statement with DELETE branch

But instead of looking at the standards and what they require, let’s look at what the dialects offer, and how it can be emulated if something is not available.
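As a reminder of what the full statement looks like, here is a generic MERGE sketch (the tables and columns are invented for illustration) that exercises the optional DELETE branch from feature F314:

```sql
MERGE INTO customers AS t
USING staging_customers AS s
    ON t.customer_id = s.customer_id
-- F314: the optional DELETE branch
WHEN MATCHED AND s.is_deleted = 1 THEN
    DELETE
WHEN MATCHED THEN
    UPDATE SET full_name = s.full_name,
               email     = s.email
WHEN NOT MATCHED THEN
    INSERT (customer_id, full_name, email)
    VALUES (s.customer_id, s.full_name, s.email);
```

Even this simple form touches the optional pieces: the DELETE branch, multiple WHEN MATCHED clauses, and the extra AND predicates are exactly what varies (or needs emulating) from dialect to dialect.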

This is a really cool overview of an area where several vendors can claim support, but that support can mean quite different things. The one caveat is that I don’t know whether any of the other platforms’ MERGE implementations are as buggy as SQL Server’s.

Checking Login Usage

Kenneth Fisher checks a box I really like checking:

I get asked this every now and again, along with the companion When was the last time this login was used? It’s a pretty easy question to answer, but there are some caveats. First of all, you need to have your system set to log both successful and failed logins. You can probably get away with successful only, but personally I want to know a failed attempt just like I’d want to know a successful one.

This is a setting we tend to avoid because of how many events it adds to the Security event log, but it is critical for understanding whether that person trying to log in as sa gave up or stopped because a login finally succeeded.
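If login auditing is turned on, one quick way to answer the question is to search the SQL Server error log directly. A rough sketch; xp_readerrorlog is undocumented, and 'sa' is just a placeholder login name:

```sql
-- Arguments: log number (0 = current), log type (1 = SQL Server error log),
-- then two search strings that are ANDed together.
EXEC master.dbo.xp_readerrorlog 0, 1, N'Login succeeded', N'sa';  -- successful logins
EXEC master.dbo.xp_readerrorlog 0, 1, N'Login failed',    N'sa';  -- failed attempts
```

The same search strings work against older archived logs by changing the first argument, which matters if the log has cycled since the attempt you care about.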
