Month: August 2020

Domain Models: Purity vs Completeness

Vladimir Khorikov reflects on domain modeling:

This is an example of a rich domain model: all business rules (also known as domain logic) are located in the domain classes. There’s one such rule currently — that we can only assign to the user an email that belongs to the corporate domain of that user’s company. There’s no way for the client code to bypass this invariant — a hallmark of a highly encapsulated domain model.

We can also say that our domain model is complete. A complete domain model is a model that contains all the application’s domain logic. In other words, there’s no domain logic fragmentation.

Domain logic fragmentation is when the domain logic resides in layers other than the domain layer. In our example, the UserController (which belongs to the application services layer) doesn’t contain any such logic; it serves solely as a coordinator between the domain layer and the database.

Domain modeling doesn’t land on too many database administrators’ doorsteps, but I enjoyed the article.
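To make the invariant concrete, here is a minimal sketch in Python (the original article uses its own code, which is not reproduced here). The class and method names (Company, User, owns_email, change_email) are assumptions for illustration. Python only hides attributes by convention, so the encapsulation is weaker than the excerpt describes, but the idea is the same: the email rule lives in the domain classes and the only public way to change an email runs through it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Company:
    """A company and the corporate email domain its users must use (hypothetical shape)."""
    name: str
    email_domain: str

    def owns_email(self, email: str) -> bool:
        # Require a non-empty local part and an "@", then compare the
        # domain part case-insensitively.
        local, sep, domain = email.rpartition("@")
        return bool(local) and sep == "@" and domain.lower() == self.email_domain.lower()


class User:
    """A user whose email can only be changed through change_email."""

    def __init__(self, company: Company, email: str) -> None:
        self._company = company
        self._email = ""
        # The constructor goes through the same rule as later changes.
        self.change_email(email)

    @property
    def email(self) -> str:
        # Read-only property: there is no setter, so assignment raises AttributeError.
        return self._email

    def change_email(self, new_email: str) -> None:
        # The business rule lives here, in the domain class, not in a controller.
        if not self._company.owns_email(new_email):
            raise ValueError(
                f"{new_email!r} is not in the {self._company.email_domain} domain"
            )
        self._email = new_email
```

A controller written against this sketch would load the user, call change_email, and save the result, without checking the domain itself.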

Multi-Armed Bandit Problems

Brian Amadio takes us through one of my favorite classes of problem:

Multi-armed bandits have become a popular alternative to traditional A/B testing for online experimentation at Stitch Fix. We’ve recently decided to extend our experimentation platform to include multi-armed bandits as a first-class feature. This post gives an overview of our experimentation platform architecture, explains some of the theory behind multi-armed bandits, and finally shows how we incorporate them into our platform.

The post gives a good explanation of the concept, as well as the implementation strategy.
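For a feel of how a bandit shifts traffic toward the better option, here is a minimal sketch of one common strategy, Thompson sampling with Bernoulli rewards. This is a generic textbook version and not a claim about what Stitch Fix's platform runs; the class name, the simulate helper, and the conversion rates are all made up for illustration.

```python
import random


class BernoulliThompsonSampler:
    """Thompson sampling for arms that pay out 0 or 1 (illustrative sketch)."""

    def __init__(self, n_arms, seed=None):
        self._rng = random.Random(seed)
        self.successes = [0] * n_arms
        self.failures = [0] * n_arms

    def choose_arm(self):
        # Draw one sample from each arm's Beta(successes + 1, failures + 1)
        # posterior and play the arm with the largest draw.
        draws = [
            self._rng.betavariate(s + 1, f + 1)
            for s, f in zip(self.successes, self.failures)
        ]
        return max(range(len(draws)), key=draws.__getitem__)

    def record(self, arm, reward):
        # Update the posterior counts for the arm that was played.
        if reward:
            self.successes[arm] += 1
        else:
            self.failures[arm] += 1


def simulate(true_rates, rounds, seed=0):
    """Run the sampler against made-up conversion rates; return pulls per arm."""
    rng = random.Random(seed)
    sampler = BernoulliThompsonSampler(len(true_rates), seed=seed + 1)
    pulls = [0] * len(true_rates)
    for _ in range(rounds):
        arm = sampler.choose_arm()
        reward = rng.random() < true_rates[arm]
        sampler.record(arm, reward)
        pulls[arm] += 1
    return pulls
```

Calling simulate([0.05, 0.10, 0.50], 2000) returns a pull count per arm. Unlike a fixed-split A/B test, the allocation is not set up front: arms that look worse get sampled less often as evidence accumulates.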

Historical Dimensions in a Kimball-Style Model

Vince Iacoboni takes a stab at improving the Kimball model:

We owe a lot to Ralph Kimball and friends. His practical warehouse design and conformed-dimension bus architecture are the industry standard. Business users can understand and query these warehouses directly and gain valuable insights into the business. Kimball’s practical approach focuses squarely on clarity and ease of use for the business users of the warehouse. Kudos to you and yours, Mr. Kimball.

That said, can the mainstay Type 2 slowly changing dimension be improved? I here present the concept of historical dimensions as a way to solve some issues with the basic Type 2 slowly changing dimension promoted by Kimball. As we will see, clearly distinguishing between current and past dimension values pays off in clarity of design, flexibility of presentation, and ease of ETL maintenance.

As I was reading this, I was thinking “This sounds like a type 4 SCD” and Vince walks us through the differences between the two ideas. I’m not absolutely sold on the idea, but it is certainly interesting.
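For reference, the baseline being improved upon looks roughly like this. The sketch below shows only the basic Type 2 pattern (expire the current row, add a new current row) and not Vince's historical dimension design; the column names and the apply_type2_change helper are hypothetical.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class CustomerRow:
    surrogate_key: int         # warehouse-generated key
    customer_id: str           # natural (business) key
    city: str                  # the tracked attribute
    valid_from: date
    valid_to: Optional[date]   # None means the row is still open
    is_current: bool


def apply_type2_change(
    rows: List[CustomerRow], customer_id: str, new_city: str, change_date: date
) -> List[CustomerRow]:
    """Return a new row list with a basic Type 2 change applied."""
    current = next(
        (r for r in rows if r.customer_id == customer_id and r.is_current), None
    )
    if current is not None and current.city == new_city:
        return list(rows)  # attribute unchanged: no new version needed

    result = []
    for r in rows:
        if r is current:
            # Close out the old version instead of overwriting it.
            result.append(replace(r, valid_to=change_date, is_current=False))
        else:
            result.append(r)

    # Add the new current version under a fresh surrogate key.
    next_key = max((r.surrogate_key for r in rows), default=0) + 1
    result.append(
        CustomerRow(next_key, customer_id, new_city, change_date, None, True)
    )
    return result
```

Every change adds a row, so current and past values end up in the same table, told apart only by the date range and the current flag. That mix is what the article proposes to separate more clearly.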

Fixing Availability Group Issues with Alerts

Wayne Sheffield automates a few problems away:

The first of the Availability Group issues to discuss is that, for whatever reason, data is no longer moving between the primary replica and a secondary replica. This puts the Data Movement in a Suspended state.

If the data movement remains suspended for too long, you might have to take some undesired actions to get things back in sync. Things like removing the database from the AG, restoring log files, then reattaching it to the AG. When the data movement becomes suspended, we want to get it flowing again as soon as possible. Let’s have SQL Server try to get the data flowing again.

Read on for more, including a second issue that Wayne helps solve.
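As a rough idea of what "try to get the data flowing again" boils down to, the sketch below builds the T-SQL that resumes data movement for suspended databases. It is not Wayne's script and it leaves out the alert and job wiring entirely. The helper names are hypothetical, and it only generates statement text: running the query and the statements against a replica is left to whatever client you use.

```python
from typing import Iterable, List

# Lists databases on the local replica whose data movement is suspended.
FIND_SUSPENDED_SQL = """
SELECT DB_NAME(database_id) AS database_name
FROM sys.dm_hadr_database_replica_states
WHERE is_local = 1
  AND is_suspended = 1;
"""


def quote_name(identifier: str) -> str:
    """Bracket-quote an identifier, doubling any closing brackets inside it."""
    return "[" + identifier.replace("]", "]]") + "]"


def build_resume_statements(database_names: Iterable[str]) -> List[str]:
    """Build one ALTER DATABASE ... SET HADR RESUME statement per database."""
    return [
        f"ALTER DATABASE {quote_name(name)} SET HADR RESUME;"
        for name in database_names
    ]
```

Feeding the database names returned by FIND_SUSPENDED_SQL into build_resume_statements gives the statements to run on the replica where movement is suspended.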

Treemaps and Tables in Power BI

Ben Richardson looks at a couple of Power BI visuals:

In this article, you will learn how to work with Treemaps and Tables, which are two of the most commonly used Power BI visuals. You will also see how slicers can be used in Power BI to dynamically update the data in Treemaps and Tables. Power BI Visuals are extremely easy to create and don’t require you to write any code.

I like treemaps more than I probably should. They have a very limited set of good uses but I just can’t quit them.

Raw Data in the Data Lake

Steve Cardella uses wrestling as a metaphor where I would have used sewage:

Raw. Unfiltered. Data. The raw zone – it’s the dark underbelly of your data lake, where anything can happen. The CRM data just body-slammed the accounting data, while the HR data is taking a chair to the marketing data. It’s all a rumble for the championship belt, right? Oh, wait – we’re talking data lakes. Sorry. If the raw zone isn’t where data goes to duke it out, then what is the raw zone of a data lake? How should it be set up?

First, let’s take a time-out to give some context. A data lake is a central storage pool for enterprise data; we pour information into it from all kinds of sources. Those sources might include anything from databases to raw audio and video footage, in unstructured, semi-structured, and structured formats. A data warehouse, conversely, only houses structured data. The data lake is divided into one or more zones of data, with varying degrees of transformation and cleanliness (see this video for more: Data Lake Zones, Topology, and Security). The raw zone is the foundation upon which all other data lake zones are built.

Read on to understand the importance of raw data in a data lake, and the equal importance of making sure end users don’t see that stuff very often. Also, Steve gets bonus points for using my favorite term for the Aristotelian opposite of a data lake: the data swamp.

Kafka Integration with Knime

Swantika Gupta shows off some of Knime’s ability to integrate with Apache Kafka:

Knime Analytics Platform provides its users a way to consume messages from Apache Kafka and publish the transformed results back to Kafka. This allows users to integrate their Knime workflows easily with a distributed streaming pub-sub mechanism.

With Knime 3.6+, users get a Kafka extension with three new nodes:
1. Kafka Connector
2. Kafka Consumer
3. Kafka Producer

Click through to see how to configure each and how to enrich your data with Knime.

HIVE-6384 Errors with Spark and Parquet

Manoj Pandey troubleshoots an issue:

But I was getting following error:

warning: there was one feature warning; re-run with -feature for details
java.lang.UnsupportedOperationException: Parquet does not support decimal. See HIVE-6384

 
As per the above error, this relates to some Hive version conflict, so I tried checking the Hive version by running the command below and found that it was pointing to an old version (0.13.0). This version of the Hive metastore did not support the BINARY datatypes for Parquet-formatted files.

Read on to see how Manoj was able to fix the problem in Azure Databricks.

Using Docker Desktop on WSL2

Chris Taylor walks us through updating Docker Desktop for Windows to support Windows Subsystem for Linux 2:

I won’t go too much into what this is, as you can read the article in the links above, but to summarise, this will improve the experience of Docker on Windows:

– Improvements in resource consumption
– Starting up the Docker daemon is significantly quicker (Docker says 10s as opposed to ~1min previously)
– Avoid having to maintain both Linux and Windows build scripts
– Improvements to file system sharing and boot time
– Allows access to some cool new features for Docker Desktop users.

Some of these are improvements we’ve been crying out for over the last couple of years so in my opinion, they’re a very welcome addition.

In order to get started using WSL2, there’s a couple of steps you need to run through which I’ll try and show below with a few screen shots.

Read on for the process.

Restoring SQL Server Backups from Azure Blob Storage

Niko Neugebauer walks us through special considerations when using Azure Blob Storage as your backup location:

If you are using Azure Blob Storage for SQL Server backups, you need to know a couple of important details before you start on any significant project (and in my head I keep hearing Grant Fritchey angrily declaring that no backup strategy exists if there is no restore strategy to be found in the plan).

The ACL permissions required by the Restore From URL operation in SQL Server (any SQL Server right now, starting with SQL Server 2012 page blobs and including SQL Server 2019 blob storage support that was started with SQL Server 2014) will require … [drumroll] … exclusive WRITE-permissions on the underlying file(s).

Niko explains some of the pain around that requirement, as well as a few other bees in your bonnet.