
Month: October 2022

An Introduction to Event Sourcing

Aasif Ali provides a high-level introduction to the concept of event sourcing:

Event sourcing is a way to store data as events in an append-only log. Rather than keeping only the latest version of an entity’s state, this method stores the state of a database object as a sequence of events: essentially, a new event is recorded each time the object changes state, from the beginning of the object’s existence. An event can be anything generated by a user, such as a mouse click or a key press on a keyboard. It is a great way to atomically update the state and publish events. Not only can we query these events, but we can also use the event log to reconstruct past states, and as a foundation to automatically adjust the state to cope with retroactive changes.

Events are immutable: they cannot be changed. This well-known rule is often the first defining characteristic of event stores and event sourcing.

Read on to see how this concept works and how products like Apache Kafka make event sourcing viable.
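
For a rough feel of the replay idea before clicking through, here is a minimal Python sketch. It is my own illustration, not code from Aasif's post, and it has nothing to do with Kafka's API: the Event and EventLog classes, the "deposited" and "withdrawn" event kinds, and the bank-balance example are all hypothetical names chosen for the demonstration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)  # frozen: an event cannot be modified after creation
class Event:
    sequence: int  # 1-based position in the log
    kind: str      # hypothetical event kinds: "deposited" or "withdrawn"
    amount: int


class EventLog:
    """Append-only log: events can be appended and read, never edited or removed."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, kind: str, amount: int) -> Event:
        event = Event(sequence=len(self._events) + 1, kind=kind, amount=amount)
        self._events.append(event)
        return event

    def read(self, up_to: Optional[int] = None) -> List[Event]:
        # Return a copy so callers cannot mutate the log's internal list.
        # up_to=None returns every event; up_to=2 returns the first two.
        return list(self._events[:up_to])


def replay(events: List[Event]) -> int:
    """Rebuild an account balance by applying each event in order."""
    balance = 0
    for event in events:
        if event.kind == "deposited":
            balance += event.amount
        elif event.kind == "withdrawn":
            balance -= event.amount
        else:
            raise ValueError(f"unknown event kind: {event.kind}")
    return balance


log = EventLog()
log.append("deposited", 100)
log.append("withdrawn", 30)
log.append("deposited", 5)

current_balance = replay(log.read())           # state after all three events
balance_after_two = replay(log.read(up_to=2))  # a past state, rebuilt from the log
```

The current balance is never stored anywhere. It is derived by replaying the log, which is why reconstructing a past state is just a matter of replaying fewer events.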


Choosing between Synapse Spark Notebooks or Job Definitions

Arun Sethia and Arshad Ali explain when you might use a Spark notebook versus a job definition:

Synapse Spark Notebook is a web-based (HTTP/HTTPS) interactive interface to create files that contain live code, narrative text, and visualizes output with rich libraries for spark based applications. Data engineers can collaborate, schedule, run, and test their spark application code using Notebooks. Notebooks are a good place to validate ideas and do quick experiments to get insight into the data. You can integrate the Synapse Notebook into Synapse pipeline.

The Notebook allows you to combine programming code with markdown text and perform simple visualizations (using Synapse Notebook chart options and open-source libraries). In addition, running code will supply immediate feedback, output, and progress tracking within Notebook.

Click through for the comparison.


OpenSSL Patch Incoming

Steven Vaughan-Nichols has bad news for us:

So we should all be concerned that Mark Cox, a Red Hat Distinguished Software Engineer and the Apache Software Foundation (ASF)’s VP of Security, this week tweeted, “OpenSSL 3.0.7 update to fix Critical CVE out next Tuesday 1300-1700UTC.”

How bad is “Critical”? According to OpenSSL, an issue of critical severity affects common configurations and is also likely exploitable. 

There isn’t enough detail yet to know exactly what the issue is. It’s forthcoming, however, so time to get those patch windows ready.


Debugging Stream Table Joins

Philip Schmitt dives into a problem:

Joining two topics to aggregate the data is one of the fundamental operations in stream processing. But that’s not to say that it’s simple. Let me show you what can go wrong! This article chronicles my journey to join two Apache Kafka topics, stumbling into and out of various pitfalls. I’m going to show you…

– How to debug co-partitioning with kcat (formerly kafkacat)

– How to avoid the number one pitfall of using kcat

– Stream–table join semantics in action

There’s a lot of useful information in this post.


Useful Add-On Packages for Shiny

Mandy Norrbo has a list:

There are a growing number of Shiny users across the world, and with many users comes an increasing number of open-source “add-on” packages that extend the functionality of Shiny, both in terms of the front end and the back end of an app.

This blog will highlight 5 UI add-on packages that can massively improve your user experience and also just add a bit of flair to your app. Each package will have an associated example app (some more inspired than others) that I’ve created where you can actually see the UI component in action. All code for example apps can be found on our GitHub.

Click through for the list, as well as examples of how they work.


Search Optimization in Snowflake

Arun Sirpal doesn’t have time to create indexes:

I will use a clone of the table to compare it to when search optimisation is on. I will make sure no caching is on, which could affect the test.
I activate the feature via:

ALTER TABLE data_staging ADD SEARCH OPTIMIZATION;

This takes time! You will want to run a check to confirm 100% completion. This is because there is a maintenance service that runs in the background responsible for creating and maintaining the search access path:

Click through to see what happens and the kinds of performance gains Arun realized.


Feature Branching for Database Projects

Olivier Van Steenlandt describes one branching strategy and applies it to database development:

Depending on how you have defined your branching strategy, you will start development differently. Below I’m defining a few different branching strategies:

1. No branching

2. Branching/environment

3. Branching/feature

4. …

In the past, I have used all of the above. I need to tell you that the Branching/feature strategy allows me to be the most flexible for database development. Why? Let’s dive into this method for now:

Read on to learn more.


Approximate Percentiles in SQL DB and SQL MI

Balmukund Lakhani has an announcement:

Approximate query processing was introduced to enable operations across large data sets where responsiveness is more critical than absolute precision. Approximate operations can be used effectively for scenarios such as KPI and telemetry dashboards, data science exploration, anomaly detection, and big data analysis and visualization. Approximate query processing family has enabled a new market of big data HTAP customer scenarios, including fast-performing dashboard and data science exploration requirements.  

Today, we are announcing preview of native implementation of APPROX_PERCENTILE in Azure SQL Database and Azure SQL Managed Instance. This function will calculate the approximated value at a provided percentile from a distribution of numeric values.

This is way faster than using the PERCENTILE_CONT() or PERCENTILE_DISC() window functions. For a decent-sized query, I was getting anywhere from 5-20x performance improvements, and the larger the dataset, the bigger the gains. It is important to note that the approximate percentiles are not window functions, so you don’t get one row back per row of input.
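
If the difference between the continuous and discrete flavors is fuzzy, here is a small Python sketch of the exact semantics as I understand them from PERCENTILE_CONT() and PERCENTILE_DISC(). It is an illustration only: it is not the approximate algorithm the service uses, the function names and sample data are my own, and it assumes a non-empty input with a percentile between 0 and 1.

```python
import math
from typing import Sequence


def percentile_cont(values: Sequence[float], p: float) -> float:
    """Continuous percentile: linearly interpolate between neighboring values."""
    ordered = sorted(values)
    position = p * (len(ordered) - 1)  # fractional 0-based position
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def percentile_disc(values: Sequence[float], p: float) -> float:
    """Discrete percentile: the smallest actual value whose cumulative
    distribution (rank / count) is at least p."""
    ordered = sorted(values)
    count = len(ordered)
    for rank, value in enumerate(ordered, start=1):
        if rank / count >= p:
            return value
    return ordered[-1]


response_times = [10, 20, 30, 40]  # made-up sample data

# One answer for the whole set of rows, not one row per input row.
median_cont = percentile_cont(response_times, 0.5)  # interpolates between 20 and 30
median_disc = percentile_disc(response_times, 0.5)  # always a value from the data
```

The continuous version can return a value that never appears in the data, while the discrete version always returns one of the input values.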


Improving Power BI Q&A with Synonyms

Patrick LeBlanc pulls out the thesaurus:

Most struggle with getting Q&A to be effective in Power BI. Usually this comes down to either model naming or synonyms. Patrick shows you how you can update these and also a nice feature to let you share them with others.

I’ve found the Power BI Q&A component to be a bit tetchy, even with synonyms, when you’re asking for non-trivial slices of the data. Still, what Patrick shows does help a lot.


Storage Snapshots and In-Memory OLTP

Andy Yun answers a question:

Can I still take storage-array snapshots and if yes, will I lose data in my memory-optimized tables? What about data inside my non-durable tables?

Thankfully, the question was not in the headline. Therefore, Betteridge’s Law of Headlines does not apply and the answer may be either ‘yes’ or ‘no’ depending on the facts. Speaking of which, to find that answer, click through and read Andy’s post.
