Press "Enter" to skip to content

Day: August 9, 2017

Event Sourcing And Kafka

Ben Stopford gives an architectural overview of how Apache Kafka can act as an event sourcing system:

At a high level, event sourcing is really just the observation that, in an event-driven architecture, the events are facts. So if you keep them around, you can use them as a data source.

One subtlety comes from the way the events are modelled. They can be values: whole facts (an Order, in its entirety) or they can be a set of ‘deltas’ that must be re-combined (a whole Order message, followed by messages denoting just the state changes: “amount updated to $5”, “order cancelled” etc).

As an analogy, imagine you are building a version control system. When a user commits a file for the first time, you save it. Subsequent commits might only save the ‘delta’: just the lines that were added, changed or removed. Then, when the user performs a checkout, you open the version-0 file and apply all the deltas to derive the current state.

The alternate approach is to simply store the whole file, exactly as it was at the time it was changed. This makes pulling out a version quick and easy, but to compare different versions you would have to perform a ‘diff’.

The series Ben has been working through is very helpful in wrapping your mind around what Kafka can do, and this post was no exception.
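
Kafka specifics aside, the delta-replay idea maps neatly onto relational terms. Here is a minimal sketch in T-SQL (the table and column names are invented for illustration): each event row is a fact, and the current state of an order is derived by folding the deltas back together.

-- Hypothetical event store: one row per fact, deltas included.
CREATE TABLE dbo.OrderEvents
(
    OrderID    INT            NOT NULL,
    EventSeq   INT            NOT NULL,
    EventType  VARCHAR(20)    NOT NULL,  -- 'Created', 'AmountUpdated', 'Cancelled'
    Amount     DECIMAL(10, 2) NULL,      -- populated for Created / AmountUpdated
    EventTime  DATETIME2(3)   NOT NULL,
    CONSTRAINT PK_OrderEvents PRIMARY KEY (OrderID, EventSeq)
);

-- Derive current state by replaying the deltas: the latest amount wins,
-- and a Cancelled event anywhere in the stream marks the order cancelled.
SELECT
    e.OrderID,
    CurrentAmount = (SELECT TOP (1) a.Amount
                     FROM dbo.OrderEvents AS a
                     WHERE a.OrderID = e.OrderID
                       AND a.Amount IS NOT NULL
                     ORDER BY a.EventSeq DESC),
    IsCancelled   = MAX(CASE WHEN e.EventType = 'Cancelled' THEN 1 ELSE 0 END)
FROM dbo.OrderEvents AS e
GROUP BY e.OrderID;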


Parallel Processing In R

Chaitanya Sagar shows a few methods for parallelizing code in R:

Parallel programming may seem like a complex process at first, but the amount of time saved after executing tasks in parallel makes it worth the try. Functions such as lapply() and sapply() are great alternatives to time-consuming looping functions, while the parallel, foreach, and doParallel packages are great starting points for running tasks in parallel. These parallel processes are based on functions and are also modular. However, with great power comes a risk of code crashes, so it is necessary to be careful and to be aware of ways to control memory usage and handle errors. It is not necessary to parallelize every piece of code that you write; you can always write sequential code and decide to parallelize the parts which take significant amounts of time. This will help in further reducing out-of-memory instances and in writing robust, fast code. The use of parallel programming methods is growing, and many packages now have parallel implementations available. With this article, one can dive deep into the world of parallel programming and make full use of the vast memory and processing power to generate output quickly.

If you’re using Microsoft R Server, there are additional parallelism options. H/T R-Bloggers


What’s New In Ambari 2.5

Paul Codding tells us what’s coming in the next version of Ambari:

Ambari Log Search (Tech Preview) has been one of our most popular features, and in this release it has seen UI and backend refreshes based on customer feedback. Log Search is planned for GA with the next major Ambari release, Ambari 3.0, in which the UI will be simplified and the backend will have more robust log retention and scaling capabilities.

There are some interesting changes, so read the whole thing.


Breaking A Database Into Smaller Files

Jana Sattainathan shows how to break a SQL Server database into smaller files, as well as giving some reasons why you might want to do that:

You are probably reading this post because you have experienced the pain yourself, and I don’t want to waste any more of your time, so let’s get right to the steps involved in breaking up a huge database or datafile:

  1. Check the space situation on your host

  2. Get the space usage by files for the big database/datafile in question

  3. Decide on the number of files to add and where to place them

  4. Add multiple secondary datafiles

  5. Distribute data from the big datafile into the new datafiles using the EMPTYFILE option

  6. Shrink the big datafile and set a maximum size

  7. Change the default filegroup

Read on for Jana’s step-by-step approach.
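
As a rough illustration of steps 4 through 6 (database, file, and path names here are placeholders; Jana’s post covers the full sequence, including sizing decisions and the filegroup change):

-- 4. Add a secondary datafile (repeat for as many files as you decided on).
ALTER DATABASE BigDB
ADD FILE
(
    NAME = N'BigDB_Data2',
    FILENAME = N'D:\Data\BigDB_Data2.ndf',
    SIZE = 10GB,
    FILEGROWTH = 1GB
);

-- 5. Push data out of the big datafile into the other files in its filegroup.
--    Note: a primary (.mdf) file can never be fully emptied; system pages stay put.
USE BigDB;
DBCC SHRINKFILE (N'BigDB_Data1', EMPTYFILE);

-- 6. Shrink the (now mostly empty) big datafile and cap its size.
DBCC SHRINKFILE (N'BigDB_Data1', 1024);   -- target size in MB
ALTER DATABASE BigDB
MODIFY FILE (NAME = N'BigDB_Data1', MAXSIZE = 10GB);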


Partial Text Slicer In Power BI

Mitchell Pearson shows how to slice results based on text input:

In this installment of the Problem, Design, Solution series, we are going to show you how to perform a text search using slicers in Power BI; this simulates a “LIKE”-type search. In the following screenshot, you can see that when “Tax” is selected, all records in the table that have “Tax” anywhere in the record are returned; likewise, whenever “IT” is selected from the slicer, all records in the table that have IT in them are returned. Hope you enjoy this post!

Click through for the explanation, followed by a video that walks you through the process.
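
For a sense of what the slicer is simulating on the database side, the equivalent wildcard filter in T-SQL looks something like this (the table and parameter names are invented for illustration):

-- Roughly what the Power BI slicer simulates: return every row whose
-- description contains the search term anywhere in the text.
DECLARE @SearchTerm NVARCHAR(50) = N'Tax';   -- placeholder search term

SELECT r.RecordID, r.Description
FROM dbo.Records AS r                        -- placeholder table
WHERE r.Description LIKE N'%' + @SearchTerm + N'%';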


Working With UTC And Local Times

Jo Douglass shows how to use the DATETIMEOFFSET data type and AT TIME ZONE syntax to convert between UTC and local times:

Run select SysDateTimeOffset(); and you should see a date and time which mirrors your server’s current time, plus a time zone offset showing its current offset from UTC; this includes any time zone offset, plus any daylight savings time offset.

If I were to run this (from the UK) on August 15th, 2017 while my clock is showing that it’s noon exactly, I would get 2017-08-15 12:00:00.0000000 +01:00; the +01:00 offset is because the UK is offset by one hour from UTC during daylight savings. The datetime2 portion of a datetimeoffset is in local time, not UTC.

My normal practice is to store everything in UTC and let the application convert to local times. That allows you to compare dates much more easily and reduces confusion around daylight savings time.
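
A minimal sketch of the two pieces in play, using an example time zone name (AT TIME ZONE requires SQL Server 2016 or later):

-- Current server time with its offset from UTC.
SELECT SYSDATETIMEOFFSET() AS ServerTimeWithOffset;

-- Store in UTC, convert for display: take a UTC datetime2 value, declare it
-- as UTC, then shift it to a local zone (the zone name is just an example).
DECLARE @StoredUtc DATETIME2(7) = SYSUTCDATETIME();

SELECT
    @StoredUtc AS StoredUtc,
    @StoredUtc AT TIME ZONE 'UTC' AT TIME ZONE 'GMT Standard Time' AS UkLocalTime;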


Forecasting Versus Predicting

Rob Collie explains that there are two different concepts which use similar names:

Once you’ve digested the illustration at the top of this article, yeah, you’ve kind of already got it.

  • Forecasting is when we anticipate the behavior of “Lots” of people (customers, typically) on “Long” timelines.
  • Predictive Analytics anticipates the behavior of One person (again, typically a customer) on a “Short” timeline.

So…  Macro versus Micro.

But let’s delve just a little bit deeper, in order to “cement” the concepts.

There’s a very useful distinction here and Rob does well to flesh out the details.  I highly recommend this if you’re curious about micro- versus macro-level predictions.


SSMS Performance Dashboard

Pedro Lopes announces that the SQL Server Performance Dashboard is now built into SQL Server Management Studio:

Back in 2007, we released the Microsoft SQL Server 2005 Performance Dashboard Reports, which were designed to provide fast insight into performance issues from some newly created system views – DMVs (Dynamic Management Views). These were updated for SQL Server 2008 and later for SQL Server 2012, and while being very helpful, they had a significant drawback – they required a separate download and install. This meant that, when needed, they most probably were not installed on a specific SQL Server, and were therefore unusable when they were needed the most.

With the new SSMS 17.2, we are releasing the Performance Dashboard embedded as a built-in Standard Report. This means that it is available for any SQL Server instance starting with SQL Server 2008, without any extra downloads or running any extra scripts. Just connect to your server in SSMS and open the Performance Dashboard.

Aside from building it into Management Studio, they’ve also added a few helpful things to the product, so it is worth checking out.
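
The dashboard draws from the same DMVs you can query directly; as a generic example (not one of the dashboard’s own queries), here is a quick look at currently executing requests and what they are waiting on:

-- A generic peek at current requests; the Performance Dashboard presents
-- this kind of DMV data graphically.
SELECT
    r.session_id,
    r.status,
    r.command,
    r.wait_type,
    r.wait_time          AS wait_time_ms,
    r.cpu_time           AS cpu_time_ms,
    r.total_elapsed_time AS elapsed_ms
FROM sys.dm_exec_requests AS r
WHERE r.session_id > 50;   -- skip most system sessions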


Backing Up That Linux-Based Database

David Klee shows how to back up a SQL Server on Linux database over the network:

As of SQL Server 2017 RC2, we’ll want to accomplish it in a way that is transparent to SQL Server. (Depending on the RTM version whenever it is released, I might change the recommendation on this.) To do this, we’ll want to create a folder on the local file system that actually maps to a remote network share for SQL Server backups.

SSH into your server without elevated privileges at this point.

The network share is presented from a Windows server with the SMB protocol. Linux can connect to this using a compatible protocol called CIFS, or Common Internet File System. We’ll need to install the packages so we can natively connect. On Ubuntu and other Linux distros, the easiest route is the cifs-utils package, installed straight from the package manager.

Sadly, that credentials file cannot be encrypted.
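
Once the share is mounted at a local path, the backup statement itself is unchanged; something like this, where the database name and mount point are placeholders:

-- The CIFS share is mounted at a local directory (e.g. /var/opt/mssql/netbackup),
-- so SQL Server just sees a local path; the backup command needs nothing special.
BACKUP DATABASE [MyDatabase]
TO DISK = N'/var/opt/mssql/netbackup/MyDatabase_full.bak'
WITH COMPRESSION, CHECKSUM, STATS = 10;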
