Flink: Streams Versus Batches

Kevin Jacobs has an article comparing Apache Flink to Spark Streaming:

The other type of data are data streams. Data streams can be visualized by water flowing from a tap to a sink. This process is not ending. The nice property of streams is that you can consume the stream while it is flowing. There is almost no latency involved for consuming a stream.

Apache Spark is fundamentally based on batches of data. By that, for all processing jobs at least some latency is introduced. Apache Flink on the other hand is fundamentally based on streams. Let’s take a look at some evidence for the difference in latency.

Read the whole thing.

Introduction To PowerShell

Allison Tharp is starting to learn PowerShell:

PowerShell is developed by Microsoft, specifically for task automation and configuration management.  A shell is a user interface that gives the user access to services of an operating system.  The shell you are likely most familiar with is the Command Prompt.  PowerShell is like the Command Prompt in many ways, but it is much more powerful.

PowerShell is based on the .Net framework and is considered a scripting language.  Windows also developed the Windows PowerShell ISE (Integrated Scripting Environment).  This is a GUI that lets us make scripts without having to type all of the commands into the command line.

I highly recommend learning PowerShell if you’re in the Windows administration space.  It’s not quite as useful on the development side (where you can quickly throw together a .NET app), but it’s great for scripting out those admin tasks.

Streaming Data To Power BI

Reza Rad shows how to hook up streaming data sources to Power BI:

As I mentioned before, download the sample project, and then select the project under this path:


And open the PBIRealTimeStreaming solution in Visual Studio. This project creates a data set including a datetime value and a numeric value and will pass that through Power BI API (which will be discussed later in another post) to Power BI service.

Real-time dashboards are great for making it look like you’re doing Very Important Things.  They can be useful in other ways too.  Read the whole thing.

Package Processing With Biml

Ben Weissman gives us a table-driven process for SSIS package creation through Biml:

Now that we’ve seen multiple ways to use Biml to create SSIS packages that will generate and populate tables, we’ll try to bridge the gap on how to use Biml for packages that will execute stored procedures and other tasks for us. We’ll use a metadata-model for that again, though it will be very minimalistic. This approach makes it very easy to maintain – we’re actually using it in exactly this way in multiple projects.

To get started, let’s create a new database “MyBiml_SimplisticMetaData” and create two tables, a view and some sample data in there:

One area in Biml where I’ve had trouble is finding the “right” mix of SQL Server tables, T-SQL operations, and Biml (e.g., building up sets of tables for a warehouse load by storing them in a metadata table or by going through RootNode).  I tend to lean heavily toward this kind of solution, where most of my metadata lives in T-SQL, but when I do that, it always feels like there’s a “purer” way of doing it.  Regardless, one reason I like Biml is that even if I am doing it wrong (for some definition of “doing it wrong”), it’s still a huge time-saver.

Checking For Credentials

Denny Cherry uses a try-catch block to figure out if you can authenticate automatically and prompts you otherwise:

Runbooks are very powerful tools which allow you to automate PowerShell commands which need to be run at different times.  One of the problems that I’ve run across when dealing with Azure Runbooks is that there is no way to use the same script on prem during testing and the same script when deploying. This is because of the way that authentication has to be handled when setting up a runbook.

The best way to handle authentication within a runbook is to store the authentication within the Azure Automation configuration as a stored credential.  The problem here is that you can’t use this credential while developing your runbook in the normal PowerShell ISE.

This is a clever idea.
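
The try-then-prompt pattern itself is language-neutral, so here is a minimal sketch of it in Python rather than the PowerShell from the linked post. The function names (`resolve_credential`, `stored_lookup_unavailable`, `fake_prompt`) are hypothetical stand-ins, and the sketch assumes the stored-credential lookup raises an error when run outside the automation service.

```python
def resolve_credential(fetch_stored, prompt_user):
    """Try the stored credential first; fall back to prompting when that fails."""
    try:
        return fetch_stored()
    except Exception:
        # Assumption: outside the automation service the stored lookup fails,
        # so ask the person running the script instead.
        return prompt_user()


# Hypothetical stand-ins for illustration only.
def stored_lookup_unavailable():
    raise RuntimeError("stored credential not available here")


def fake_prompt():
    return ("dev-user", "typed-at-prompt")


print(resolve_credential(stored_lookup_unavailable, fake_prompt))
```

The same script body can then run in both places: only the credential source differs.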

Partition Alignment In Storage

Wayne Sheffield has an article on I/O partition alignment:

What we need to do is to offset the beginning of the data being stored on disk to a location more conducive to how the program is operating. This offset is known as the “Partition Alignment Offset”. To be in tune with SQL Server, this value should be an increment of 64KB. However, you also need to consider the entire storage subsystem – the disks, controllers and memory. Starting with Windows Server 2008, this offset is at 1024KB – a nice increment of 64KB that also works very nicely with most RAID disks/controllers. Prior to Windows Server 2008, partition alignment offset was not explicitly performed, so this will need to be performed.

If you’ve migrated disks from server to server to server over the years, this is worth checking out.
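
The arithmetic behind the 64KB rule is easy to check by hand. The sketch below is only an illustration of the divisibility test: the 1024KB and 64KB figures come from the quote, while the 63-sector legacy starting offset (63 × 512 bytes) is my assumption about a typical pre-2008 layout, not something the quoted passage states.

```python
KB = 1024
UNIT_BYTES = 64 * KB  # the 64KB figure from the quoted passage


def is_aligned(offset_bytes: int, unit_bytes: int = UNIT_BYTES) -> bool:
    """Return True when the partition starting offset is a whole multiple of unit_bytes."""
    return offset_bytes % unit_bytes == 0


default_2008_offset = 1024 * KB  # 1,048,576 bytes, i.e. 16 units of 64KB
legacy_offset = 63 * 512         # 32,256 bytes (assumed legacy 63-sector start)

print(is_aligned(default_2008_offset))  # True
print(is_aligned(legacy_offset))        # False
```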

Explaining RBAR

Kenneth Fisher explains RBAR with the help of an animated GIF:

So 23 milliseconds for the batch version and 850 milliseconds for RBAR. What a difference.

Now in this case the code for the RBAR is also a lot more complicated. But that isn’t always the case. It also isn’t always the case that RBAR is slower. But it’s almost always a lot slower than batch.

So, while the code for RBAR is often easier to write, even though it might be physically longer, it’s probably going to be slower too.

Well-written, set-based solutions aren’t always guaranteed to be faster, but that’s one of the safest bets to make with T-SQL.
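
To make the two shapes concrete, here is a small sketch that uses Python’s built-in sqlite3 module as a stand-in for SQL Server. The `orders` table, its columns, and the tax rate are all made up for illustration. It shows the row-by-row version issuing one UPDATE per row and the set-based version issuing a single statement; it does not try to reproduce the timings from the linked post.

```python
import sqlite3


def build_table(conn, n_rows):
    """Create a hypothetical orders table with n_rows rows."""
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, taxed REAL)")
    conn.executemany(
        "INSERT INTO orders (id, amount) VALUES (?, ?)",
        [(i, float(i)) for i in range(1, n_rows + 1)],
    )


def update_rbar(conn, rate):
    """Row by agonizing row: fetch the keys, then run one UPDATE per key."""
    ids = [row[0] for row in conn.execute("SELECT id FROM orders")]
    for order_id in ids:
        conn.execute(
            "UPDATE orders SET taxed = amount * ? WHERE id = ?", (rate, order_id)
        )


def update_set_based(conn, rate):
    """Set-based: one statement that touches every row."""
    conn.execute("UPDATE orders SET taxed = amount * ?", (rate,))


rbar = sqlite3.connect(":memory:")
batch = sqlite3.connect(":memory:")
for conn in (rbar, batch):
    build_table(conn, 1000)

update_rbar(rbar, 1.07)
update_set_based(batch, 1.07)

query = "SELECT id, taxed FROM orders ORDER BY id"
print(rbar.execute(query).fetchall() == batch.execute(query).fetchall())
```

Both approaches leave the table in the same state; the difference is how many statements the engine has to process to get there.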

Azure SQL Database Alerts

Julie Koesmarno shows how to set up Azure SQL Database alerts:

Over the last year, I have been intentionally seeking out to get feedback from the community via various SQL events, particularly those who plan to use or are currently using Azure SQL Database. A lot of questions have come up about managing Azure SQL Database better – i.e. being more proactive and more responsive in managing Azure SQL Database. One of the ways to be more proactive about your SQL Database is by setting up alerts. As an example, you can create an alert in case DTU goes above 95% – say in the last 5 minutes, so that you can either investigate why this might be or upgrade it to a higher SKU.

This article walks through how you can set up an alert on Azure SQL DB.

I really like the fact that they offer web hooks; that way, I can integrate these alerts with Slack or other messaging systems.
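
The DTU example in the quote boils down to a threshold check over a time window. The sketch below shows that logic in plain Python; it is not the Azure alerting API. The `should_alert` function, the sample data, and the choice of averaging over the window are all assumptions for illustration.

```python
from datetime import datetime, timedelta


def should_alert(samples, now, threshold=95.0, window=timedelta(minutes=5)):
    """samples: iterable of (timestamp, dtu_percent) pairs.

    Fire when the average DTU percentage inside the window exceeds the threshold.
    Averaging is an assumed aggregation, chosen only for this sketch.
    """
    recent = [pct for ts, pct in samples if now - window <= ts <= now]
    if not recent:
        return False
    return sum(recent) / len(recent) > threshold


now = datetime(2016, 8, 1, 12, 0)
samples = [
    (now - timedelta(minutes=1), 99.0),
    (now - timedelta(minutes=3), 97.0),
    (now - timedelta(minutes=4), 96.0),
    (now - timedelta(minutes=9), 20.0),  # outside the 5-minute window, ignored
]
print(should_alert(samples, now))  # True
```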

Database Project Basics

James Anderson gives a basic overview of database projects within Visual Studio:

SSDT is a VS plugin that can script out a database into individual files so that you can use a VCS (I use Git) to version control them. Once those scripts are in my Git repo, I can use it as the single source of truth to generate my releases from. This is the basis of getting our databases into our CI process. ReadyRoll will be used to further improve this process and to add our migration/upgrade scripts to our repo. SSDT is required by ReadyRoll and can be found here.

Before we can start with ReadyRoll, we need to learn some Visual Studio basics.

I’ve used database projects for the better part of a decade.  They aren’t perfect, but in most environments they’re quite helpful…if other people use them as well.

