Migrating To In-Memory OLTP

Erin Stellato is kicking the hornet’s nest again; this time it’s about In-Memory OLTP:

In the past few months I’ve had several clients reach out about migrating to In-Memory OLTP solutions.  When the second or third request rolled in I remember thinking, “Finally!”  As in, I’ve been wondering why businesses haven’t been looking to implement In-Memory sooner.  In fact, I added a section on In-Memory to our IEPTO2 course because with the availability of the feature in SQL Server 2016 SP1 (yes, there are memory limitations, but it’s a good place to start) I figured we would see an uptick in interest.  But here we are, half way through 2017 and over 6 months since the release of SQL Server 2016 SP1, and I still see a lot of hesitation around it.

I wrote a post over on SQLPerformance last week, Testing DML Statements for In-Memory OLTP, and that generated some discussions on Twitter.  So I figured it was time for a post to find out what’s holding companies back.  This isn’t a true poll – it’s a fill-in-the-blank.  As in: post a comment.  If you have considered migrating to In-Memory and then didn’t, I want to understand why.  I recognize there are limitations – the product is still new and it’s evolving.  But perhaps if we understand the largest inhibitors to migration we can help move them up on Microsoft’s list via Connect and other pathways.  Understand I am asking for specifics here, for example: we can’t use In-Memory tables because they don’t support spatial data types.  Whatever the reason, share it via a comment.

If I were to take a wild guess, the most common answers will be something like:

  1. Not using SQL Server 2014 EE or later, or any edition of 2016 SP1 or later
  2. Limitations in what memory-optimized tables provide:  can’t go cross-database, can’t create useful constraints, etc.
  3. Syntax troubles, particularly in 2014:  no outer joins, etc.
  4. Difficulties fitting this into a legacy system:  it’s not just as simple as drop-and-replace tables due to limitations above.  Also, due to size limits (none of that NVARCHAR(MAX) business), candidate tables might need to be broken up or restructured so that they fit the mold.

I like using memory-optimized tables where I can, but have had much more success with memory-optimized table-valued parameters.

DevOps And The DBA

Kevin Feasel



Kellyn Pot’Vin-Gorman explains that DevOps does not signal the end of the DBA:

As I travel to multiple events focused on numerous platforms the database is crucial to, I’m faced with peers frustrated with DevOps and considerable conversation dedicated to how it’s the end of the database administrator.  It may be my imagination, but I’ve been hearing this same story, with the blame assigned elsewhere-  either its Agile, DevOps, the Cloud or even a latest release of the actual database platform.  The story’s the same-  the end of the Database Administrator.

The most alarming and obvious pain point of this, is that in each of these scenarios, the result was the Database Administrator a focal point in the end more so than they were when it began.  When it comes to DevOps, the specific challenges of the goal needed the DBA more so than any of these storylines.  As development hurdled top speed to deliver what the business required, the DBA and operations as a whole, delivered the security, the stability and the methodologies to build automation at the level that the other groups simply never needed previously.

This is a useful rejoinder to fears of imminent job loss.

How Kafka Does Exactly-Once

Kevin Feasel



Neha Narkhede explains exactly-once messaging and describes how Kafka implements their exactly-once process:

In a distributed system, the computers that make up the system can always fail independently of one another. In the case of Kafka, an individual broker can crash, or a network failure can happen while the producer is sending a message to a topic. Depending on the action the producer takes to handle such a failure, you can get different semantics:

  • At least once semantics: if the producer receives an acknowledgement (ack) from the Kafka broker and acks=all, it means that the message has been written exactly once to the Kafka topic. However, if a producer ack times out or receives an error, it might retry sending the message assuming that the message was not written to the Kafka topic. If the broker had failed right before it sent the ack but after the message was successfully written to the Kafka topic, this retry leads to the message being written twice and hence delivered more than once to the end consumer. And everybody loves a cheerful giver, but this approach can lead to duplicated work and incorrect results.

  • At most once semantics: if the producer does not retry when an ack times out or returns an error, then the message might end up not being written to the Kafka topic, and hence not delivered to the consumer. In most cases it will be, but in order to avoid the possibility of duplication, we accept that sometimes messages will not get through.

  • Exactly once semantics: even if a producer retries sending a message, it leads to the message being delivered exactly once to the end consumer. Exactly-once semantics is the most desirable guarantee, but also a poorly understood one. This is because it requires a cooperation between the messaging system itself and the application producing and consuming the messages. For instance, if after consuming a message successfully you rewind your Kafka consumer to a previous offset, you will receive all the messages from that offset to the latest one, all over again. This shows why the messaging system and the client application must cooperate to make exactly-once semantics happen.

Read on for a discussion of technical details.  I appreciate how Neha linked to a 60+ page design document as well, for those wanting to dig into the details.

Connecting Hadoop To LDAP

Kevin Feasel



Giovanni Lanzani walks through some of the difficulties of getting LDAP working with Hadoop:

This section could probably could have much less workarounds if I’d knew more about LDAP.

But I’m a data scientist at heart and I want to get things done.

If you ever dealt with Hadoop, you know that there are a bunch of non-interactive users, i.e. users who are not supposed to login, such as hdfs, spark, hadoop, etc. These users are important to have. However the groups with the same name are also important to have. For example when using airflow and launching a spark job, the log folders will be created under the airflow user, in the spark group.

LDAP, however, doesn’t allow you, to my knowledge, to have overlapping user/groups, as Unix does.

The way I solved it was to create, in LDAP, the spark_user (or hdfs_user or …) to work around this limitation.

Also, Giovanni apparently lives in an interesting neighborhood.

Neural Nets With R And Power BI

Leila Etaati continues her series on using neural nets in Power BI:

we are going to predict the concrete strength using neural network. neural network can be used for predict a value or class, or it can be used for predicting multiple items. In this example, we are going to predict a value, that is concrete strength.

I have loaded the data in power bi first, and in “Query Editor” I am going to write some R codes. First we need to do some data transformations. As you can see in the below picture number 2,3 and 4,data is not in a same scale, we need to do some data normalization before applying any machine learning. I am going to write a code for that (Already explained the normalization in post KNN). So to write some R codes, I just click on the R transformation component (number 5).

There’s a lot going on in this demo; check it out.

Loan Chargeoff Templates

Ajay Jagannathan announces a couple new Cortana Intelligence Solutions Gallery templates:

For more information, read this blog: End to End Loan ChargeOff Prediction Built Using Azure HDInsight Spark Clusters and SQL Server 2016 R Service

We have published two solution templates deployable using two technology stacks for the above chargeoff scenario:-

  1. Loan Chargeoff Prediction using SQL Server 2016 R Services – Using DSVM with SQL Server 2016 and Microsoft ML, this solution template walks through how to create and clean up a set of simulated data, use 5 different models to train, select the best performant model, perform scoring using the model and save the prediction results back to SQL Server. A PowerBI report connects to the prediction table and show interactive reports with the user on the chargeoff prediction.

  2. Loan Chargeoff Prediction using HDInsight Spark Clusters – This solution demonstrates how to develop machine learning models for predicting loan chargeoff (including data processing, feature engineering, training and evaluating models), deploy the models as a web service (on the edge node) and consume the web service remotely with Microsoft R Server on Azure HDInsight Spark clusters. The final predictions is saved to a Hive table which could be visualized in Power BI.

These tend to be nice because they show you how the different pieces of the Azure stack tie together.

Wait Stats In Query Store

Andrejs Antjufejevs has great news if you’re using Query Store:

Starting today in Azure SQL Database and from CTP 2.0 of SQL Server 2017 wait stats per query are available in Query Store. Now you can exactly identify why and how much every plan waited for some resource. Information about wait times are persisted so you can also analyze through the history what was the problems and why queries waited for resources

This is a welcome improvement for query tuners on 2017.

Automatic Temporal Table Data Purging

Bert Wagner has warmed the cockles of my heart with this news:

The problem with temporal tables is that they produce a lot of data. Every row-level change stored in the temporal table’s history table quickly adds up, increasing the possibility that a low-disk space warning is going to be sent to the DBA on-call.

In the future with SQL Server 2017 CTP3, Microsoft allows us to add a retention period to our temporal tables, making purging old data in a temporal table as easy as specifying:

I’m in a situation where this will be very useful.

Test The DBATools Beta

Chrissy LeMaire wants you to test the beta of dbatools:

Before the official release of bagofbobbish to master and the PowerShell Gallery, we need help finding bugs. Then, we’ll need some time to resolve those bugs. Hopefully this can be done before community members show off dbatools at a few key SQLSaturdays around the world this Saturday, July 8th.

We would really appreciate it if you would download the beta from GitHub and (in a test environment) see if you can find anything that doesn’t work as expected.

If you find any bugs, please file a report on GitHub. You can also reach out to us in the Slack channel.

Currently, there aren’t any webpages for the commands listed in this post, but all commands have help, so when you need help, simply type Get-Help commandName -Examples or Get-Help commandName -Full.

Get testing.  There are a lot of new commands, so if you haven’t checked out dbatools in a while, give it a go.  Also, congrats to Rob Sewell for his newly minted MVP status.

Option Explicit In Biml VB

Kevin Feasel



Ben Weissman now has nothing to stop him from writing bad VB code in Biml:

Previously, you had to declare any kind of variable and object type (instead of just using something like “var” in C#):

<#@template Language="VB" #>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<# for n as integer = 1 to 25 #>
    <Package Name="MyAutomatedBiml<#= n #>"/>
<# next#>

That was true even for the most simple cases like for n as integer = 1 to 25 instead of just for n = 1 to 25, even though it is clear that his can never be anything else in this context.

Now, you can use two new attributes in the template definition: optionexplicit and/or optionstrict

Read the whole thing if you want to write VB code in Biml.  If you want to write C# code in Biml, keep doing your thing.  If you want to write F# in Biml, the pitchfork mob is organizing over here.


July 2017
« Jun