Press "Enter" to skip to content

Month: December 2016

Thinking About Memory Latency

Joe Chang throws down the gauntlet:

Naturally, the database and application should be architected together with the SQL Server NUMA tuning options to support good scaling on multi-socket NUMA systems. If we neglected this in the original design, I am sure many DBA-developers know how well such a suggestion would be received by management.

Is there another option? Well yes. Get rid of the NUMA latency issue with a non-NUMA system. Such a system has a single processor socket, hence one memory node. Before anyone scoffs, the one socket is not just a Xeon E3 with four cores.

Still, a single quad-core processor today is 40 times more powerful than a 4-way system from twenty years ago (100,000 tpm-C per core is probably possible if TPC-C were still in use, versus 10,000 on a 4-way in 1996). The Xeon E3 could probably support many medium sized organizations. Maximum memory capacity is 64GB (4x16GB unbuffered ECC DIMMs, $130 each). My recollection is that many IO problems went away at the 32-64GB level. And we could still have powerful IO with 2 PCI-E x8 SSDs, or even 2 x4’s.

This is some very interesting research.  Joe has a some gaps in his research (meaning that there’s scope for people to expand upon this), but this is 100% worth the read.

Comments closed

Missing MDS Temp Directory

Koen Verbeeck ran into an error when reinstalling Master Data Services on his Windows 10 box:

The error this time: “The ‘tempDirectory’ attribute must be set to a valid absolute path”. If you can’t see the error, it’s possible you have to enable them in the web.config file of MDS. Typically you can find this configuration file in the folder “C:\Program Files\Microsoft SQL Server\130\Master Data Services\WebApplication”. The customErrors attribute should be changed to the following:

Read on for the solution.

Comments closed

Parameterizing Always Encrypted Statements

Jakub Szymaszek shows off Parameterizing for Always Encrypted in SSMS 17.0:

First thing to note is that SSMS has rewritten the query as a parameterized statement. The literal, used to initialize the @SSN variable in the original query, is being passed inside a parameter, with an auto-generated name (@pdf9f37d6e63c46879555e4ba44741aa6). This allows the .NET Framework Data Provider for SQL Server to automatically detect that the parameter needs to be encrypted. The driver achieves that by calling sp_describe_parameter_encryption that prompts SQL Server to analyze the query statement and determine which parameters should be encrypted and how. Then, the driver, transparently encrypts the parameter value, before submitting the query to SQL Server for execution via sp_executesql. SQL Server can now successfully execute the query.

Read the whole thing.  Setting this up does obviate part of a benefit to using Always Encrypted:  the ability completely to lock out a database administrator from certain pieces of data.

Comments closed

SQL Server vNext CTP 1.1

Denis Gobo notes that there’s a new CTP for SQL Server:

TRANSLATE
This acts like a bunch of replace statements, instead of REPLACE(REPLACE(REPLACE(REPLACE(SomeVal,'[‘,'(‘),’]’,,’)’),'{‘,'(‘),’}’,,’)’) you can do the following which is much cleaner
SELECT TRANSLATE('2*[3+4]/{7-2}', '[]{}', '()()');

Running that will return 2*(3+4)/(7-2)

The translate function looks very interesting.  Click through for a few more goodies and get ready for the never-ending release cycle.

Comments closed

Hadoop And Active Directory

RK Kuppala explains how to integrate a Hadoop cluster with Active Directory:

This post explains kerberizing an existing Hadoop cluster using Ambari. Kerberos helps with the Authentication part of enterprise security (while authorization, auditing and data protection being the remaining parts).

HDP uses Kerberos, which is an industry standard for authenticate users and resources and providing strong identity for users. Apache Ambari can kerberize an existing cluster by using an existing MIT key distribution center (KDC) or Microsoft’s Active Directory.

This was a lot easier than I expected.

Comments closed

Understanding Naive Bayes

Ahmet Taspinar explains the Naive Bayes classificiation algorithm and writes Python code to implement it:

Within Machine Learning many tasks are – or can be reformulated as – classification tasks.

In classification tasks we are trying to produce a model which can give the correlation between the input data $X$ and the class $C$ each input belongs to. This model is formed with the feature-values of the input-data. For example, the dataset contains datapoints belonging to the classes ApplesPears and Oranges and based on the features of the datapoints (weight, color, size etc) we are trying to predict the class.

Ahmet has his entire post saved as a Jupyter notebook.

Comments closed

Taxi Rides And Amazon Athena

Mark Litwintschik looks at using Amazon Athena to process the New York City taxi rides data set:

It’s important to note that Athena is not a general purpose database. Under the hood is Presto, a query execution engine that runs on top of the Hadoop stack. Athena’s purpose is to ask questions rather than insert records quickly or update random records with low latency.

That being said, Presto’s performance, given it can work on some of the world’s largest datasets, is impressive. Presto is used daily by analysts at Facebook on their multi-petabyte data warehouse so the fact that such a powerful tool is available via a simple web interface with no servers to manage is pretty amazing to say the least.

Athena is Amazon’s response to Azure Data Lake Analytics.  Check out Mark’s blog post for a good way of getting started with Athena.

Comments closed

Locking Azure Resources

Arun Sirpal explains how to lock resources in Azure:

There are 2 types of lock resources in Azure.

  • Delete – Obviously you can’t delete but you can read / modify a resource, this applies to authorised users.
  • ReadOnly – Authorised users can read a resource but they cannot edit or delete it.

For this blog post I create a delete lock on one of my SQL Databases.

My overly simplistic advice:  lock any production resource which you wouldn’t want accidentally deleted.  It won’t prevent a malicious user from doing something catastrophic, but it can prevent the “Oops, I meant to click the thing above this” class of mistake.

Comments closed

R Links

Ginger Grant has some links on learning R in the context of Power BI:

Comprehensive Resource Archive Network [CRAN] is where one can download Open Source R, packages and contains lots of information about R.

Microsoft R Open which is a fully CRAN compatible version created using the Intel MKL for improved performance can be downloaded here.

One thing I would push a little bit on that list is R Tools for Visual Studio.  My default R IDE is still R Studio, but RTVS has made some nice improvements, and it’s worth checking out.

Comments closed

Understanding HTAP

James Serra explains what Hybrid Transactional and Analytical Processing means:

HTAP is used to describe the capability of a single database that can perform both online transaction processing (OLTP) and online analytical processing (OLAP) for the purpose of real-time operational intelligence processing.  The term was created by Gartner in 2014.

In the SQL Server world you can think of it as: In-memory analytics (columnstore) + in-memory OLTP = real-time operational analytics.  Microsoft supports this in SQL Server 2016 (see SQL Server 2016 real-time operational analytics).

I’m not completely sold on HTAP yet, particularly once you get to high-scale OLTP systems doing hundreds of thousands of transactions per second.  That said, there’s always more and more pressure to get data available for analytics faster and faster.

Comments closed