Author: Kevin Feasel

Run select SysDateTimeOffset(); and you should see a date and time which mirrors your server’s current time, plus a time zone offset showing its current offset from UTC; this includes any time zone offset, plus any daylight savings time offset.

If I were to run this (from the UK) on August 15th, 2017 while my clock is showing that it’s noon exactly, I would get 2017-08-15 12:00:00.0000000 +01:00; the +01:00 offset is because the UK is offset by one hour from UTC during daylight savings. The datetime2 portion of a datetimeoffset is in local time, not UTC.

My normal operation is to store everything in UTC and let the application convert to local times. That allows you to compare dates much more easily and reduces confusion around daylight savings time.
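As a rough sketch of that habit outside of SQL Server, here is the same idea in Python rather than T-SQL. The reading below is a hypothetical value matching the noon-in-the-UK example; nothing here queries a real server.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical reading: noon local time in the UK during daylight saving (+01:00),
# the same shape of value the example above describes.
uk_summer = timezone(timedelta(hours=1))
local_reading = datetime(2017, 8, 15, 12, 0, 0, tzinfo=uk_summer)

# Normalize to UTC before storing.
stored_utc = local_reading.astimezone(timezone.utc)

# A second hypothetical reading taken at the same instant from a -04:00 zone.
us_east_summer = timezone(timedelta(hours=-4))
other_reading = datetime(2017, 8, 15, 7, 0, 0, tzinfo=us_east_summer)

print(local_reading.isoformat())       # 2017-08-15T12:00:00+01:00
print(stored_utc.isoformat())          # 2017-08-15T11:00:00+00:00
print(local_reading == other_reading)  # True: both are 11:00 UTC
```

Once everything is in UTC, comparisons no longer depend on which offset each value happened to be captured with.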


Forecasting Versus Predicting

Published 2017-08-09 by Kevin Feasel

Rob Collie explains that there are two different concepts which use similar names:

Once you’ve digested the illustration at the top of this article, yeah, you’ve kind of already got it.

Forecasting is when we anticipate the behavior of “Lots” of people (customers, typically) on “Long” timelines.

Predictive Analytics anticipate the behavior of One person (again, typically a customer) on a “Short” timeline.

So… Macro versus Micro.

But let’s delve just a little bit deeper, in order to “cement” the concepts.

There’s a very useful distinction here and Rob does well to flesh out the details. I highly recommend this if you’re curious about micro- versus macro-level predictions.


SSMS Performance Dashboard

Published 2017-08-09 by Kevin Feasel

Pedro Lopes announces that the SQL Server Performance Dashboard is now built into SQL Server Management Studio:

Back in 2007, we released the Microsoft SQL Server 2005 Performance Dashboard Reports, which were designed to provide fast insight into performance issues from some newly created system views – DMVs (Dynamic Management Views). These were updated for SQL Server 2008 and later to SQL Server 2012, and while being very helpful they had a significant drawback – required separate download and install. This meant that when needed, most probably they were not installed in a specific SQL Server, and therefore were unusable when they were needed the most.

With the new SSMS 17.2, we are releasing the Performance Dashboard embedded as a built-in Standard Report. This means that it is available for any SQL Server instance starting with SQL Server 2008, without any extra downloads or running any extra scripts. Just connect to your server in SSMS and open the Performance Dashboard.

Aside from making it built into Management Studio, they’ve also added a few helpful things to the product, so it is worth checking out.


Backing Up That Linux-Based Database

Published 2017-08-09 by Kevin Feasel

David Klee shows how to back up a SQL Server on Linux database over the network:

As of SQL Server 2017 RC2, we’ll want to accomplish it in a way that is transparent to SQL Server. (Depending on the RTM version whenever it is released, I might change the recommendation on this.) To do this, we’ll want to create a folder on the local file system that actually maps to a remote network share for SQL Server backups.

SSH into your server without elevated privileges at this point.

The network share is presented from a Windows server with the SMB protocol. Linux can connect to this using a compatible protocol called CIFS, or Common Internet File System. We’ll need to install the packages so we can natively connect. On Ubuntu and other Linux distros, the easiest is with the cifs-utils package. To install from the package manager is as simple as this.

Sadly, that credentials file cannot be encrypted.


Deep Learning Isn’t The End-All Be-All Solution

Published 2017-08-08 by Kevin Feasel

Pablo Cordero explains that deep learning solutions are not the best choice in all cases:

The second preconception I hear the most is the hype. Many yet-to-be practitioners expect deep nets to give them a mythical performance boost just because it worked in other fields. Others are inspired by impressive work in modeling and manipulating images, music, and language – three data types close to any human heart – and rush headfirst into the field by trying to train the latest GAN architecture. The hype is real in many ways. Deep learning has become an undeniable force in machine learning and an important tool in the arsenal of any data modeler. Its popularity has brought forth essential frameworks such as tensorflow and pytorch that are incredibly useful even outside deep learning. Its underdog to superstar origin story has inspired researchers to revisit other previously obscure methods like evolutionary strategies and reinforcement learning. But it’s not a panacea by any means. Aside from lunch considerations, deep learning models can be very nuanced and require careful and sometimes very expensive hyperparameter searches, tuning, and testing (much more on this later in the post). Besides, there are many cases where using deep learning just doesn’t make sense from a practical perspective and simpler models work much better.

It’s a very interesting article, pointing out that deep learning solutions work better than expected on smaller data sizes, but there are areas where it’s preferable to choose something else.


Using Hive LLAP On ElasticMapReduce

Published 2017-08-08 by Kevin Feasel

Jigar Mistry shows how to configure and use Hive LLAP on AWS’s ElasticMapReduce:

With many options available in the market (Presto, Spark SQL, etc.) for doing interactive SQL over data that is stored in Amazon S3 and HDFS, there are several reasons why using Hive and LLAP might be a good choice:

For those who are heavily invested in the Hive ecosystem and have external BI tools that connect to Hive over JDBC/ODBC connections, LLAP plugs in to their existing architecture without a steep learning curve.
It’s compatible with existing Hive SQL and other Hive tools, like HiveServer2, and JDBC drivers for Hive.
It has native support for security features with authentication and authorization (SQL standards-based authorization) using HiveServer2.
LLAP daemons are aware of the columns and records that are being processed, which enables you to enforce fine-grained access control.
It can use Hive’s vectorization capabilities to speed up queries, and Hive has better support for Parquet file format when vectorization is enabled.
It can take advantage of a number of Hive optimizations like merging multiple small files for query results, automatically determining the number of reducers for joins and groupbys, etc.
It’s optional and modular so it can be turned on or off depending on the compute and resource requirements of the cluster. This lets you run other YARN applications concurrently without reserving a cluster specifically for LLAP.

Read on for more details, including the bootstrap action you need to take and how to use LLAP once you have it configured.


Convert SSAS Tabular Processing Scripts Into Tables

Published 2017-08-08 by Kevin Feasel

Chris Koester shows how to take an Analysis Services Tabular processing script in TMSL format and turn it into a table using OPENJSON:

The previous post looked at how to process SSAS Tabular models with TMSL. Since SQL Server adds new JSON capabilities in 2016, let’s look at how to convert TMSL JSON to a Table with OPENJSON. OPENJSON is a new function in SQL Server 2016 that, per Microsoft:

OPENJSON is a table-valued function that parses JSON text and returns objects and properties from the JSON input as rows and columns.

In short, OPENJSON converts JSON text to a table. Since TMSL is JSON, this function can convert a SSAS Tabular processing script into a table. This could be useful if you wanted to document a SSAS processing schedule.

That’s an interesting use of OPENJSON.
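To make the "JSON in, rows out" idea concrete, here is a loose Python analogue. It is not OPENJSON and not Chris's script; the TMSL refresh command below is a made-up example, and the database and table names are hypothetical.

```python
import json

# Hypothetical TMSL refresh command; names are illustrative only.
tmsl_script = """
{
  "refresh": {
    "type": "full",
    "objects": [
      {"database": "SalesModel", "table": "Sales"},
      {"database": "SalesModel", "table": "Customer"}
    ]
  }
}
"""

def tmsl_refresh_to_rows(script_text):
    """Flatten the objects of a TMSL refresh command into one dict per row."""
    refresh = json.loads(script_text)["refresh"]
    return [
        {
            "refresh_type": refresh["type"],
            "database": obj.get("database"),
            "table": obj.get("table"),
            # Partition is optional in a refresh object; None when absent.
            "partition": obj.get("partition"),
        }
        for obj in refresh["objects"]
    ]

rows = tmsl_refresh_to_rows(tmsl_script)
for row in rows:
    print(row)
```

Each object in the script becomes one row, which is the shape you would want if the goal is documenting what a processing schedule touches.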


Drawing Cubes With SQL Server Spatial

Published 2017-08-08 by Kevin Feasel

Slava Murygin has entered his cubism phase:

Hey, there is a time to go level up and instead of drawing Spirals, Fractals and other cool stuff I decided to go 3D!

So, my first try will be drawing 3D cubes.
As you know, SQL is not an object-oriented programming language, and I can’t just simply create an Object “Cube” with certain properties. To create a Cube I need a Stored Procedure:

Click through for a touch of Picasso in your database.


Filtered Statistics

Published 2017-08-08 by Kevin Feasel

William Wolf shows us the value of filtered statistics:

Wolf only had 700 complaints, but 166,900 records were estimated for return. He is looking much worse than reality shows.

So, what is happening is that there are 3 possible employee results for complaints. It is rather simple. CE is taking the total amount of records (500,701) and dividing by 3 assuming that all 3 will have roughly the same amount of records. We see that along with the estimated number of records being the same, the execution plan operators are the same. For such a variation in amount of records, there must be a better way.

I rarely create filtered statistics, in part because I don’t have a good idea of exactly which values people will use when searching. But one slight change to Wolf’s scenario might help: having a filter where name = Sunshine and a filter where name <> Sunshine (or name is null). That might help a case where there’s extreme skew with one value and the rest are much closer to uniformly distributed.
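The arithmetic in the quoted explanation is easy to check. This is only a back-of-the-envelope sketch in Python using the numbers from the excerpt; it follows the uniform-distribution reasoning as quoted and is not a model of how the cardinality estimator works internally.

```python
# Numbers taken from the quoted excerpt.
total_rows = 500_701
distinct_employees = 3
actual_rows_for_wolf = 700

# Uniform assumption: every employee gets an equal share of the rows.
uniform_estimate = total_rows / distinct_employees

# How far off that estimate is for the skewed value.
overestimate_factor = uniform_estimate / actual_rows_for_wolf

print(round(uniform_estimate))     # 166900
print(round(overestimate_factor))  # 238
```

An estimate that is a couple of hundred times too high is the kind of skew a filtered statistic on the outlier value is meant to address.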


Temporal Memory-Optimized Tables

Published 2017-08-08 by Kevin Feasel

Ned Otter describes how hybrid disk + memory-optimized temporal tables differ from on-disk temporal tables:

As changes are made to rows in the temporal memory-optimized table, before being transferred to the history table on disk, they are first migrated to an internal memory-optimized staging table. That means when you query the “history table”, you could be retrieving rows from both the on-disk history table, and internal staging table. Because no custom indexing was possible on the internal staging table, there could be performance implications when executing queries against historical data. Microsoft addressed these potential performance issues in SQL 2016 SP1 (detailed in this CAT blog post).

The internal staging table only gets flushed to the on-disk history table when it reaches 8% of the size of the temporal table. Given the current capacities of Windows Server 2016 (24TB memory), it’s now possible to have very large memory-optimized tables. 8% of one of those large memory-optimized tables could be quite large, which will affect query performance, if predicates don’t match available indexes.

Read on for some sobering thoughts on the topic.
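To get a feel for the 8% figure, here is a trivial Python sketch. The table sizes are hypothetical, and it assumes the threshold is a straight 8% of the temporal table's size, as the excerpt describes.

```python
# Flush threshold from the excerpt: staging data is moved to the on-disk
# history table once it reaches 8% of the temporal table's size.
FLUSH_THRESHOLD = 0.08

def staging_size_at_flush_gb(temporal_table_gb):
    """Approximate staging-table size (GB) at the point a flush is triggered."""
    return temporal_table_gb * FLUSH_THRESHOLD

# Hypothetical memory-optimized table sizes, in GB.
for size_gb in (10, 500, 2000):
    print(f"{size_gb} GB table -> about {staging_size_at_flush_gb(size_gb):.1f} GB staged before a flush")
```

On a multi-terabyte table, that can mean well over a hundred gigabytes of history rows sitting in the staging table.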

