2018-05-11 – Curated SQL

To use Tensorflow we need to transform our data (features) in a special format. As a reminder, we have just the continuous features. So the first function used is: tf.contrib.layers.real_valued_column. The others cells allowed to us to create a train set and test set with our training dataset. The sampling is not the most relevant but it is not the goal of this article. So be careful! The sample 67-33 is not the rule!

It’s probably an indicator that I’m a casual, but I prefer to use Keras as an abstraction layer rather than working directly with TensorFlow.

Comments closed

Scientific Debt

Published 2018-05-11 by Kevin Feasel

David Robinson gives us the data science analog to technical debt:

In my new job as Chief Data Scientist at DataCamp, I’ve been thinking about the role of data science within a business, and discussing this with other professionals in the field. On a panel earlier this year, I realized that data scientists have a rough equivalent to this concept: “scientific debt.”

Scientific debt is when a team takes shortcuts in data analysis, experimental practices, and monitoring that could have long-term negative consequences. When you hear a statement like:

“We don’t have enough time to run a randomized test, let’s launch it”

“To a first approximation this effect is probably linear”

“This could be a confounding factor, but we’ll look into that later”

“It’s directionally accurate at least”

you’re hearing a little scientific debt being “borrowed”.

Read the whole thing. I strongly agree with the premise.

Comments closed

Is Query Folding Taking Place?

Published 2018-05-11 by Kevin Feasel

Chris Webb has a quick way of seeing if query folding is taking place when importing data from Analysis Services into Power Query (either Power BI or Excel):

As a quick follow-on from last week’s post on how to detect whether query folding is taking place when importing from OData data sources, if you’re importing data from Analysis Services you have a similar problem: how do you know whether query folding is taking place? Ensuring that query folding takes place for as many of the steps in your query – especially those that filter or otherwise reduce the amount of data returned – is very important for data refresh performance.

Although the Power Query engine generates MDX queries when importing from Analysis Services in the same way it generates SQL queries when it imports from a relational database, the View Native Query option doesn’t work for Analysis Services data sources. You can of course use a Profiler trace or xEvents to see the MDX, but for most users that will not be an option for security reasons.

Read on for a better alternative.

Comments closed

Resetting Slices In Power BI With Bookmarks

Published 2018-05-11 by Kevin Feasel

Reza Rad shows how to use a bookmark to set the default state of a Power BI dashboard and return to it with a single button click:

Using bookmarks for clearing all slicers in Power BI is not a a new function, I have been using it for many months, and advising many people to do it that way. However, I still get a lot of questions in my presentations about how to do that. That is why I ended up writing this post. This post shows you a very quick trick of having a button to clear all slicers, and the magic is all happening with bookmarks. Bookmarks store the state of a Power BI page, and can be used in many scenarios, in this post, I only show you the ability to clear all slicers in a page. To learn more about Power BI; read Power BI book from Rookie to Rock Star.

Click through for an example of how to set this up.

Comments closed

Constraints On Temp Tables

Published 2018-05-11 by Kevin Feasel

Kenneth Fisher argues that you should use default naming for temp table constraints:

You should be able to create a #temp in every session. That’s the idea, right? It’s one of the things that differentiates a global temp table from a local temp table. But there can be some difficulties with that.

If you are working with reusable code that uses temp tables (a stored procedure for example), sometimes you need to create a constraint. The thing about constraints is that their names are just as unique as tables, stored procedures etc. i.e. the name of a constraint can only be used once. You can’t have two tables with the same constraint name. In fact, you can’t even have a constraint name that matches a table, stored procedure etc name.

There’s some solid advice in this post.

Comments closed

Docker Containers For Multiple Versions Of SQL Server

Published 2018-05-11 by Kevin Feasel

Rob Sewell shows us how to create Docker containers for multiple versions of SQL Server running on the same machine:

I want to show the two modules running against a number of SQL Versions so I have installed

2 Domain Controllers

2 SQL 2017 instances on Windows 2016 with an Availability Group and WideWorldImporters database

1 Windows 2016 jump box with all the programmes I need

1 Windows 2016 with containers

using a VSTS build and this set of ARM templates and scripts

I wanted to create containers running SQL2017, SQL2016, SQL2014 and SQL2012 and restore versions of the AdventureWorks database onto each one.

Rob shows how to do this all via Powershell so you can automate the process.

Comments closed

When Stream Aggregates Require Sorting

Published 2018-05-11 by Kevin Feasel

Itzik Ben-Gan continues his series on grouping and aggregation:

Let’s try and figure out the costing formula for the Sort operator. Remember, our focus is the estimated cost and scaling because our ultimate goal is to figure out optimization thresholds where the optimizer changes its choices from one strategy to another.

The I/O cost estimate seems to be fixed: 0.0112613. I get the same I/O cost irrespective of factors like number of rows, number of sort columns, data type, and so on. This is probably to account for some anticipated I/O work.

As for the CPU cost, alas, Microsoft doesn’t publicly expose the exact algorithms that they use for sorting. However, among the common algorithms used for sorting by database engines in general are different implementations of merge sort and quicksort. Thanks to efforts made by Paul White, who’s fond of looking at Windows debugger stack traces (not all of us have the stomach for this), we have a bit more insight into the topic, published in his series “Internals of the Seven SQL Server Sorts.” According to Paul’s findings, the general sort class (used in the above plan) uses merge sort (first internal, then transitioning to external). In average, this algorithm requires n log n comparisons to sort n items. With this in mind, it’s probably a safe bet as a starting point to assume that the CPU part of the operator’s cost is based on a formula such as:

Operator CPU cost = <startup cost> + @numrows * LOG(@numrows) * <comparison cost>

Of course, this could be an oversimplification of the actual costing formula that Microsoft uses, but absent any documentation on the matter, this is an initial best guess.

It’s interesting to see how the calculation changes in form with larger numbers of rows.

Comments closed

Running SSMS As A Different User

Published 2018-05-11 by Kevin Feasel

Greg Low shows how to run SQL Server Management Studio as a different Windows user:

Now if all you want to do is to use a SQL Server login, then that’s easy. When you connect to a server in Object Explorer, or when you start a new Database Engine query, you can just choose SQL authentication instead.

But three other scenarios commonly occur.

If you need to run SSMS as an administrator on a machine with UAC, you can do this:

You right-click the link to SSMS and choose Run as administrator.

I will most commonly use option number three, when somebody gives me AD credentials for a separate server but my laptop is not a member of that domain.

Comments closed

M	T	W	T	F	S	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30	31

Day: May 11, 2018

Building A Neural Network With TensorFlow

Scientific Debt

Is Query Folding Taking Place?

Resetting Slices In Power BI With Bookmarks

Constraints On Temp Tables

Docker Containers For Multiple Versions Of SQL Server

When Stream Aggregates Require Sorting

Running SSMS As A Different User