# Day: November 29, 2016

A semi-additive average? What exactly are you trying to calculate? Let me explain first. A semi-additive measure is a measure that can be summed across some dimensions, but not all. Typically it’s the time dimension that isn’t additive. For example, the stock level at various warehouses. You can add all the stock levels of your warehouses together, to get an idea of how much stock you have for your entire company. However, you can’t add the stock level across time. 250 stock yesterday and 240 stock today doesn’t equal 490 stock for the two days. In reality the sum aggregation is replaced with another aggregation when aggregating over the non-additive dimension. In our stock example, we could use the last value known (240) or the average (245). Which aggregation you want depends on the requirements.
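To make the stock example concrete, here is a minimal Python sketch. The warehouse names and the per-warehouse split are made up for illustration; only the daily totals (250 and 240) come from the text above. It assumes the rows arrive in chronological order.

```python
from collections import defaultdict

# Hypothetical stock snapshots: (day, warehouse, stock level).
# Rows are assumed to be listed in chronological order.
snapshots = [
    ("yesterday", "A", 150), ("yesterday", "B", 100),
    ("today", "A", 140), ("today", "B", 100),
]


def company_stock_per_day(rows):
    """Sum across warehouses (the additive dimension) for each day."""
    totals = defaultdict(int)
    for day, _warehouse, level in rows:
        totals[day] += level
    return dict(totals)  # dicts keep insertion order, so days stay ordered


per_day = company_stock_per_day(snapshots)
daily_totals = list(per_day.values())

# Across time, the sum is replaced by another aggregation.
last_value = daily_totals[-1]                    # last known value
average = sum(daily_totals) / len(daily_totals)  # average over the days
naive_sum = sum(daily_totals)                    # the misleading total

print(per_day, last_value, average, naive_sum)
```

Summing across warehouses is safe, while the time dimension needs either the last value or the average, depending on the requirement.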

In this blog post I’m going to calculate a semi-additive measure, using the average for the non-additive dimension. Quite recently a colleague asked how you could calculate this in DAX. The use case is simple: there are employees who log hours on specific tasks. The number of hours is our measure. The task dimension is additive. The employee dimension, however, is not when we calculate an average. When two employees are selected, the result should not be the average of all the individual hours, but rather the average of the sum of the hours per employee. Let’s illustrate with an example:
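As a rough illustration of the requirement (in Python rather than DAX, and not the linked post's solution), the sketch below sums hours per employee first and then averages those totals. The employee names, tasks, and hours are invented for the example.

```python
from collections import defaultdict

# Hypothetical timesheet rows: (employee, task, hours).
hours = [
    ("Alice", "Design", 4), ("Alice", "Testing", 6),
    ("Bob", "Design", 2), ("Bob", "Testing", 3), ("Bob", "Support", 1),
]


def average_of_employee_totals(rows):
    """Sum hours per employee (tasks are additive), then average the totals."""
    totals = defaultdict(float)
    for employee, _task, worked in rows:
        totals[employee] += worked
    # Assumes at least one row; an empty input would divide by zero.
    return sum(totals.values()) / len(totals)


def naive_average(rows):
    """Average of the individual rows, which is NOT what is wanted here."""
    return sum(worked for _employee, _task, worked in rows) / len(rows)


print(average_of_employee_totals(hours))  # average of 10 and 6
print(naive_average(hours))               # average over all five rows
```

The two functions disagree as soon as employees log a different number of rows, which is why the per-employee sum has to happen before the average.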

That’s really interesting, and a good bit easier to do than the T-SQL equivalent (at least in one step).

I designed my sp_ctrl3 procedure from the ground up to be a development tool that I can install on any dev environment that I use regularly. I wanted a detailed overview of a specific database object as well as copy-paste-friendly T-SQL code. For instance,

• No “length” column on an int column

• Proper scale/precision values on datatypes

• “NULL” or “NOT NULL” instead of “yes” or “no”

• Identity column syntax

• “nvarchar(50)” instead of “nvarchar”, “100”

• Column defaults

• Complete index definitions

• GRANT/DENY permission statements

• The object_id of the object in sys.objects
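Several of these points come down to turning catalog metadata into a copy-paste-friendly declaration. The following Python sketch shows the idea only; it is not how sp_ctrl3 is implemented, and the function names are hypothetical. It relies on the fact that `sys.columns.max_length` is reported in bytes (so an `nvarchar(50)` column shows 100) and that -1 denotes `max`.

```python
def format_type(type_name, max_length, precision, scale):
    """Render a T-SQL type from sys.columns-style metadata.

    max_length is in bytes; -1 means MAX. Only a few common types are
    handled here; everything else is returned without a length.
    """
    name = type_name.lower()
    if name in ("nvarchar", "nchar"):
        # Unicode types use two bytes per character.
        length = "max" if max_length == -1 else str(max_length // 2)
        return f"{name}({length})"
    if name in ("varchar", "char", "varbinary", "binary"):
        length = "max" if max_length == -1 else str(max_length)
        return f"{name}({length})"
    if name in ("decimal", "numeric"):
        return f"{name}({precision}, {scale})"
    return name  # e.g. int: no "length" shown


def format_column(column_name, type_name, max_length, precision, scale,
                  is_nullable):
    """Build one column line with NULL / NOT NULL instead of yes / no."""
    quoted = "[" + column_name.replace("]", "]]") + "]"
    nullability = "NULL" if is_nullable else "NOT NULL"
    data_type = format_type(type_name, max_length, precision, scale)
    return f"{quoted} {data_type} {nullability}"


print(format_column("Name", "nvarchar", 100, 0, 0, False))
```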

If you use sp_help a lot, this looks like a good supplement.

The dm_exec_external_work DMV tells us which execution we care about; in this case, I ended up running the same query twice, but I decided to look at the first run of it.  Then, I can get step information from dm_exec_distributed_request_steps.  This shows that we created a table in tempdb called TEMP_ID_14 and streamed results into it.  The engine also created some statistics (though I’m not quite sure where it got the 24 rows from), and then we perform a round-robin query.  Each Polybase compute node queries its temp table and streams the data back to the head node.  Even though our current setup only has one compute node, the operation is the same as if we had a dozen Polybase compute nodes.

Click through for Wireshark-related fun.

For this section, we previously installed the Python module pyodbc, which is needed to connect via ODBC to any SQL Server on the network, given the proper authentication method.

The following sample code can be found at this link: https://www.microsoft.com/en-us/sql-server/developer-get-started/python-ubuntu
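For orientation, a minimal pyodbc sketch might look like the following. This is not the sample from the link above; the driver name, server, database, and credentials are placeholder assumptions, and the helper function names are made up. Values containing semicolons or braces would need extra escaping that this sketch does not handle.

```python
def build_connection_string(server, database, user=None, password=None,
                            driver="ODBC Driver 13 for SQL Server"):
    """Assemble an ODBC connection string.

    The default driver name is an assumption; use whichever SQL Server
    ODBC driver is actually installed on the machine.
    """
    parts = [f"DRIVER={{{driver}}}", f"SERVER={server}",
             f"DATABASE={database}"]
    if user is None:
        # No SQL login supplied: ask for integrated authentication.
        parts.append("Trusted_Connection=yes")
    else:
        parts.append(f"UID={user}")
        parts.append(f"PWD={password}")
    return ";".join(parts)


def fetch_server_version(connection_string):
    """Connect, run SELECT @@VERSION, and return the version text."""
    import pyodbc  # imported lazily so the string helper works without it

    connection = pyodbc.connect(connection_string)
    try:
        cursor = connection.cursor()
        cursor.execute("SELECT @@VERSION")
        return cursor.fetchone()[0]
    finally:
        connection.close()


# Placeholder values; nothing is contacted until fetch_server_version runs.
conn_str = build_connection_string("localhost", "master", "sa", "example")
print(conn_str)
```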

This is probably more useful in larger shops with multiple operations personnel covering different domains, but it’s nice to know that both languages play nice.

Whenever you are introduced to a new environment, either because you visit a new client or take over a new position from someone else, it’s always crucial to get on top of what’s going on. More often than not, any documentation (if you are lucky enough to even get your hands on it) is out of date or not properly maintained. So going through it may end up making you even more confused or, in the worst case, misinformed.

In a previous engagement of mine came a request from the Data Architecture team. I was asked to produce a list of all servers and cubes running in a specific environment. They provided the list of servers and wanted to know which servers were hit by running solutions. Along with this information the team also needed all sorts of information on the connection strings from the Data Source Views, as well as which credentials were used, if possible.

If you’re dealing with a large number of cubes, this becomes even more useful.

Now there are plenty of reasons to use views in applications; however, views shouldn’t be the default way of building applications because they do have these potential problems.

While working with a client the other week we had to unwind some massively nested views. Several of these views were nested 5 and 6 levels deep with multiple views being referenced by each view. When queries ran, they would take minutes to execute instead of the milliseconds that they should have taken. The problems that needed to be fixed were all index based, but because of the massive number of views that needed to be reviewed, it took almost a day to tune the single query.

Nested views are usually an indicator of somebody trying to perform OOP on a relational database, taking advantage of encapsulation.  One big performance problem with nested views is that at some point, the query optimizer gives up trying to optimize and simply pulls in all of the tables as many times as they appear.  Make the optimizer’s life easier and it will make your life easier.
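One way to spot views nested 5 or 6 levels deep before they hurt is to measure nesting depth from a dependency map. The Python sketch below assumes such a map has already been built (in SQL Server it could plausibly be derived from `sys.sql_expression_dependencies`); the view and table names are hypothetical.

```python
def nesting_depth(name, dependencies, _seen=frozenset()):
    """Return 0 for a base table, else 1 + the deepest referenced object.

    `dependencies` maps a view name to the objects it references; any
    name missing from the map is treated as a base table.
    """
    if name in _seen:
        raise ValueError(f"circular reference involving {name!r}")
    children = dependencies.get(name)
    if not children:
        return 0
    return 1 + max(
        nesting_depth(child, dependencies, _seen | {name})
        for child in children
    )


# Hypothetical dependency map: view -> referenced views or tables.
deps = {
    "v_report": ["v_sales", "v_customers"],
    "v_sales": ["v_orders"],
    "v_orders": ["orders"],
    "v_customers": ["customers"],
}

print(nesting_depth("v_report", deps))
```

Views whose depth exceeds some agreed threshold are candidates for flattening.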

This behavior can also be set to None. For example, let’s say you want to have a total of sales amount regardless of gender selection, and then a total of sales amount for the selected gender in the slicer. To do this, copy the SalesAmount Card Visual, and then click on the Gender Slicer. Click on Edit Interaction, and set one of the card visuals to None and leave the other one at the default, Filter.

Count me among the people who did not know about this.

Now that anyone can use a report which has been uploaded to the service as a starting point for a new report, there may be a decreased use of the template feature. Any report created on the desktop can be saved as a template, by selecting save on a desktop report file and changing the file type to a template. Unfortunately, templates do not contain links to the datasource used. The person creating the report must determine what data to use and whether an existing, regularly refreshed dataset is available, or else create a dataset and its respective refresh features as part of creating the report.  Content packs provide the connections to datasets, but since the reports cannot be saved as a file for versioning, this feature is not often used instead of templates. Downloading the file and then modifying it does resolve the issue, as the starting point is then a working report with a connection to an existing dataset.

Read the whole thing.

In the upcoming version of SQL Server (for the moment known as SQL Server vNext), Microsoft has finally announced support for LOBs within Columnstore Indexes, thus enabling the usage of the NVARCHAR(MAX), VARCHAR(MAX), and VARBINARY(MAX) data types on tables with Columnstore Indexes that include those columns.

For the tests, I have decided to spin up a virtual machine in Azure with an installation of the currently available CTP1 of SQL Server vNext, which has version 14.0.1.126.

Read the whole thing.  It’s hard to tell at this point if these are bugs, incomplete functionality, or what, so it’ll be interesting to track changes over the CTPs.