Kevin Feasel – Page 1592

Stats Histogram DMV

Published 2016-12-20 by Kevin Feasel

Erik Darling looks at a new DMV in vNext CTP 1.1:

It’s not exactly perfect

For instance, if you just let it loose without filters, you get a severe error. The same thing happens if you try to filter on one of the columns in the function, rather than a column in sys.stats, like this.

Very cool. It’s one step closer to us removing our dependencies on DBCC SHOW_STATISTICS.

Comments closed

Deploying VMs To Azure Using Powershell

Published 2016-12-20 by Kevin Feasel

Rob Sewell shows how to use Powershell to create your own Azure VM instance of the Microsoft data science virtual machine:

First, an annoyance. To be able to deploy Data Science virtual machines in Azure programmatically you first have to login to the portal and click some buttons.

In the Portal click new and then marketplace and then search for data science. Choose the Windows Data Science Machine and under the blue Create button you will see a link which says “Want to deploy programmatically? Get started” Clicking this will lead to the following blade.

Click through for a screenshot-laden explanation which leaves you with a working VM in Azure.

Comments closed

Identity Column Rollback

Published 2016-12-20 by Kevin Feasel

David Alcock figures out how identity columns behave when transactions get rolled back:

Identity columns are a very commonly used feature within tables in SQL Server. Basically when specified as an identity a column will automatically increment by the specified value; so if we have an identity increment of 1 and insert 5 rows they will automatically be numbered 1 to 5.

One cautionary measure with identities is that they don’t reset themselves when rows are deleted. If we delete rows 4 and 5 the next row will still be populated as identity 6. That’s fine, but what happens if we rollback an insert.

Read on for the answer.

Comments closed

Partition Handling In Spark 2.1

Published 2016-12-19 by Kevin Feasel

Eric Liang, et al, discuss a change to Spark 2.1 which will make certain partitioned table access faster:

In Spark 2.1, we drastically improve the initial latency of queries that touch a small fraction of table partitions. In some cases, queries that took tens of minutes on a fresh Spark cluster now execute in seconds. Our improvements cut down on table memory overheads, and make the SQL experience starting cold comparable to that on a “hot” cluster with table metadata fully cached in memory.

This looks like a nice improvement in Spark.

Comments closed

Power BI On-Prem In 2017

Published 2016-12-19 by Kevin Feasel

Paul Turley points out a blog post from the Reporting Services team:

When will we have this next Technical Preview?

We’re targeting January 2017 to release this next Technical Preview.

What’s the release vehicle for a production-ready version?

We plan to release the production-ready version in the next SQL Server release wave. We won’t be releasing it in a Service Pack, Cumulative Update, or other form of update for SSRS 2016.

When will we have a production-ready version?

We’re targeting availability in mid-2017.

That makes it sound like they’re pushing it to coincide with the vNext release.

Comments closed

Multidplyr

Published 2016-12-19 by Kevin Feasel

Matt Dancho shows how to use multidplyr to perform parallel processing on data cleansing activities:

There’s nothing more frustrating than waiting for long-running R scripts to iteratively run. I’ve recently come across a new-ish package for parallel processing that plays nicely with the tidyverse: multidplyr. The package has saved me countless hours when applied to long-running, iterative scripts. In this post, I’ll discuss the workflow to parallelize your code, and I’ll go through a real world example of collecting stock prices where it improves speed by over 5X for a process that normally takes 2 minutes or so. Once you grasp the workflow, the parallelization can be applied to almost any iterative scripts regardless of application.

This is a longer article, but if you’re using dplyr with R today, it’s worth a read.

Comments closed

Power Query And M In Tabular

Published 2016-12-19 by Kevin Feasel

Chris Webb notes that Analysis Services Tabular will get Power Query and M support:

I’ve just argued why Microsoft was obliged to include this functionality in SSAS v.next but in fact there are many positive reasons for doing this too. The most obvious one is to do with support for more data sources. At the moment SSAS Tabular supports a pretty good range of data sources, but the world of BI is getting more and more diverse and in order to stay relevant SSAS needs to support far more than it does today. By using Power Query/M as its data access mechanism, SSAS v.next will immediately support a much larger number of data sources and this number is going to keep on growing: any investment that Microsoft or third parties make for Power BI in this area will also benefit SSAS. Also, because Power Query/M can query and fold to more than just relational databases, I suspect that in the future this will allow for DirectQuery connections to many of these non-relational data sources too.

Read the whole thing.

Comments closed

Getting To ZDT

Published 2016-12-19 by Kevin Feasel

Kendra Little gives thoughts on getting to zero-downtime releases:

I’ve worked in a bunch of environments where we rolled schema changes to production throughout the week, and worked hard to limit downtime. Our biggest challenges were:

The code released wouldn’t have the intended effect, and multiple hotfixes would sometimes need to be written quickly if the issue was urgent.
Problems would happen with replication in production – either replication would be delayed because a lot of modifications were occurring, or a change in an “upstream” system would cause a huge amount of data processing to happen in a “downstream” database

I’m in a ZDT environment. There’s a pretty good amount of overhead work that goes into getting there; there are times in which a 20 minute window would have saved hours of development time, so it’s important to keep that trade-off in mind.

Comments closed

PowerQuery Contains Filtering

Published 2016-12-19 by Kevin Feasel

Reza Rad explains two different methods of filtering which may look similar but can result in quite different outputs:

The script tells the whole story. Despite the fact that you typed in “Dan” and Power Query showed you all FirstNames that has “Dan” in it. the script still use equity filters for every individual FirstName. For this data set there won’t be any issue obviously, because all FirstNames with “Dan” is already selected. However if new data rows coming in to this table in the future, and they have records with FirstNames that are not one of these values, for example Dandy, it won’t be picked! As a result the filter won’t work exactly as you expect. That’s why I say this is misleading.

Read on for notes on how to put the Contains operator to good use.

Comments closed

Missing MDS Temp Directory

Published 2016-12-19 by Kevin Feasel

Koen Verbeeck ran into an error when reinstalling Master Data Services on his Windows 10 box:

The error this time: “The ‘tempDirectory’ attribute must be set to a valid absolute path”. If you can’t see the error, it’s possible you have to enable them in the web.config file of MDS. Typically you can find this configuration file in the folder “C:\Program Files\Microsoft SQL Server\130\Master Data Services\WebApplication”. The customErrors attribute should be changed to the following:

1	<customErrors mode=“RemoteOnly” />

Read on for the solution.

Comments closed

M	T	W	T	F	S	S
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30	31

Author: Kevin Feasel

It’s not exactly perfect