Press "Enter" to skip to content

Month: November 2016

R Visuals In Power BI

Ginger Grant discusses how to display R visuals in Power BI:

I hope that some day this list becomes much longer, but it is a good start. If your company has lots of R visuals and you wish to migrate them to Power BI, chances are some of the libraries you are using are not here. If you are interested in having your library added to the list of 352, go to the Ideas page of Power BI and request that your library be added, as I know Microsoft looks at this page to determine what to release in the future. Someone has requested that igraph be added, and since it hasn’t received a lot of votes yet (hint), it is probably low on the priority list.

Even so, this list does cover a lot of the most commonly used packages.


Memory-Optimized Table Types

Randolph West has a three-part series in which he looks at using memory-optimized table types and table-valued parameters to increase application performance.  Part 1 introduces the concept:

In other words, for these short-lived temp tables, there’s not only the benefit of no longer needing to refer to tempdb..#table, but you also get a massive performance improvement.
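
To make that concrete, here’s a minimal sketch of a memory-optimized table type (my names, not Randolph’s; the database needs a MEMORY_OPTIMIZED_DATA filegroup before this will work):

```sql
-- Memory-optimized table types require at least one index;
-- a hash index works well for point lookups.
CREATE TYPE dbo.IdList AS TABLE
(
    Id INT NOT NULL,
    INDEX ix_Id NONCLUSTERED HASH (Id) WITH (BUCKET_COUNT = 1024)
) WITH (MEMORY_OPTIMIZED = ON);
GO

-- Use it like a short-lived temp table, with no tempdb involvement:
DECLARE @ids dbo.IdList;
INSERT INTO @ids (Id) VALUES (1), (2), (3);
SELECT Id FROM @ids;
```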

Part 2 specifies the scale of performance improvements:

The test is very simple and makes use of a technique that one of my customers uses extensively: writing some rows to a temp table so that another process can reuse those values as parameters and perform an action, after which it destroys the temp table.

Unfortunately, the method my customer uses to populate a temp table does not work on Azure SQL Database, because they explicitly refer to the temp tables with three-part naming (tempdb..#temptable), which is not permitted.
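
For illustration, the offending pattern looks something like this (a hypothetical table, not the customer’s actual code):

```sql
-- Works on-premises, but Azure SQL Database rejects three-part
-- references into tempdb:
SELECT Id FROM tempdb..#temptable;

-- The plain reference works in both environments:
SELECT Id FROM #temptable;
```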

For the sake of the exercise, I will have a process that writes to a data structure, and I will compare the times.

Part 3 repeats the test in Azure SQL Database:

I’m going to use the same WHILE loop again, but instead of a million runs, I’ll do 1,000, 10,000, and 100,000, because I’m paying for this instance of Azure SQL Database (I picked a Premium P1, with 125 DTUs) and I’m a cheapskate. I doubt the 125 DTUs are even enough to run a million times for the fourth option.
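
A rough approximation of that kind of timing harness, reusing the table type sketched above (my code, not Randolph’s actual script):

```sql
DECLARE @ids   dbo.IdList;  -- memory-optimized table type from earlier
DECLARE @i     INT = 1,
        @runs  INT = 10000;
DECLARE @start DATETIME2(7) = SYSUTCDATETIME();

WHILE @i <= @runs
BEGIN
    INSERT INTO @ids (Id) VALUES (@i);
    -- ...another process would consume @ids here...
    DELETE FROM @ids;  -- mimics dropping the temp table
    SET @i += 1;
END;

SELECT DATEDIFF(MILLISECOND, @start, SYSUTCDATETIME()) AS ElapsedMs;
```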

Even in SQL Server 2014, this was a good use of In-Memory OLTP.  With the improvements in 2016, this becomes a viable option for a lot more workloads.


Stream Computing Platform

Ravi Peri shows how to set up the Stream Computing Platform for .NET (SCP.Net) library and kick off a job:

SCP.Net generates a zip file consisting of the topology DLLs and dependency jars.

It uses Java (if found in the PATH) or .NET to generate the zip. Unfortunately, zip files generated with .NET are not compatible with Linux clusters.

If you’re interested in working with a Storm topology while writing .NET code, check this out.


Loading Columnstore Data

I have a post on an issue I had with loading columnstore data:

In this design, I have data coming from the transactional system, undergoing some ETL processing, and going into a staging table on the warehouse.  From there, I perform the remainder of the ETL work and insert into a rowstore table.  This rowstore table has the same attribute names and data types as the columnstore table, but instead of having a clustered columnstore index, it has a standard B-tree index and can have additional non-clustered indexes.  From there, I expose the combination table using a view which simply unions the two sets of data so the application doesn’t have to see rowstore versus columnstore tables.
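
A skeletal version of that design (table and column names are mine for illustration):

```sql
-- Columnstore table holds the bulk of the historical data.
CREATE TABLE dbo.FactSales_CS
(
    SaleDate DATE           NOT NULL,
    Amount   DECIMAL(19, 4) NOT NULL,
    INDEX cci CLUSTERED COLUMNSTORE
);

-- Rowstore table with the same shape takes the fresh inserts.
CREATE TABLE dbo.FactSales_RS
(
    SaleDate DATE           NOT NULL,
    Amount   DECIMAL(19, 4) NOT NULL,
    INDEX ix_SaleDate CLUSTERED (SaleDate)
);
GO

-- The view hides the rowstore/columnstore split from the application.
CREATE VIEW dbo.FactSales AS
    SELECT SaleDate, Amount FROM dbo.FactSales_CS
    UNION ALL
    SELECT SaleDate, Amount FROM dbo.FactSales_RS;
```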

If you’re doing nightly insertions, results tend to be a lot better.  But if you need close-to-but-not-quite real-time data, there are still ways to solve the problem.


Thinking About The Data Lake

Ust Oldfield gives architectural hints on Azure Data Lake Store:

It is very easy to treat a data lake as a dumping ground for anything and everything. Microsoft’s sales pitch says exactly this – “Storage is cheap, Store everything!!”. We tend to agree – but if the data is completely malformed, inaccurate, out of date or completely unintelligible, then it’s no use at all and will confuse anyone trying to make sense of the data. This will essentially create a data swamp, which no one will want to go into. Bad data & poorly managed files erode trust in the lake as a source of information. Dumping is bad.

This is how you get data swamps (a term which I’m so happy is catching on).  Read the whole thing.


Creating Plan Guides

Brent Ozar has a couple examples of creating plan guides to work around bad queries using OPTIMIZE FOR UNKNOWN:

You can see the actual execution plan here, and hover your mouse over various parts of it to see the estimated vs actual rows.

The OPTIMIZE FOR UNKNOWN hint tells SQL Server to use the density vector rather than column statistics, so it only estimates that 1,865 rows will come back – when in actuality, 3.3mm rows come back. In performance tuning, that’s what we call a “bad thing,” since SQL Server ends up doing around 10mm page reads due to that key lookup. It would have been much more efficient to just do a clustered index scan.

Creating plan guides can be ugly business, but sometimes they’re the best solution.
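
If you want to see the mechanics, a stripped-down example looks roughly like this (hypothetical table and query, not Brent’s repro):

```sql
-- Attach a hint to a parameterized statement without touching
-- application code. @stmt must match the submitted text exactly.
EXEC sp_create_plan_guide
    @name            = N'pg_Users_ByLocation',
    @stmt            = N'SELECT Id FROM dbo.Users WHERE Location = @loc;',
    @type            = N'SQL',
    @module_or_batch = NULL,
    @params          = N'@loc NVARCHAR(100)',
    @hints           = N'OPTION (OPTIMIZE FOR UNKNOWN)';
```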


Upgrade To 2016?

Kendra Little explains why upgrading to 2016 is a good idea:

I know most of y’all care about Standard Edition.

With SP1, you can use all sorts of (formerly) super-expensive features like data compression, partitioning, Columnstore, Change Data Capture, PolyBase, and more in Standard Edition.

A few features have scalability caps in “lower” editions. There’s a memory limit for In-Memory OLTP (aka Hekaton) and for Columnstore, and Columnstore also has restrictions on parallelism.
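
As a quick sketch of what that means in practice (illustrative names; assumes 2016 SP1 Standard Edition), both of these now succeed where they previously required Enterprise Edition:

```sql
-- Page compression, formerly Enterprise-only:
CREATE TABLE dbo.Orders
(
    OrderId INT            NOT NULL,
    Total   DECIMAL(19, 4) NOT NULL
) WITH (DATA_COMPRESSION = PAGE);

-- Clustered columnstore, also formerly Enterprise-only (subject to
-- the memory and parallelism caps Kendra mentions):
CREATE TABLE dbo.OrderHistory
(
    OrderId INT            NOT NULL,
    Total   DECIMAL(19, 4) NOT NULL,
    INDEX cci CLUSTERED COLUMNSTORE
);
```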

I’ll also throw out my experience as an anecdote:  2016 has been more stable than 2014 in our environment.  I don’t regret going to 2014, but we’re better off on 2016.


Power BI Admin Portal

Melissa Coates looks at the Power BI tenant settings in the admin portal:

Keep in mind that these selections apply to all users across the entire tenant. At this time we can’t control them by groups or anything of that nature.

In addition to the above settings for controlling user experience, the Admin Portal is also the place for viewing usage metrics, which are helpful for determining who runs what, and how often (it’s not everything we could possibly want to know, but it’s a good start). The other two options, manage users and audit logs, redirect you over to the Office 365 Admin Center.

Another week, another few dozen Power BI additions…


Delayed Durability

Chris Taylor gives a use case for Delayed Durability:

During development and initial testing on our own hardware, we had the migration at the time running at ~25 minutes for around 600 packages (i.e. tables) covering (what we termed) RawSource -> Source -> Staging, which was well within the performance requirements for the stage that development was at and for what was initially set out. The rest of this blog post will hone in specifically on Source -> Staging only.

However, once we transferred the solution to the client’s development environment, things took a turn for the worse. In our environment we were running VMs with 8 cores and 16GB RAM, utilising SSDs. The client environment was running SQL Server 2016 Enterprise on VMware vSphere 5.5, 8 vCPUs, 32GB RAM (for Integration; Development was half this), but the infrastructure team have done everything in their power to force all VMs onto the lower tier (i.e. slow disks) of their 3-PAR SAN and throttle them in every way possible, just to make things more of a challenge. Even though the VMs themselves were throttled, we were confident that we wouldn’t see too much of a performance impact, especially as this was only a subset of the processing to be done, so we needed it to be quick, and it will only ever get longer and longer.

Chris walks through the hallmarks of when Delayed Durability might work, and the big one for me is the way data migration works:  full reloads.  The important thing is to have a durable source and a process to repeat data loads when things get missed; in this case, it’s a full reload, but in other cases it could be watchdog applications which compare data sets on each side.
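
If you want to experiment, the knobs themselves are simple (a minimal sketch with hypothetical database and table names, not Chris’s configuration):

```sql
-- Allow (but don't force) delayed durability in the database.
ALTER DATABASE StagingDW SET DELAYED_DURABILITY = ALLOWED;

-- Opt in per transaction during the load:
BEGIN TRANSACTION;
    INSERT INTO dbo.StagingTable (Id) VALUES (1);
COMMIT TRANSACTION WITH (DELAYED_DURABILITY = ON);

-- FORCED makes every commit in the database delayed durable, which
-- fits a full-reload ETL window where the source is durable anyway.
ALTER DATABASE StagingDW SET DELAYED_DURABILITY = FORCED;
```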
