As part of the SQL Server 2016 technology adoption program, during development we work with many customers to validate their production-like workloads in a test environment, and we opportunistically take some of those workloads to production on a production-ready preview build.
In one such engagement, we worked with a customer in the health industry who was running an analytics workload on a Sybase IQ 15.4 system. Challenged by exponential data growth and the need to run analytics queries ever faster for insights, the customer wanted to compare solutions from multiple vendors to see which analytical database could deliver the performance and features they would need over the next 3-5 years. After extensive proof-of-concept projects, they concluded that SQL Server 2016's clustered columnstore delivered the best performance. The performance proof of concept tested the current database against Sybase IQ 16, SQL Server 2016, Oracle 12c, and SAP HANA, using the central tables from the real-life data model filled with synthetic data in a cloud environment. SQL Server 2016 came out the clear winner. SAP HANA was second in performance, but it also required much more memory and displayed significant query-performance outliers. The other contenders were outperformed by a factor of 2 or more.
Standard disclaimers apply: your mileage may vary; we don’t get raw data; “all other things” are not necessarily equal.
2. Store additive measures in the data warehouse.
The best type of measures to store in the data warehouse are those measures that can be fully aggregated. A measure that can be fully aggregated is a measure that can be summarized by any dimension or all dimensions and still remain meaningful. For instance, a Sales Amount measure can be summarized by Product, Date, Geography, etc. and still provide valuable insight for the customer.
Measures that cannot be fully aggregated, such as ratios or other percentage-type calculations, should be handled in the semantic model or the reporting tool. For example, a measure such as Percentage Profit Margin stored in a table cannot be properly aggregated. A better option is to store the additive measures that form the base of the Percentage Profit Margin, such as Revenue, Cost, and Margin. These base measures can then be used to calculate the ratio in a query, the semantic model, or the reporting tool.
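To see why the ratio has to be computed from the base measures rather than aggregated directly, here is a minimal Python sketch (the sales figures and variable names are made up for illustration):

```python
# Hypothetical fact rows: (revenue, cost) per sale.
sales = [
    (100.0, 60.0),    # 40% margin on a small sale
    (1000.0, 900.0),  # 10% margin on a large sale
]

# Wrong: aggregate the pre-computed percentage measure directly.
# This weights every sale equally, regardless of size.
avg_of_margins = sum((r - c) / r for r, c in sales) / len(sales)

# Right: aggregate the additive base measures, then derive the ratio.
total_revenue = sum(r for r, _ in sales)
total_cost = sum(c for _, c in sales)
margin_of_totals = (total_revenue - total_cost) / total_revenue

print(avg_of_margins)    # 0.25
print(margin_of_totals)  # ~0.127, the revenue-weighted (correct) figure
```

The two answers differ because the small, high-margin sale dominates the naive average; only the ratio of sums reflects the business reality.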
The first five tips are non-controversial and act as a good baseline for understanding warehousing with SQL Server. Do check it out.
In the whitepaper, Strategic Prototyping is defined as the process of leveraging Power BI to explicitly seek out feedback from users during a requirements discovery session. The general idea is to use a prototyping tool to quickly slap together a model and mock up some reports while working closely with one or more business users. This helps ensure all reporting capabilities are fleshed out and accounted for. After several iterations, the Power BI model becomes the blueprint upon which an enterprise solution can be based.
Prior to the emergence of Power BI, the tool of choice for strategic prototyping (at least in Microsoft shops) was Power Pivot. And even though the reporting side of Power Pivot is nowhere near as sexy as Power BI, there is one really awesome feature that does not (yet?) exist with Power BI… and that's the "Import from PowerPivot" option in Visual Studio…
Bill does a good job of explaining the alternatives and, importantly, explaining that whichever you pick, there will be follow-up work.
SQL Server 2016 removes the BI Edition as an option, leaving us with a choice between only Standard and Enterprise. The biggest news in my opinion from a licensing perspective with 2016 is that Tabular Mode will now be supported in Standard Edition. This puts the tabular model within the reach of all organizations, and closes the licensing gap in the BI stack. This is fantastic news.
There are, of course, limitations with Standard Edition. Tabular in Standard Edition is restricted to 16 GB of RAM, which may seem like a lot, but keep in mind that tabular is an in-memory technology. It's possible to bump into this limit fairly quickly, but it's a limit that serves the small and medium business space rather well.
It’s surprising (in a good way) that John recommends Standard Edition, at least for small and medium businesses.
Good-bye, Business Intelligence Edition
The biggest surprise to me was the removal of the Business Intelligence edition that was initially introduced in SQL Server 2012. Truthfully, it never seemed to fit in the environments where I worked, so I guess it makes sense. Hopefully, fewer licensing options will make it easier for people to understand their licensing and pick the edition that works best for them.
2016 looks to be a great version for BI.
Personally, I am still struggling to see where Power BI fits in my organization. I am the only BI professional here, so I have to do every bit of the process, from data modeling to building front-end applications. Right now, my organization has a data warehouse with some processes in the warehouse, a Datazen environment, and an SSRS environment. There are no SSAS cubes, nor any power users using Power Pivot to analyze data.
Data warehouses serve a particular role in an environment: they answer known business questions and give consistent answers across an organization. I see Power BI as a tool with a few separate uses depending upon organizational size and maturity. In shops that are not large enough, not well-established enough, or without enough non-IT business intelligence expertise, I think its best use is BI developers building beautiful dashboards for business data consumers, fed from existing systems (including data warehouses). In that sense, it is a complement to a Kimball-style data warehouse.
The Microsoft Power BI team was fast and furious in 2015, and there are no indications they are slowing down in 2016. If you haven’t checked out Power BI V2 since it was first released last summer, you might want to take another look. Many features have been added and updated since then. Based upon the release schedules since July, it seems there are 3 separate release cycles for Power BI:
The Power BI Service (PowerBI.com) gets weekly updates.
The Power BI Desktop tool gets monthly updates.
The Power BI mobile apps get monthly updates.
Given those cadences, I expect no fewer than six updates per month from the Power BI team.
The only thing missing was SSAS. After watching Chris Webb's video tutorial, Cube Deployment, Processing and Admin, on Project Botticelli, I decided I had to use Microsoft.AnalysisServices.Deployment.exe. After a bit of scripting and testing, I managed to write a PowerShell script that updates the XML config files for the deployment: it sets the ProcessingOption to 'DoNotProcess', and it updates the data source, i.e., where the cube will refresh its data from. The script isn't perfect. For starters, what if you have more than one data source? Also, what if you're not using SQL Server 2014? Still, the great thing about open source is that others can update it. Anyone can improve it; it's not reliant on me having free time. So hopefully, by the time we move to SQL Server 2016, someone will have already updated it to work with SQL Server 2016.
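Matt's script is PowerShell, but the core idea of the config-file edit is easy to sketch. Here is a minimal Python version, assuming a stripped-down deployment options file with a ProcessingOption element; the real file generated for Microsoft.AnalysisServices.Deployment.exe has more elements and an XML namespace, which this sketch ignores:

```python
import xml.etree.ElementTree as ET

# Illustrative stand-in for a generated .deploymentoptions file.
sample = """<DeploymentOptions>
  <ProcessingOption>Default</ProcessingOption>
</DeploymentOptions>"""

def set_processing_option(xml_text, value="DoNotProcess"):
    """Rewrite the ProcessingOption element so the deployment utility
    deploys metadata without processing the cube."""
    root = ET.fromstring(xml_text)
    node = root.find("ProcessingOption")
    node.text = value
    return ET.tostring(root, encoding="unicode")

updated = set_processing_option(sample)
print(updated)
```

In practice you would read the file from disk, apply the same edit, and write it back before invoking the deployment executable.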
A big part of product maturation is automated deployment. Good on Matt for introducing that to the community.
Hunting the web for a general definition pulls up many one-liners, and yes, I guess everyone who is anyone will have a way to define it, and that definition is (or should be) based on their own experiences building, deploying, or supporting BI solutions.
If you are looking for a nice, short collection of some of those definitions, and a further explanation of why you need BI, then this is a great post (http://www.jamesserra.com/archive/2013/03/why-you-need-business-intelligence/).
Rolf unpacks the definition and gives us some insight into the nature of Business Intelligence.
In some solutions with a manageable scale and a reasonable tolerance for a certain amount of data loss and inconsistency, this approach may be just fine. There are very good reasons for inconsistencies between sets of data that come from different sources. When 20 million fact rows are obtained from an online ordering system and 0.1% don't have matching customer records from the system used to manage the master customer list, it may simply be a question of timing. Or maybe the definition of "customer" is slightly different in the two systems. Chances are that there are also legitimate data quality issues affecting a small percentage of these records.
Whatever the case may be, a data conformity or potential data quality issue in this design scenario falls on the model designer to either fix or ignore. Again, this may or may not be a concern, but it is a decision point, and it might raise the question of ownership when data quality concerns surface.
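One way to surface the kind of mismatch Paul describes is a simple anti-join between the fact table's keys and the dimension's key set. A minimal Python sketch, with made-up customer IDs standing in for the real data:

```python
# Hypothetical data: fact rows keyed by customer_id, and the master
# customer list sourced from a separate system.
fact_customer_ids = [101, 102, 103, 999]  # 999 is not in the master list
master_customer_ids = {101, 102, 103}

# Anti-join: fact rows with no matching dimension record.
orphans = [cid for cid in fact_customer_ids if cid not in master_customer_ids]
orphan_rate = len(orphans) / len(fact_customer_ids)

print(orphans)      # [999]
print(orphan_rate)  # 0.25
```

Running a check like this during ETL turns the "fix or ignore" decision into an explicit, measurable one rather than something discovered later in a report.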
Paul then goes on to show how this gets fixed in a traditional model and where you need to watch out with SSAS Tabular. Good essay worth reading.