
Category: Business Intelligence

Why Have A Date Dimension

Thomas LeBlanc discusses reasons for having a date dimension in a data warehouse:

The date dimension can also contain columns for Weekend versus Weekday, Holiday, and month markers like 2014-10 or quarter markers like 2014-Q1. All these can be computed once in the dimension table and used at will by query writers. They now do not have to know how to use T-SQL functions or concatenate substrings of “CASTed” date columns.

Then, when the DimDate is related to various Fact tables and processed into an OLAP cube, the measures and aggregations are displayable side by side through the DimDate dimension which is now considered a Conformed Dimension. The slicing and dicing of data has just been made a whole lot easier.

I’d go a step further and say that every instance should have access to a tally table and a date table.
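To make that concrete, here’s a minimal T-SQL sketch of a date dimension built from a small tally-style number list. Every object name below (DimDate, MonthMarker, and so on) is my own illustration, not something from Thomas’s post:

```sql
-- Minimal DimDate sketch: compute date attributes once so query writers
-- never have to re-derive them. All names here are illustrative.
CREATE TABLE dbo.DimDate
(
    DateKey       INT        NOT NULL PRIMARY KEY,  -- e.g. 20141001
    FullDate      DATE       NOT NULL,
    DayType       VARCHAR(7) NOT NULL,              -- 'Weekend' or 'Weekday'
    MonthMarker   CHAR(7)    NOT NULL,              -- e.g. '2014-10'
    QuarterMarker CHAR(7)    NOT NULL               -- e.g. '2014-Q1'
);

-- A tally-style CTE generates one row per day of 2014.
WITH Tally AS
(
    SELECT TOP (365) n = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1
    FROM sys.all_objects
)
INSERT INTO dbo.DimDate (DateKey, FullDate, DayType, MonthMarker, QuarterMarker)
SELECT
    YEAR(d.FullDate) * 10000 + MONTH(d.FullDate) * 100 + DAY(d.FullDate),
    d.FullDate,
    CASE WHEN DATENAME(WEEKDAY, d.FullDate) IN ('Saturday', 'Sunday')
         THEN 'Weekend' ELSE 'Weekday' END,
    FORMAT(d.FullDate, 'yyyy-MM'),
    CONCAT(YEAR(d.FullDate), '-Q', DATEPART(QUARTER, d.FullDate))
FROM Tally AS t
    CROSS APPLY (SELECT DATEADD(DAY, t.n, CAST('2014-01-01' AS DATE))) AS d(FullDate);
```

Once a table like this exists, reports simply join to it and filter on DayType or MonthMarker instead of repeating DATENAME and FORMAT logic in every query.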


Warehouses Will Live On

Jesse Seymour argues that in-memory analysis solutions will not entirely replace data warehouses:

The big reason that dimensional modeling increases clarity is that the dimensional model seeks to flatten data as much as possible.  Let’s compare two examples.  Both of these examples are for a fictional health clinic.

The first example is that we want a report on how many male patients were treated with electric shock therapy by provider, grouped monthly and spanning a year-to-date range.
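To see what that flattening buys you, here’s a hedged sketch of the report query against a hypothetical star schema; the fact and dimension names (FactTreatment, DimPatient, and so on) are my own invention, not from Jesse’s post:

```sql
-- Hypothetical star-schema version of the report: male patients treated
-- with electric shock therapy, by provider, grouped monthly, year to date.
SELECT
    dp.ProviderName,
    dd.MonthMarker,
    COUNT(DISTINCT f.PatientKey) AS PatientsTreated
FROM dbo.FactTreatment AS f
    JOIN dbo.DimPatient   AS dpat ON dpat.PatientKey = f.PatientKey
    JOIN dbo.DimProvider  AS dp   ON dp.ProviderKey  = f.ProviderKey
    JOIN dbo.DimTreatment AS dt   ON dt.TreatmentKey = f.TreatmentKey
    JOIN dbo.DimDate      AS dd   ON dd.DateKey      = f.DateKey
WHERE dpat.Gender = 'M'
  AND dt.TreatmentName = 'Electric Shock Therapy'
  AND dd.FullDate >= DATEFROMPARTS(YEAR(GETDATE()), 1, 1)  -- start of this year
  AND dd.FullDate <= CAST(GETDATE() AS DATE)               -- through today
GROUP BY dp.ProviderName, dd.MonthMarker
ORDER BY dp.ProviderName, dd.MonthMarker;
```

Every filter and grouping lands on a pre-flattened dimension attribute, which is exactly the clarity argument being made here.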

Those big Kimball-style warehouses do a great job of making it easier for people who are not database specialists to query data and get meaningful, consistent results to known business questions.  The trick to understanding data platforms is that they tend to be complements rather than substitutes:  introducing SparkR in your environment does not replace your Kimball-style warehouse; it complements it by letting analysts find trends more easily.  Similarly, a Hadoop cluster potentially lets you complement an existing data warehouse in a few ways:  acting as a data aggregator (which allows you to push some ETL work off onto the cluster), a data collector (especially for information which is useful but doesn’t really fit in your conformed warehouse), and a data processor (particularly for those gigantic queries which are not time-sensitive).


Early Metrics On Warehouse Performance

Sunil Agarwal shows some results from a sample workload indicating that SQL Server 2016 has improved two customers’ performance:

As part of the SQL Server 2016 technology adoption program, during development, we work with many customers validating their production-like workloads in a test environment and opportunistically take some of these workloads to production running on a production-ready preview build.

In one such engagement, we worked with a customer in the health industry who was running an analytics workload on a Sybase IQ 15.4 system. Challenged by exponential data growth and the requirement to run analytics queries even faster for insights, the customer wanted to compare solutions from multiple vendors to see which analytical database could deliver the performance and features they need over the next 3-5 years. After extensive proof-of-concept projects, they concluded SQL Server 2016’s clustered columnstore delivered the best performance. The performance proof of concept tested the current database against Sybase IQ 16, MS SQL 2016, Oracle 12c, and SAP HANA using the central tables from the real-life data model filled with synthetic data in a cloud environment. MS SQL Server 2016 came out the clear winner. SAP HANA was second in performance, but also required much higher memory and displayed significant query performance outliers. Other contenders were outperformed by a factor of 2 or more.

Standard disclaimers apply:  your mileage may vary; we don’t get raw data; “all other things” are not necessarily equal.


Data Warehouse Design Tips

Dustin Ryan has part one of a two-part series on data warehouse design best practices:

2. Store additive measures in the data warehouse.

The best type of measures to store in the data warehouse are those measures that can be fully aggregated. A measure that can be fully aggregated is a measure that can be summarized by any dimension or all dimensions and still remain meaningful. For instance, a Sales Amount measure can be summarized by Product, Date, Geography, etc. and still provide valuable insight for the customer.

Measures that cannot be fully aggregated, such as ratios or other percentage-type calculations, should be handled in the semantic model or the reporting tool. For example, a measure such as Percentage Profit Margin stored in a table cannot be properly aggregated. A better option would be to store the additive measures that are the base for the Percentage Profit Margin, such as Revenue, Cost, Margin, etc. These base measures can be used to calculate the ratio in a query, semantic model, or reporting tool.
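As a hedged illustration of that advice (the FactSales and DimProduct names are hypothetical, not from Dustin’s post), here is how the ratio can be derived from additive base measures at query time:

```sql
-- Store only additive measures (Revenue, Cost) in the fact table and
-- compute Percentage Profit Margin from the aggregates at query time.
-- Averaging a stored per-row margin percentage would give a wrong answer.
SELECT
    dp.ProductCategory,
    SUM(f.Revenue)                       AS TotalRevenue,
    SUM(f.Cost)                          AS TotalCost,
    SUM(f.Revenue) - SUM(f.Cost)         AS TotalMargin,
    (SUM(f.Revenue) - SUM(f.Cost)) * 100.0
        / NULLIF(SUM(f.Revenue), 0)      AS PctProfitMargin
FROM dbo.FactSales AS f
    JOIN dbo.DimProduct AS dp ON dp.ProductKey = f.ProductKey
GROUP BY dp.ProductCategory;
```

The same base measures support the ratio at any grain (by product, by date, by geography), which is what makes them safe to store.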

The first five tips are non-controversial and act as a good baseline for understanding warehousing with SQL Server.  Do check it out.


Power BI Desktop Or Power Pivot

Bill Anton discusses when to use Power BI Desktop and when to use Power Pivot:

In the whitepaper, Strategic Prototyping is defined as the process of leveraging Power BI to explicitly seek out feedback from users during a requirements discovery session. The general idea is to use a prototyping tool to quickly slap together a model and mock up some reports while working closely with one or more business users. This helps ensure all reporting capabilities are fleshed out and accounted for. After several iterations, the Power BI model becomes the blueprint upon which an enterprise solution can be based.

Prior to the emergence of Power BI, the tool of choice for strategic prototyping (at least in Microsoft shops) was Power Pivot. And even though the reporting side of Power Pivot is nowhere near as sexy as Power BI, there is one really awesome feature that does not (yet?) exist with Power BI… and that’s the “Import from PowerPivot” option in Visual Studio…

Bill does a good job of explaining the alternatives and, importantly, explaining that whichever you pick, there will be follow-up work.


SQL Server 2016 Editions For BI

John White discusses which version of SQL Server 2016 you should use for business intelligence work:

SQL Server 2016 removes the BI Edition as an option, leaving us with a choice between only Standard and Enterprise. The biggest news in my opinion from a licensing perspective with 2016 is that Tabular Mode will now be supported in Standard Edition. This puts the tabular model within the reach of all organizations, and closes the licensing gap in the BI stack. This is fantastic news.

There are, of course, limitations with Standard Edition. Tabular in Standard Edition is restricted to 16 GB of RAM, which may seem like a lot, but keep in mind that tabular is an in-memory technology. It’s possible to bump into this limit fairly quickly, but it’s a limit that serves the small/medium business space rather well.

It’s surprising (in a good way) that John recommends Standard Edition, at least for small and medium businesses.


BI Features In 2016

Jessica Moss compares 2014 versus 2016 with respect to Business Intelligence features:

Good-bye, Business Intelligence Edition
The biggest surprise to me was the removal of the Business Intelligence edition that was initially introduced in SQL Server 2012.  Truthfully, it never seemed to fit in the environments where I worked, so I guess it makes sense.  Hopefully, fewer licensing options will make it easier for people to understand their licensing and pick the edition that works best for them.

2016 looks to be a great version for BI.


Power BI Doesn’t Replace Warehouses

Jesse Seymour argues that Power BI won’t replace traditional data warehouses:

Personally, I am still struggling to see where Power BI fits in my organization.  I am the only BI professional here, so I have to do every bit of the process, from data modeling to building front-end applications.  Right now, my organization has a data warehouse with some processes in the warehouse, a Datazen environment, and an SSRS environment.  There are no SSAS cubes or power users using Power Pivot to analyze data.

Data warehouses serve a particular role in an environment:  they answer known business questions and give consistent answers across an organization.  I see Power BI as a tool with a few separate uses depending upon organizational size and maturity.  In shops which are not large enough, well-established enough, or deep enough in non-IT business intelligence expertise to support self-service analysis, I think its best use is BI developers building beautiful dashboards for business data consumers, feeding from existing systems (including data warehouses).  In that sense, it is a complement to a Kimball-style data warehouse.


What’s New In Power BI 2.0?

Meagan Longoria tells us what’s in Power BI version 2.0:

The Microsoft Power BI team was fast and furious in 2015, and there are no indications they are slowing down in 2016. If you haven’t checked out Power BI V2 since it was first released last summer, you might want to take another look. Many features have been added and updated since then. Based upon the release schedules since July, it seems there are 3 separate release cycles for Power BI:

  • The Power BI Service (PowerBI.com) gets weekly updates.

  • The Power BI Desktop tool gets monthly updates.

  • The Power BI mobile apps get monthly updates.

I expect no fewer than 6 updates per week from the Power BI team.


Automating SSAS Deployments

Matt Smith has introduced SQL Server Analysis Services deployments to Octopus Deploy:

The only thing missing was SSAS. After watching Chris Webb’s video tutorial – Cube Deployment, Processing and Admin on Project Botticelli – I decided I had to use Microsoft.AnalysisServices.Deployment.exe. After a bit of scripting and testing, I managed to write a PowerShell script that updates the XML config files for the deployment – it sets the ProcessingOption to ‘DoNotProcess’. It updates the data source – where the cube will refresh the data from. The script isn’t perfect. For starters, what if you have more than one data source? Also, what if you’re not using SQL Server 2014? Still, the great thing about open source is that others can update it. Anyone can improve it; it’s not reliant on me having free time. So hopefully, by the time we move to SQL 2016, someone will have already updated it to work with SQL 2016.

A big part of product maturation is automated deployment.  Good on Matt for introducing that to the community.
