Press "Enter" to skip to content

Day: September 20, 2016

Power BI, Excel, SharePoint, And Data Refreshing

Chris Webb shows how to ensure that data refresh works when you store multiple Excel workbooks in SharePoint or OneDrive:

I can hear you yawning already – yet another blog post on getting data from multiple Excel workbooks in Power Query and Power BI. Just about everyone who has ever written a blog post on Power BI has written about this subject, including me. However there’s a twist this time: what if your Excel workbooks are stored in SharePoint or OneDrive For Business? If they are, then your dataset may not refresh successfully after you have published unless you load your data in a particular way.

Read the whole thing if you work with Power BI.

Comments closed

Analyzing The StackLite Dataset

Marco Pasin looks at the StackLite data set:

According to Stack Overflow documentation, these are the categories of questions that may be closed by the community users:

  • duplicated
  • off topic
  • unclear
  • too broad
  • primarily opinion-based
Not everyone in the Stack Overflow community is able to close a question. In fact users need to have certain reputation expressed in points (more details here).

To calculate the overall website closure rate is easy. Just use the original “questions_2016” dataset and count how many questions have the field “Closed Date” populated. Over 10% of questions made in 2016 have been closed so far.

If you’re interested in learning more about data analysis, walk through the exercise as well and play around with the data set too.  Hat tip, R-Bloggers.

Comments closed

Azure SQL Data Warehouse Architecture

Warner Chaves looks at system views in Azure SQL Data Warehouse:

Unlike the sys.dm_exec_requests view in SQL Server, the sys.dm_pdw_exec_requests view actually keeps up to 10000 records with the information of a request even after it has executed. This capability is very useful as you can track specific query executions as long as their records are still among the 10000 kept by the view. As time passes the oldest records are phased out in favor of more recent ones.

This is an interesting look at some of the differences between Azure SQL Data Warehouse and a “normal” SQL Server installation.  Good reading.

Comments closed

Ordering In Views

Kenneth Fisher explains why you shouldn’t order in views:

For many years it’s been a best practice to never put an ORDER BY in a view. The idea is that a view shouldn’t have an inherent order. Just like any other query. If you want the data from a view ordered then you query the view with an ORDER BY clause. In fact if you put an ORDER BY in a view you’ll get an error:

Msg 1033, Level 15, State 1, Procedure MyView, Line 4 [Batch Start Line 2]
The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP, OFFSET or FOR XML is also specified.

I knew about the TOP 100 PERCENT bit, but had no idea that order was outright ignored.  Read the comments for additional information.

Comments closed

Running A Model On Separate Groups Of Data

Simon Jackson shows how to run the same model against separate groups of data in R:

Now that we can separate data for each group(s), we can fit a model to each tibble in data using map() from the purrr package (also tidyverse). We’re going to add the results to our existing tibble using mutate() from the dplyr package (again, tidyverse). Here’s a generic version of our pipe with adjustable parts in caps:

Read the whole thing.  Hat tip, R-Bloggers.

Comments closed

Uncontrolled Environments

Ed Elliott discusses database deployments in uncontrolled environments:

There have been a few discussions on stack overflow recently about how to manage deployments in uncontrolled environments, specifically data migrations. The questions were from an SSDT perspective, I don’t think that SSDT is a great choice for these uncontrolled environments and there are some additional requirements for these uncontrolled environments that need some additional thought and care when creating release scripts (whether manually or using a tool).

Ed has some interesting thoughts here, and I agree with the idea that SQL Server Data Tools deployment scripts are not the best choice when you have people changing schema all around you in unexpected ways.

Comments closed

Why Force Query Store Plans

Grant Fritchey explains the wherefore behind query store plan forcing:

But, what else does Force Plan do for you? What if you never experience bad parameter sniffing (you do, but I’m not going to argue the point)? Is there something else that Force Plan can do for you? Heck yes! The whole point of creating the Query Store was in order to address Plan Regression. What the heck is plan regression? When Microsoft makes any change to the Query Optimizer, and those changes come all the time, it’s possible that you might see a change in your execution plans. Most of the time, it’s going to be a positive change. That’s why they’re changing the Optimizer after all, to improve it. However, sometimes, you’re benefiting from the old behavior of the Optimizer and that new plan doesn’t work as well as the old plan. This is plan regression. When Microsoft changed the Cardinality Estimation engine in SQL Server 2014, more than a few people experienced the new estimator giving row estimates that resulted in a different execution plan that didn’t perform as well as the old plan. This is plan regression. What to do?

This is a good read.

Comments closed