Press "Enter" to skip to content

Author: Kevin Feasel

Analyzing The StackLite Dataset

Marco Pasin looks at the StackLite data set:

According to Stack Overflow documentation, these are the categories of questions that may be closed by the community users:

  • duplicated
  • off topic
  • unclear
  • too broad
  • primarily opinion-based

Not everyone in the Stack Overflow community is able to close a question. In fact, users need to have a certain reputation, expressed in points (more details here).

Calculating the overall website closure rate is easy: just use the original “questions_2016” dataset and count how many questions have the “Closed Date” field populated. Over 10% of questions asked in 2016 have been closed so far.
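The calculation really is a one-liner. The original analysis is in R, but here’s a minimal SQL equivalent, assuming you’ve loaded the questions_2016 file into a hypothetical dbo.Questions2016 table with a ClosedDate column:

-- Closure rate: share of 2016 questions with a populated ClosedDate.
-- Table and column names are hypothetical; adjust to however you loaded the CSV.
SELECT
    COUNT(*) AS TotalQuestions,
    SUM(CASE WHEN ClosedDate IS NOT NULL THEN 1 ELSE 0 END) AS ClosedQuestions,
    100.0 * SUM(CASE WHEN ClosedDate IS NOT NULL THEN 1 ELSE 0 END) / COUNT(*) AS ClosureRatePercent
FROM dbo.Questions2016;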

If you’re interested in learning more about data analysis, walk through the exercise yourself and play around with the data set.  Hat tip, R-Bloggers.

Comments closed

Azure SQL Data Warehouse Architecture

Warner Chaves looks at system views in Azure SQL Data Warehouse:

Unlike the sys.dm_exec_requests view in SQL Server, the sys.dm_pdw_exec_requests view actually keeps up to 10000 records with the information of a request even after it has executed. This capability is very useful as you can track specific query executions as long as their records are still among the 10000 kept by the view. As time passes the oldest records are phased out in favor of more recent ones.
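As a quick illustration, this is the sort of query you might run against that view to see the most recently retained requests (I’m only listing a handful of its columns; check the documentation for the full set):

-- Most recent requests still retained by the DMV, newest first.
SELECT TOP (50)
    request_id,
    session_id,
    status,
    submit_time,
    end_time,
    total_elapsed_time,
    command
FROM sys.dm_pdw_exec_requests
ORDER BY submit_time DESC;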

This is an interesting look at some of the differences between Azure SQL Data Warehouse and a “normal” SQL Server installation.  Good reading.

Comments closed

Ordering In Views

Kenneth Fisher explains why you shouldn’t order in views:

For many years it’s been a best practice to never put an ORDER BY in a view. The idea is that a view shouldn’t have an inherent order. Just like any other query. If you want the data from a view ordered then you query the view with an ORDER BY clause. In fact if you put an ORDER BY in a view you’ll get an error:

Msg 1033, Level 15, State 1, Procedure MyView, Line 4 [Batch Start Line 2]
The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP, OFFSET or FOR XML is also specified.
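Here’s a minimal repro sketch of the behavior Kenneth describes (the table and view names are made up):

CREATE TABLE dbo.Widgets (WidgetId int NOT NULL, WidgetName varchar(50) NOT NULL);
GO

-- Fails with Msg 1033: a bare ORDER BY is not allowed in a view.
CREATE VIEW dbo.OrderedWidgets AS
SELECT WidgetId, WidgetName
FROM dbo.Widgets
ORDER BY WidgetName;
GO

-- Adding TOP (100) PERCENT lets it compile, but SQL Server is free to ignore
-- the ORDER BY when the view is queried.
CREATE VIEW dbo.OrderedWidgets AS
SELECT TOP (100) PERCENT WidgetId, WidgetName
FROM dbo.Widgets
ORDER BY WidgetName;
GO

-- The only reliable place for the ordering is the outer query.
SELECT WidgetId, WidgetName
FROM dbo.OrderedWidgets
ORDER BY WidgetName;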

I knew about the TOP 100 PERCENT bit, but had no idea that order was outright ignored.  Read the comments for additional information.

Comments closed

Running A Model On Separate Groups Of Data

Simon Jackson shows how to run the same model against separate groups of data in R:

Now that we can separate data for each group(s), we can fit a model to each tibble in data using map() from the purrr package (also tidyverse). We’re going to add the results to our existing tibble using mutate() from the dplyr package (again, tidyverse). Here’s a generic version of our pipe with adjustable parts in caps:

Read the whole thing.  Hat tip, R-Bloggers.

Comments closed

Uncontrolled Environments

Ed Elliott discusses database deployments in uncontrolled environments:

There have been a few discussions on Stack Overflow recently about how to manage deployments in uncontrolled environments, specifically data migrations. The questions were from an SSDT perspective; I don’t think that SSDT is a great choice for these uncontrolled environments, and they come with additional requirements that need some extra thought and care when creating release scripts (whether manually or using a tool).

Ed has some interesting thoughts here, and I agree with the idea that SQL Server Data Tools deployment scripts are not the best choice when you have people changing schema all around you in unexpected ways.

Comments closed

Why Force Query Store Plans

Grant Fritchey explains the wherefore behind query store plan forcing:

But, what else does Force Plan do for you? What if you never experience bad parameter sniffing (you do, but I’m not going to argue the point)? Is there something else that Force Plan can do for you? Heck yes! The whole point of creating the Query Store was in order to address Plan Regression. What the heck is plan regression? When Microsoft makes any change to the Query Optimizer, and those changes come all the time, it’s possible that you might see a change in your execution plans. Most of the time, it’s going to be a positive change. That’s why they’re changing the Optimizer after all, to improve it. However, sometimes, you’re benefiting from the old behavior of the Optimizer and that new plan doesn’t work as well as the old plan. This is plan regression. When Microsoft changed the Cardinality Estimation engine in SQL Server 2014, more than a few people experienced the new estimator giving row estimates that resulted in a different execution plan that didn’t perform as well as the old plan. This is plan regression. What to do?
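For reference, once you’ve spotted the regressed query in the Query Store catalog views, forcing the old plan is a single procedure call. A quick sketch (the IDs and filter text are examples):

-- Find the query and its plans in the Query Store catalog views.
SELECT qsq.query_id, qsp.plan_id, qst.query_sql_text
FROM sys.query_store_query AS qsq
JOIN sys.query_store_query_text AS qst
    ON qst.query_text_id = qsq.query_text_id
JOIN sys.query_store_plan AS qsp
    ON qsp.query_id = qsq.query_id
WHERE qst.query_sql_text LIKE N'%some identifying text%';

-- Force the plan that performed well before the regression.
EXEC sys.sp_query_store_force_plan @query_id = 42, @plan_id = 17;

-- And to stop forcing it later:
EXEC sys.sp_query_store_unforce_plan @query_id = 42, @plan_id = 17;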

This is a good read.

Comments closed

Error Handling With Extended Events, Part 2

Dave Mason continues his discussion of using Extended Events to handle errors:

In the last post, we explored a couple of examples of using Extended Events to enhance T-SQL error handling. There was some potential there. But a hard-coded SPID was necessary: we couldn’t use the code examples for anything automated. It was cumbersome, too. Let’s change that, shall we?

To make the code easier to work with, I moved most of it into three stored procs: one each to create an XEvent session, get the XEvent session data, and drop the XEvent session. There’s also a table type. This will negate the need to declare a temp table over and over. The four objects can be created in any database you choose. I opted to create them in [tempdb]. The code for each is below in the four tabs.
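For context, the underlying building block from part 1 (not Dave’s stored procedures themselves) is an event session on error_reported filtered to a single session, something along these lines:

-- Bare-bones sketch: capture errors raised on one specific session.
-- The hard-coded SPID is exactly what the stored procedures do away with.
CREATE EVENT SESSION [ErrorCapture] ON SERVER
ADD EVENT sqlserver.error_reported
(
    ACTION (sqlserver.session_id, sqlserver.sql_text)
    WHERE sqlserver.session_id = 53  -- hypothetical SPID
)
ADD TARGET package0.ring_buffer;
GO

ALTER EVENT SESSION [ErrorCapture] ON SERVER STATE = START;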

This is a very interesting solution.

Comments closed

Kinesis Analytics

Ryan Nienhuis shows how to implement Amazon Kinesis Analytics:

As I covered in the first post, streaming data is continuously generated; therefore, you need to specify bounds when processing data to make your result set deterministic. Some SQL statements operate on individual rows and have natural bounds, such as a continuous filter that evaluates each row based upon a defined SQL WHERE clause. However, SQL statements that process data across rows need to have set bounds, such as calculating the average of a particular column. The mechanism that provides these bounds is a window.

Windows are important because they define the bounds for which you want your query to operate. The starting bound is usually the current row that Amazon Kinesis Analytics is processing, and the window defines the ending bound.

Windows are required with any query that works across rows, because the in-application stream is unbounded and windows provide a mechanism to bind the result set and make the query deterministic. Analytics supports three types of windows: tumbling, sliding, and custom.
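To make the tumbling case concrete, here’s a sketch loosely adapted from the pattern in the AWS documentation (the stream and column names are the documentation defaults, not anything from this article):

-- One-minute tumbling window: count records per ticker symbol, with
-- FLOOR(ROWTIME TO MINUTE) supplying the window bounds.
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    ticker_symbol VARCHAR(4),
    ticker_count  INTEGER
);

CREATE OR REPLACE PUMP "STREAM_PUMP" AS
INSERT INTO "DESTINATION_SQL_STREAM"
SELECT STREAM
    ticker_symbol,
    COUNT(*) AS ticker_count
FROM "SOURCE_SQL_STREAM_001"
GROUP BY
    ticker_symbol,
    FLOOR("SOURCE_SQL_STREAM_001".ROWTIME TO MINUTE);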

The concepts here are very similar to Azure’s Stream Analytics.

Comments closed