Recursive CTEs

Kevin Feasel

2016-05-27

Syntax

Steve Jones gives a simple recursive CTE example and an important lesson on compounding:

Recursion is an interesting computer science technique that stumps lots of people. When I was learning programming, it seemed that recursion (in Pascal) and pointers (in C), were the weed out topics.

However, they aren’t that bad, and with CTEs, we can write recursion in T-SQL. I won’t cover where this might be used in this post, though I will give you a simple CTE to view.

There are two parts you need: the anchor and the recursive member. These are connected with a UNION ALL. There can be multiple items, but we’ll keep things simple.

You can play query golf and find a way to remove the recursion, but it’s good to know how to create a recursive CTE.  It’s also good to know that you typically do not want recursion in a database process…

SystemThread

Ewald Cress looks at SystemThread:

SystemThread, a class within sqldk.dll, can be considered to be at the root of SQLOS’s scheduling capabilities. While it doesn’t expose much obviously exciting functionality, it encapsulates a lot of the state that is necessary to give a thread a sense of self in SQLOS, serving as the beacon for any code to find its way to an associated SQLOS scheduler etc. I won’t go into much of the SQLOS object hierarchy here, but suffice it to say that everything becomes derivable by knowing one’s SystemThread. As such, this class jumps the gap between a Windows thread and the object-oriented SQLOS.

The simplest way to conceptualise a thread in SQL Server is to think of a spid or connection busy executing a simple query, old skool sysprocesses style. It’s not hip, but it’s close enough to the truth to be useful. This conflates a few things that are separate entities, but it is a good starting point for teasing things apart as needed

This is another dive into internals; prepare your thinking caps.

Spark SQL For ETL

Kevin Feasel

2016-05-27

Spark

Ben Snively discusses using Spark SQL as part of an ETL process:

Now interact with SparkSQL through a Zeppelin UI, but re-use the table definitions you created in the Hive metadata store.   You’ll create another table in SparkSQL later in this post to show how that would have been done there.

Connect to the Zeppelin UI and create a new notebook under the Notebook tab. Query to show the tables. You can see that the two tables you created in Hive are also available in SparkSQL.

There are a bunch of tools in here, but for me, the moral of the story is that SQL is a great language for data processing.  Spark SQL has gaps, but has filled many of those gaps over the past year or so, and I recommend giving it a shot.

Heron

Kevin Feasel

2016-05-27

Hadoop

Twitter is open sourcing Heron, a competitor to Apache Storm:

Apache Storm was the original solution to Twitter’s problems. It was created by a marketing intelligence company called BackType, and Twitter bought the company in 2011 and eventually open-sourced Storm, providing it to the Apache Foundation.

There’s no question Storm has a lot of advantages. It’s scalable and fault-tolerant, with a decent ecosystem of “spouts,” or systems for receiving data from established sources. But it was reputedly also hard to work with and hard to get good results from, and despite a recent 1.0 renovation, it’s been challenged by other projects, including Apache Spark and its own revised streaming framework.

It’s good to see this competition in the streaming space.

Box And Whisker Plots

Slava Murygin shows how to create a box and whisker plot in SSMS using spatial data types:

If you have no idea what Box-and-Whisker Plot is, please visit following link: http://www.wellbeingatschool.org.nz/information-sheet/understanding-and-interpreting-box-plots

At first, I will show how to do it based on AdventureWorks database in SQL Server 2014.

We will analyze amounts of Individual lines of Sales Orders within each month.

The first step is to create a Data Set to process.  That Data Set will contain a Month, Single Line amount and order number of that record within a month.

This is really cool…but I wonder if it wouldn’t be better to do this in R, where it’d take a lot less code.  If you can’t reach out to R, though, this is a good way of visualizing results.

Finding Unused Indexes

SQLWayne has a script to help find unused indexes:

Here’s some code that can show you what indexes are unused or empty.  An empty index just means that there’s no data in that table right now, it may always be populated later, so I would not drop an empty index.  Besides, how much space would an empty index take?

For my personal preferences, I order the output by table then index name, also I put a u.* at the end of the select statement so the more interesting usage stat columns can be seen.

If an index truly is unused, it’s a waste of resources.  The problem is, sometimes you’ll think an index is unused but it’s really a vital part of month-end reporting or used for the CEO’s favorite dashboard.

On The Edge

Kendra Little answers a question on version novelty:

Dear SQL DBA,

What are your thoughts on the early adoption of new SQL Server versions? Specifically, if the business is willing to assume the risk of early adoption just to get one new feature that they can probably live without, should DBAs be happy and willing to assume that risk too? Or, is it our responsibility to “just say no” until it has been tested? I would like to hear about any experience you have with this. Thanks.

Bleeding in Edgeville

Go read Kendra’s answer because it’s a good one.  My answer is, I want to be on the edge.  I’ve run into V1 bugs and had to spike projects before they made it to production, but if there’s a good benefit to moving, and if your business side is supportive, I’d lean heavily toward fast upgrades.

NOLOCK Bug

Brent Ozar reports on a NOLOCK bug in SQL Server 2014 SP1 CU6:

  • While one transaction is holding an exclusive lock on an object (Ex. ongoing table update), another transaction is executing parallelized SELECT (…) FROM SourceTable, using the NOLOCK hint, under the default SQL Server lock-based isolation level or higher. In this scenario, the SELECT query trying to access SourceTable will be blocked.

  • Executing a parallelized SELECT (…) INTO Table FROM SourceTable statement, specifically using the NOLOCK hint, under the default SQL Server lock-based isolation level or higher. In this scenario, other queries trying to access SourceTable will be blocked.

The current recommendation is not to install CU6 until the issue is fixed.

Fallacies of Data Science

Adnan Masood and David Lazar have a list of fallacies in the world of data science:

Extrapolating beyond the range of training data, especially in the case of time series data, is fine providing the data-set is large enough.

Strong Evidence is same as a Proof! Prediction intervals and confidence intervals are the same thing, just like statistical significance and practical significance.

These are some good things to think about if you’re getting into analytics.

Daily Briefings

Brent Ozar shows a way to see high-priority sp_Blitz items across an environment:

Step 3: run sp_Blitz @IgnorePrioritiesAbove = 50, @CheckUserDatabaseObjects = 0. This gets you the fast headline news, especially when used with the improved priorities in the latest version in our SQL Server download kit

It’s important to have one or two high-level systems you can glance at to see top-level health of a system.  This could be one of those systems.

Categories

May 2016
MTWTFSS
« Apr Jun »
 1
2345678
9101112131415
16171819202122
23242526272829
3031