
Author: Kevin Feasel

Linked Lists

Ewald Cress digs into linked lists to explain (deep) SQLOS internals:

The memory layout of a linked list doesn’t imply specific usage semantics. If we consistently insert at the head and remove from the tail, we have a queue. If we both insert and remove items from the head, we have a stack. And it is possible to have variations of these as well.

Finally, it is clear that insert and remove operations are multi-step, and the list is in an inconsistent state – i.e. not safe to traverse or modify – in the middle of such an operation. For this reason, locking semantics must be implemented. This will typically take the form of a spinlock which must be acquired before trying to access the list for any purpose. The object which owns the list head will then normally have a spinlock as a data member associated with the list head, although it is possible to have one spinlock protect multiple items beyond just a single linked list; this could be a sign of sane design, but conversely it means a coarser locking grain, which can sometimes work against you.

Even at this “simple” level, we’re digging pretty deep here.
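
The queue-versus-stack and locking points in the excerpt can be sketched in a few lines. This is only an illustration, not SQLOS code: the class and method names are made up for the example, and Python's threading.Lock stands in for the spinlock that would sit next to the list head.

```python
import threading
from typing import Any, Optional


class Node:
    """One list entry with forward and backward links."""

    def __init__(self, value: Any) -> None:
        self.value = value
        self.prev: Optional["Node"] = None
        self.next: Optional["Node"] = None


class GuardedList:
    """Doubly linked list whose owner also holds the lock protecting it."""

    def __init__(self) -> None:
        self.head: Optional[Node] = None
        self.tail: Optional[Node] = None
        # Assumption: an ordinary lock stands in for the spinlock.
        self.lock = threading.Lock()

    def insert_head(self, value: Any) -> None:
        node = Node(value)
        with self.lock:  # the list is inconsistent between the steps below
            node.next = self.head
            if self.head is not None:
                self.head.prev = node
            else:
                self.tail = node
            self.head = node

    def remove_head(self) -> Any:
        with self.lock:
            node = self.head
            if node is None:
                raise IndexError("list is empty")
            self.head = node.next
            if self.head is not None:
                self.head.prev = None
            else:
                self.tail = None
            return node.value

    def remove_tail(self) -> Any:
        with self.lock:
            node = self.tail
            if node is None:
                raise IndexError("list is empty")
            self.tail = node.prev
            if self.tail is not None:
                self.tail.next = None
            else:
                self.head = None
            return node.value


# Same memory layout, different usage semantics.
queue = GuardedList()
stack = GuardedList()
for item in (1, 2, 3):
    queue.insert_head(item)
    stack.insert_head(item)

fifo = [queue.remove_tail() for _ in range(3)]  # insert at head, remove at tail
lifo = [stack.remove_head() for _ in range(3)]  # insert and remove at head
```

Every operation takes the same lock, which is the coarse-grained choice the excerpt warns about: simple to reason about, but all access to the list is serialized.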


Deadlocks Aren’t Blocks

Dave Mason shows the difference between a blocked process and a deadlocked process:

Sometimes our end users may not know the difference between a deadlock and blocking. In fact, there are some that may use these two interchangeably, thinking they are the same thing. A quick demonstration is often helpful to show the difference. Here’s a simple example that uses two global temp tables.

NOTE: I want to stress that this is merely a simple example, and not a deep dive or exhaustive article. Books have been written on concurrency, including one by Kalen Delaney. It’s free, and highly recommended.

Ultimately, blocked processes can complete (once the blocker finishes its work); a set of deadlocked processes can never complete, which is why (at least) one of the deadlocked processes needs to be rolled back.  I like Dave’s “show me” approach here.
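
The same distinction shows up outside the database. Here is a minimal sketch using Python threads instead of SQL Server sessions; it is not Dave's demo. The function names are invented, and the lock timeout is a crude stand-in for a deadlock monitor picking a victim.

```python
import threading
import time


def run_blocking():
    """One session waits on another, then finishes normally."""
    lock = threading.Lock()
    blocker_holds_lock = threading.Event()
    outcome = {}

    def blocker():
        with lock:
            blocker_holds_lock.set()
            time.sleep(0.1)  # work done while holding the lock
        outcome["blocker"] = "completed"

    def waiter():
        blocker_holds_lock.wait()
        with lock:  # blocked here until the blocker lets go
            pass
        outcome["waiter"] = "completed"

    threads = [threading.Thread(target=blocker), threading.Thread(target=waiter)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outcome


def run_deadlock():
    """Each session holds one lock and wants the other's."""
    lock_a, lock_b = threading.Lock(), threading.Lock()
    both_hold_first = threading.Barrier(2)
    outcome = {}

    def session(name, first, second, patience):
        with first:
            both_hold_first.wait()  # each session now holds one lock
            # Assumption: the less patient session gives up, playing the
            # role of the deadlock victim that gets rolled back.
            if second.acquire(timeout=patience):
                second.release()
                outcome[name] = "completed"
            else:
                outcome[name] = "rolled back"

    threads = [
        threading.Thread(target=session, args=("session 1", lock_a, lock_b, 0.2)),
        threading.Thread(target=session, args=("session 2", lock_b, lock_a, 5.0)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outcome


blocking_result = run_blocking()
deadlock_result = run_deadlock()
```

In the blocking case both threads finish on their own. In the deadlock case nothing moves until one thread abandons its request, at which point the other can complete.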


Business Logic

Ed Elliott hits a classic architectural argument, whether business logic should be in stored procedures:

Stack Overflow is a specific use case and they decided to use .NET, so they have a specific set of problems to deal with in terms of performance. They deploy (as I understand it) 10 times a day, so if they need to change a query then they can do so quickly and easily – how quickly can you modify code and get it to production to fix a problem causing downtime on your mission-critical app written in PowerBuilder 20 years ago? (I jest but you get the point)

I like Ed’s back-and-forth arguing, as there are legitimate cases for both sides and the best answer is almost always somewhere in between for line-of-business apps.  I have three points that I tend to mention whenever this discussion comes up.

First, a lot of “business logic” is actually data logic.  Check constraints, foreign key constraints, unique key constraints, and even primary key constraints (for non-surrogate primary keys) are business rules, but they’re business rules around how the data is shaped and it’s a lot better to use your database system to maintain those rules.

Second, validation rules should be everywhere.  The fancy JavaScript library should do validation, the server-side business logic should do validation, and the database should do validation.  You don’t know what’s going to skip one or more of these layers, and your database is the final gatekeeper preventing bad data from sneaking into your system.

Third, at the margin, go where your maintenance developers are most comfortable.  If they’re really good with C# but not good with SQL, the marginal business logic (the stuff you could really go either way on) should stay in the app tier; if your maintainers have really strong SQL skills but are lagging on the .NET side, I’d stick the marginal logic in stored procedures.
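
The first two points can be shown in miniature. This is a rough sketch under some assumptions: it uses an in-memory SQLite database through Python's sqlite3 module rather than SQL Server, and the tables, columns, and try_insert helper are invented for the example. It imagines a caller that skipped every application-layer check, leaving the constraints as the last line of defense.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        email       TEXT NOT NULL UNIQUE
    );
    CREATE TABLE customer_order (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer (customer_id),
        quantity    INTEGER NOT NULL CHECK (quantity > 0)
    );
    INSERT INTO customer (customer_id, email) VALUES (1, 'a@example.com');
""")


def try_insert(sql, params):
    """Run one insert; report whether the database accepted it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


order_sql = "INSERT INTO customer_order (customer_id, quantity) VALUES (?, ?)"
results = [
    try_insert(order_sql, (1, 5)),    # valid row
    try_insert(order_sql, (1, 0)),    # violates the check constraint
    try_insert(order_sql, (99, 1)),   # violates the foreign key
    try_insert("INSERT INTO customer (email) VALUES (?)", ("a@example.com",)),  # duplicate
]
```

Only the first insert goes through; the other three are stopped by the check, foreign key, and unique constraints respectively, no matter which upstream layer failed to catch them.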


Hortonworks Revenue Growth

Alex Woodie reports that Hortonworks grew its first-quarter revenue 85% year over year:

Support subscription revenue during the quarter was up sharply from $13.1 million to $27.6 million, an increase of 110 percent compared to the first quarter of 2015, which was Hortonworks’ first quarter as a public company following an IPO in late 2014. Professional services revenue accounted for $13.7 million in revenue, a 49 percent increase.

Hortonworks holds about 40% of the Hadoop market share, with Cloudera holding another 40%.


Remote Server Installation Using PowerShell

Slava Murygin gives tips on using PowerShell and Task Scheduler to schedule remote SQL Server installations:

Finally I’ve nailed down that topic and hopefully that will be my last post dedicated to SQL Server installations on Windows Core.

In this post I will show how easy it is to install SQL Server from a remote computer without remoting to a server, without any GUI, just by using a simple command line.

I admit that setting up installation as a scheduled task on the remote machine is not something that ever came to mind before.


Azure SQL Database Management With PowerShell

Mike Fal shows a few administration steps with Azure SQL Database, including resetting an admin password:

Walking through this, we just need to create a secure string for our password and then use the Set-AzureRmSqlServer cmdlet and pass the secure string to the -SqlAdministratorPassword argument. Easy as that, and we don’t even need to know what the previous password was. With this in mind, I also want to call out that you can only change the password and not the admin login name. While this is not such a big deal, be aware that once you have an admin login name, you are stuck with it.

Mike promises that his next blog post won’t take a month to publish.  Here’s hoping he’s right.


Data Science Notebooks

Dan Osipov discusses data science notebooks:

Even though they’ve become prominent in the past few years, they have a long history. The first notebooks were available in packages like Mathematica and Matlab, used primarily in academia. More recently they’ve started getting traction in the Python community with IPython Notebook. Today there are many notebooks to choose from: Jupyter (successor to the IPython Notebook), R Markdown, Apache Zeppelin, Spark Notebook, Databricks Cloud, and more. There are kernels/backends for multiple languages, such as Python, Julia, Scala, SQL, and others.

Traditionally, notebooks have been used to document research and make results reproducible, simply by rerunning the notebook on source data. But why would one want to choose to use a notebook instead of a favorite IDE or command line? There are many limitations in the current browser based notebook implementations that prevent them from offering a comfortable environment to develop code, but what they do offer is an environment for exploration, collaboration, and visualization.

Back In The Day, developers and infrastructure staff used runbooks to make sure they listed and hit all of the steps in an operation.  I don’t really know of one which integrates directly with SQL Server, but Jupyter is probably the best-known cross-platform notebook.


Hadoop And SQL Server Are Complements

Jim Scott explains that Hadoop and relational databases solve different problems:

That’s the basics. Peeling back the onion more reveals other distinct differences, further strengthening the case for a Hadoop-RDBMS coexistence strategy. RDBMS has the backing of the biggest names in the software industry, and as such has fostered an install base of IT talent probably second to none. RDBMS integrate very well with other systems, and represent a very mature technology having venerable, 40-year-old roots. RDBMS are baked into the very fabric of just about every mid- to large-sized IT organization in the world. Believe it – RDBMS aren’t going away any time soon, nor should they.

Relational databases have a strong mathematical footing which provides unparalleled data integrity.  Hadoop has a strong mathematical footing which provides near-linear scale out.  The key is knowing the problem you need to solve and how to integrate the results.


Implicit Conversion (Sometimes) Harms Performance

Grant Fritchey looks at implicit conversion and the havoc it can wreak:

Letting SQL Server change data types automatically can seriously impact performance in a negative way. Because a calculation has to be run on each column, you can’t get an index seek. Instead, you’re forced to use a scan. I can demonstrate this pretty simply. Here’s a script that sets up a test table with three columns and three indexes and tosses a couple of rows in:

You might get lucky and have the database engine realize that it doesn’t need to give you a horrible execution plan, but it’s sound advice to ensure that data types match on joins and filters.
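
The seek-versus-scan effect can be reproduced in a small way, with heavy caveats. This is not Grant's script and it does not run on SQL Server. It uses SQLite through Python's sqlite3 module, and because SQLite's type rules differ, an explicit CAST on the column stands in for the conversion SQL Server would add implicitly when a character column is compared to an integer. The table, index, and plan helper are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (
        account_id   INTEGER PRIMARY KEY,
        account_code TEXT NOT NULL
    );
    CREATE INDEX ix_account_code ON account (account_code);
    INSERT INTO account (account_code) VALUES ('100'), ('200'), ('300');
""")


def plan(sql):
    """Return the planner's description of how it will run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # detail text is the last column


# Types match: the index can be searched directly.
matching = plan(
    "SELECT account_id FROM account WHERE account_code = '200'"
)

# Assumption: the CAST mimics a conversion applied to every row's column
# value, so the index order is no longer usable for a search.
converted = plan(
    "SELECT account_id FROM account WHERE CAST(account_code AS INTEGER) = 200"
)

print(matching)
print(converted)
```

The first plan is a SEARCH on the index and the second is a SCAN. The general lesson carries over: once the column itself has to be converted, the engine has to touch every row to evaluate the predicate.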


“Permanent” Temp Tables

Brent Ozar shows two ways of creating “permanent” temp tables:

The first one disappears when my session is over, but the latter two persist until the SQL Server is restarted.

Why would you ever do the latter two? Say you need to share data between sessions, or between different applications, or staging tables for a data warehouse, or just faster tables that live on local SSDs in a cluster (as opposed to slower shared storage), or you wanna build a really crappy caching tier.

Brent also talks about stored procedures.
