Press "Enter" to skip to content

Month: February 2016

Aggregates Using OVER

Slava Murygin shows aggregation and windowing using SUM:

As a conclusion: You CAN use the “OVER” clause to do aggregation in the following three cases:
1. When the data set is extremely small and fits in just one 8 KB page;
2. When you want to hide your logic from any future developer, or even yourself, to make debugging and troubleshooting a nightmare;
3. When you really want to kill your SQL Server and its underlying disk system.

That conclusion’s rather pessimistic for my tastes, mostly because Slava’s trying to do the same thing with a window function that he’s doing with a GROUP BY clause, and he spreads multiple functions across different windows (including calculations).  Using SUM() OVER() is powerful when you still need the disaggregated values alongside the aggregate; the classic example is a running total.
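
To make that concrete, here is a minimal sketch of the running-total case against a hypothetical Sales table (the table and column names are my own, not Slava’s): each row keeps its own Amount while the window function accumulates the total.

-- Hypothetical table: one row per order.
CREATE TABLE dbo.Sales
(
    OrderID int IDENTITY(1, 1) PRIMARY KEY,
    OrderDate date NOT NULL,
    Amount decimal(10, 2) NOT NULL
);

-- Each row retains its detail; SUM() OVER() adds a cumulative total.
-- ROWS UNBOUNDED PRECEDING requires SQL Server 2012 or later.
SELECT
    OrderID,
    OrderDate,
    Amount,
    SUM(Amount) OVER (ORDER BY OrderDate, OrderID
                      ROWS UNBOUNDED PRECEDING) AS RunningTotal
FROM dbo.Sales
ORDER BY OrderDate, OrderID;

This is something a GROUP BY alone cannot produce, since grouping collapses the detail rows away.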

HDInsight + Power BI + Spark

Reza Rad has a nice walkthrough on integrating several powerful technologies:

Power BI can connect to many data sources, as you know, and Spark on Azure HDInsight is one of them. In the world of Big Data applications, you will probably hear names such as Hadoop, HDInsight, Spark, Storm, and Data Lake, among many others. Spark and Hadoop are both frameworks for working with big data, though they have some differences. In this post, I’ll show you how you can use Power BI (either Power BI Desktop or the Power BI website) to connect to a sample Spark cluster that we build on an Azure HDInsight service. By completing this section, you will be able to create a simple Spark cluster on Azure HDInsight, run a few Python scripts from Jupyter on it to load a sample table into Spark, and finally use Power BI to connect to the Spark server to load and visualize the data.

If you’re totally unfamiliar with Spark but interested in data processing, now’s a good time to start digging into the topic.

Startup Stored Procedures

Kenneth Fisher describes startup stored procedures:

sp_procoption is a system stored procedure that lets us change the options on a stored procedure and, in this case, set it to run on startup. Note: the account that is running SQL Server needs to have permissions to start the Agent service (or to do whatever your startup stored procedure does). You can have as many stored procedures running on startup as you want, but remember: the more you have, the longer it’s going to take for your instance to start.

There are a few use cases in which startup stored procedures can be useful, but my reservation about them is similar to my reservation about triggers:  it's not apparent to people that a startup stored procedure is in place, so if there is a problem with it, troubleshooting might be harder than normal without good documentation.
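
For reference, here is what flagging a procedure for startup looks like; the procedure name is hypothetical, and note that a startup procedure must live in the master database and be owned by dbo.

USE master;
GO

-- Flag a (hypothetical) procedure to run every time the instance starts.
EXEC sp_procoption
    @ProcName = N'dbo.LogStartupTime',
    @OptionName = 'startup',
    @OptionValue = 'on';  -- 'off' to disable

-- Good documentation starts with knowing what is there:
-- list all procedures currently flagged to run at startup.
SELECT [name]
FROM sys.procedures
WHERE is_auto_executed = 1;

That second query is also a handy first step when troubleshooting an instance you didn't set up yourself.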

Widening Identity Columns

Aaron Bertrand looks at converting an identity integer into an identity bigint:

This is a very disruptive change to the structure of the table, obviously. (And an interesting side observation: the physical order of the columns, RowID and filler, has been flipped on the page.) Reserved space jumps from 136 KB to 264 KB, and average fragmentation bumps up modestly from 33.3% to 40%. This space does not get recovered by a rebuild, online or not, or a reorg, and – as we’ll see shortly – this is not because the table is too small to benefit.

Note: this is true even in the most recent builds of SQL Server 2016 – while more and more operations like this have been improved to become metadata-only operations in modern versions, this one hasn’t been fixed yet, though clearly it could be – again, especially in the case where the column is an IDENTITY column, which can’t be updated by definition.

Read the whole thing.  The clustered key scenario (which will come later in the series) is a bit more interesting to me, as that would be a more common use case for identity values.
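
The core operation under test is a single ALTER COLUMN. Here is a minimal sketch, reusing the RowID and filler column names from the quote; the table definition itself is my own stand-in, not Aaron’s exact repro.

-- Stand-in table with an int identity that needs more room.
CREATE TABLE dbo.Demo
(
    RowID int IDENTITY(1, 1) NOT NULL,
    filler char(100) NOT NULL DEFAULT 'x'
);

-- Widening the type rewrites every row (a size-of-data operation)
-- rather than being a metadata-only change; the identity property
-- itself is preserved.
ALTER TABLE dbo.Demo
    ALTER COLUMN RowID bigint NOT NULL;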

System Views

Robert Sheldon has an introductory-level article on the various system views available within SQL Server:

System views are divided into categories that each serve a specific purpose. The most extensive category is the one that contains catalog views. Catalog views let you retrieve information about a wide range of system and database components—from table columns and data types to server-wide configurations.

Information schema views are similar to some of the catalog views in that they provide access to metadata that describes database objects such as tables, columns, domains, and check constraints. However, information schema views conform to the ANSI standard, whereas catalog views are specific to SQL Server.

In contrast to either of these types of views, dynamic management views return server state data that can be used to monitor and fine-tune a SQL Server instance and its databases. Like catalog views, dynamic management views are specific to SQL Server.

One of the best things the authors of SQL did was require that metadata management be in the same language:  you write SQL code to query metadata just as you would query normal data.
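
A quick illustration of that point, with one ordinary SELECT per category of system view:

-- Catalog view (SQL Server-specific): user tables in the current database.
SELECT [name], create_date
FROM sys.tables;

-- Information schema view (ANSI standard): columns and their data types.
SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE
FROM INFORMATION_SCHEMA.COLUMNS;

-- Dynamic management view: requests executing on the instance right now.
SELECT session_id, [status], command
FROM sys.dm_exec_requests;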

Analysis Services Permissions

Jens Vestergaard walks through permission scope in SQL Server Analysis Services:

What this post will not be about: how to set up basic dimension security in SSAS, nor how to manage security in general.

In this post, I will highlight the difference between standard NTFS permission scope and the way SSAS handles Allowed and Denied sets when dealing with multiple roles. So if you define multiple roles on your solution, you should be on the lookout, because SSAS has some surprises.

It’s interesting that allowed permissions take precedence over denied permissions, as that’s not the norm for either NTFS or the SQL Server database engine.

Rebuilding Indexes By Partition

Andrea Allred shows how to rebuild indexes one partition at a time:

We have had an index job that has been failing for a while.  This is one of those things you really don’t want to clean up because no one is complaining, but you know you should.  I had heard that I could rebuild one partition at a time, but where to start?  Today, I worked my way through it, so here it is so that you can do it too.

First you need to find the biggest indexes; there is a good chance those are the ones that live on partitions. I am removing Primary Keys.

This can be a real time-saver if a majority of your partitions are either read-only or at least rarely updated.
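
As a sketch of the pattern (the index name, table name, and partition number here are hypothetical), you can size up the candidates and then rebuild a single partition rather than the entire index:

-- Find the biggest indexes, partition by partition, by space used.
SELECT
    OBJECT_NAME(ps.[object_id]) AS table_name,
    i.[name] AS index_name,
    ps.partition_number,
    ps.used_page_count * 8 / 1024 AS used_mb
FROM sys.dm_db_partition_stats AS ps
JOIN sys.indexes AS i
    ON i.[object_id] = ps.[object_id]
   AND i.index_id = ps.index_id
WHERE i.[name] IS NOT NULL  -- skip heaps
ORDER BY ps.used_page_count DESC;

-- Rebuild just one partition instead of the whole index.
ALTER INDEX IX_BigTable_OrderDate
    ON dbo.BigTable
    REBUILD PARTITION = 3;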
