Press "Enter" to skip to content

Month: August 2017

Rebuilding Versus Reorganizing Rowstore Indexes

Paul Randal explains the difference between rebuilding and reorganizing rowstore indexes:

Rebuilding an index requires building a new index before dropping the old index, regardless of the amount of fragmentation present in the old index. This means you need to have enough free space to accommodate the new index.

Reorganizing an index first squishes the index rows together to try to deallocate some index pages, and then shuffles the remaining pages in place to make their physical (allocation) order the same as the logical (key) order. This only requires a single 8-KB page as temporary storage for pages being moved around. So an index reorganize is extremely space efficient, and is one of the reasons I wrote the original DBCC INDEXDEFRAG for SQL Server 2000 (the predecessor of ALTER INDEX … REORGANIZE).

If you have space constraints, and can’t make use of single-partition rebuild, reorganizing is the way to go.
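
To make the comparison concrete, here is a minimal T-SQL sketch; the table and index names are hypothetical, and the reorganize-at-low-fragmentation versus rebuild-at-high-fragmentation split is a common rule of thumb rather than a hard rule:

    -- Check fragmentation for one index (all object names hypothetical)
    SELECT ips.avg_fragmentation_in_percent, ips.page_count
    FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.Orders'), NULL, NULL, 'LIMITED') AS ips
    JOIN sys.indexes AS i
        ON i.object_id = ips.object_id
        AND i.index_id = ips.index_id
    WHERE i.name = N'IX_Orders_OrderDate';

    -- Reorganize: shuffles pages in place and needs only a single spare 8-KB page
    ALTER INDEX IX_Orders_OrderDate ON dbo.Orders REORGANIZE;

    -- Rebuild: builds a complete new copy before dropping the old one,
    -- so the database needs enough free space for that second copy
    ALTER INDEX IX_Orders_OrderDate ON dbo.Orders REBUILD;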

Click through for the rest of the story.

Auto-Install Docker And SQL Tools On Linux

Andrew Pruski has a script on GitHub:

So I’ve created a repository on GitHub that pulls together the code from Docker to install the Community Edition and the code from Microsoft to install the SQL command line tools.

The steps it performs are:

  • Installs the Docker Community Edition

  • Installs the SQL Server command line tools

  • Pulls the latest SQL Server on Linux image from the Docker Hub

Read on for more details and some limitations.

SQL Server 2017 RC2

Microsoft has announced Release Candidate 2 of SQL Server 2017, hot on the heels of RC1:

Microsoft is pleased to announce availability of SQL Server 2017 Release Candidate 2 (RC2), which is now available for download.

The release candidate represents an important milestone for SQL Server.  Development of the new version of SQL Server along most dimensions needed to bring the industry-leading performance and security of SQL Server to Windows, Linux, and Docker containers is complete.  We are continuing to work on performance and stress testing of SQL Server 2017 to get it ready for your most demanding Tier 1 workloads, as well as some final bug fixes.

There are no new features and the Windows release notes are empty, but there are some Linux release notes as they firm up that offering before launch.

Web Editor For Azure Analysis Services

James Serra shows off a preview of the Azure Analysis Services web designer for tabular models:

Microsoft has released a preview of the Azure Analysis Services web designer. This is a browser-based experience that will allow developers to start creating and managing Azure Analysis Services (AAS) semantic models quickly and easily. SQL Server Data Tools (SSDT) and SQL Server Management Studio (SSMS) will still be the primary tools for development, but this new designer gives you another option for creating a new model or for doing things such as adding a new measure to a development or production AAS model.

A highly requested feature is that you can import a Power BI Desktop file (.pbix) into an Analysis Services database, and once imported you can reverse-engineer it to Visual Studio. Note that for PBIX import, only Azure SQL Database, Azure SQL Data Warehouse, Oracle, and Teradata are supported at this time, and DirectQuery models are not yet supported for import (Microsoft will be adding new connection types for import every month).

Read on for more details.

Conditional Counts

Mark Broadbent shows that COUNT has a few tricks up its sleeve:

When I came to compare the results against aggregated data that I had, I noticed that the values were off and it became fairly obvious that the transactional data also contained refunds and rebates (positive values but logically reflected as negative by the Transaction_Type status) and these were not just causing inaccuracies for the SUM on Sales_Value, but were also causing the COUNT for Number_Of_Sales to be wrong. In other words, refunds and rebates must be removed from the SUM total and not aggregated in the Number_Of_Sales columns. Now at this stage, you might be thinking that we can do this by a simple WHERE clause to filter them from the aggregates, but not only is it wrong to “throw away” data, I realised that my target tables also contained aggregate columns for refunds and rebates.

I have only used the SUM(CASE) method that Mark shows.  It’s interesting that COUNT(CASE) can work, but I agree that it is probably more confusing, if only because it’s so rare.
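
For reference, here is a minimal sketch of both patterns against a hypothetical dbo.Transactions table (names invented to match Mark's description). The COUNT(CASE) trick works because CASE without an ELSE returns NULL, and COUNT ignores NULLs:

    -- Hypothetical schema: dbo.Transactions(Transaction_Type, Sales_Value)
    SELECT
        SUM(CASE WHEN Transaction_Type = 'Sale' THEN Sales_Value ELSE 0 END) AS Sales_Value_Total,
        COUNT(CASE WHEN Transaction_Type = 'Sale' THEN 1 END) AS Number_Of_Sales,
        SUM(CASE WHEN Transaction_Type IN ('Refund', 'Rebate') THEN Sales_Value ELSE 0 END) AS Refund_Rebate_Value,
        COUNT(CASE WHEN Transaction_Type IN ('Refund', 'Rebate') THEN 1 END) AS Number_Of_Refunds_Rebates
    FROM dbo.Transactions;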

Explaining Singular Value Decomposition

Tim Bock explains how Singular Value Decomposition works:

The table above is a matrix of numbers. I am going to call it Z. The singular value decomposition is computed using the svd function. The following code computes the singular value decomposition of the matrix Z, and assigns it to a new object called SVD, which contains one vector, d, and two matrices, u and v. The vector, d, contains the singular values. The first matrix, u, contains the left singular vectors, and v contains the right singular vectors. The left singular vectors represent the rows of the input table, and the right singular vectors represent their columns.
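
In matrix form, the reconstruction Tim describes is

    Z = U \, \mathrm{diag}(d) \, V^{\top}

where the columns of U and V are the left and right singular vectors and d holds the singular values; in R, that round trip is SVD$u %*% diag(SVD$d) %*% t(SVD$v), using the SVD object from the quote.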

Tim includes R scripts to follow along, and for this topic I definitely recommend following along.

Thoughts On Exactly-Once Processing And First-In First-Out

Kevin Sookocheff looks into Amazon’s Simple Queue Service and explains some concepts of distributed messaging systems in the process:

In an ideal scenario, the five minute window would be a complete non-issue. Unfortunately, if you are relying on SQS’s exactly-once guarantee for critical use cases you will need to account for the possibility of this error and design your application accordingly.

On the message consumer side, FIFO queues do not guarantee exactly-once delivery because, in simple fact, exactly-once delivery at the transport level is provably impossible. Even if you could ensure exactly-once delivery at the transport level, it probably isn’t what you want anyway — if a subscriber receives a message from the transport, there is still a chance that it can crash before processing it, in which case you definitely want the messaging system to deliver the message again.

Instead, FIFO queues offer exactly-once processing by guaranteeing that once a message has successfully been acknowledged as processed that it won’t be delivered again. To understand more completely how this works, let’s walk through the details of how you go about consuming messages from SQS.
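
That acknowledgment-based guarantee is also what you end up building yourself when a queue only promises at-least-once delivery: make processing idempotent by recording each message ID in the same transaction as the work, so a redelivered duplicate becomes a no-op. A sketch of the pattern in T-SQL, with all object names hypothetical:

    -- Hypothetical dedup table: the primary key allows at most one processing per message
    CREATE TABLE dbo.ProcessedMessages (
        MessageId   varchar(128) NOT NULL PRIMARY KEY,
        ProcessedAt datetime2    NOT NULL DEFAULT SYSUTCDATETIME()
    );
    GO

    -- Run once per delivery; @MessageId comes from the incoming message
    DECLARE @MessageId varchar(128) = 'example-message-id';

    BEGIN TRANSACTION;

    IF NOT EXISTS (SELECT 1 FROM dbo.ProcessedMessages WITH (UPDLOCK, HOLDLOCK)
                   WHERE MessageId = @MessageId)
    BEGIN
        -- ...apply the message's effects here, inside the same transaction...
        INSERT INTO dbo.ProcessedMessages (MessageId) VALUES (@MessageId);
    END;
    -- A duplicate delivery skips the block above and commits nothing new

    COMMIT TRANSACTION;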

It’s a great read, so check it out.

Measure-Object Differences

Klaas Vandenberghe notes a discrepancy in how Measure-Object works, based on syntax:

So, is the -InputObject parameter broken? Not really, but we need to be aware of the logic behind it.
Maybe we are used to working with -ComputerName for a lot of cmdlets and functions, and we rely blindly on the ability of the command to handle whatever collection we provide. We know this executes the action we chose, like Get-DbaSqlService or Get-DbaOperatingSystem or whatever, separately for every computer in the collection. The collection is ‘folded out’, ‘unpacked’, ‘split’.

-InputObject doesn’t do that! Is that wrong? Not necessarily; it may be a design choice, to make it possible to inquire into the properties of the collection itself as well as those of its ‘members’ or ‘children’. It’s just a surprise that the behaviour differs between pipeline input and parameter input.

Something to keep in mind when writing PowerShell scripts.

Inference Attacks

Phil Factor explains that your technique for pseudonymizing data doesn’t necessarily anonymize the data:

It is possible to mine data for hidden gems of information by looking at significant patterns of data. Unfortunately, this sometimes means that published datasets can reveal sensitive data when the publisher didn’t intend it, or even when they tried to prevent it by suppressing any part of the data that could enable individuals to be identified.

Using creative querying, linking tables in ways that weren’t originally envisaged, as well as using well-known and documented analytical techniques, it’s often possible to infer the values of ‘suppressed’ data from the values provided in other, non-suppressed data. One man’s data mining is another man’s data inference attack.

Read the whole thing. One big problem with trying to anonymize data is that you don’t know how much the attacker already knows. Especially with outliers or small samples, an attacker may be able to glean sensitive information from a series of queries. Even if the application only returns aggregates over groups of some minimum size N, you can often put together a set of queries that slice the population in different ways until you expose details about an individual. Phil covers these types of inference attacks.
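
As a contrived T-SQL illustration of that slicing trick (table and column names hypothetical), suppose the application only ever returns aggregates. Two individually “safe” queries can still be differenced to expose one person:

    -- Hypothetical schema: dbo.Salaries(Department, HireDate, Salary)
    -- Query 1: the whole department (say, 25 people)
    SELECT SUM(Salary) AS TotalPay, COUNT(*) AS Headcount
    FROM dbo.Salaries
    WHERE Department = 'Finance';

    -- Query 2: everyone in the department hired before this year (24 people)
    SELECT SUM(Salary) AS TotalPay, COUNT(*) AS Headcount
    FROM dbo.Salaries
    WHERE Department = 'Finance'
      AND HireDate < '20170101';

    -- Each result is an aggregate over a large group, but subtracting the two
    -- totals reveals the newest hire's individual salary.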
