Press "Enter" to skip to content

Month: March 2017

Displaying Slicer Selections

Rob Collie shows an easy method to concatenate selected values for display in Excel and Power BI:

Measure Step 2: Dealing with No (aka All) Selections on Slicer

The original measure above is really awkward when the user has made NO selection on a slicer – because it can then return a REALLY long list!

To deal with that case, we add an IF to the measure to detect precisely that case, and then return “All.”

Read on for the code, as well as some caveats and additional hints.

Comments closed

It Depends: Key Lookup Edition

Raul Gonzalez shows a case in which having a key lookup beats having a covering index:

This is a very special case where the covering index has three keys and then a couple of included columns, one of which is NVARCHAR(MAX) column, so the covering index is pretty big and we only seek in the leftmost column, which is also possible using another much smaller index on that single column.

In both cases, the operator is able to push the query predicate(s) to the seek and thanks to that, the number of rows coming out the operator is not that big. But the number of rows which match the leftmost key and therefore have to be read is quite big.

That’s why it’s so important to test queries rather than assuming you know how they will behave:  sometimes the normal answer isn’t quite so.

Comments closed

Indexing Foreign Keys

Kim Tripp looks at the practice of adding non-clustered indexes on columns which make up foreign key constraints:

I’m going to tie today’s post with a question I received recently: if I have a table that has 6 foreign key columns/references should I create one index with all 6 foreign key columns in it, or should I create 6 individual indexes – one for each foreign key reference.

This is an interesting question with a few tangents to cover for our SQL101 series. I consider indexing foreign keys as part of my “Phase 1 of index tuning.” But, let’s go back to some basics before we dive into the answer on this one.

Read on for Kim’s thoughts on the topic and some good practices.

Comments closed

Introduction To Probability

Mala Mahadevan covers some basics of probability:

Probability is an important statistical and mathematical concept to understand. In simple terms – probability refers to the chances of possible outcome of an event occurring within the domain of multiple outcomes. Probability is indicated by a whole number – with 0 meaning that the outcome has no chance of occurring and 1 meaning that the outcome is certain to happen. So it is mathematically represented as P(event) = (# of outcomes in event / total # of outcomes). In addition to understanding this simple thing, we will also look at a basic example of conditional probability and independent events.

It’s a good intro to a critical topic in statistics.  If I would add one thing to this, it would be to state that probability is always conditional upon something.  It’s fair to write something as P(Event) understanding that it’s a shortcut, but in reality, it’s always P(Event | Conditions), where Conditions is the set of assumptions we made in collecting this sample.

Comments closed

R On Athena

Gopal Wunnava shows how to run R scripts using Amazon Athena as a data source:

Next, you’ll practice interactively querying Athena from R for analytics and visualization. For this purpose, you’ll use GDELT, a publicly available dataset hosted on S3.

Create a table in Athena from R using the GDELT dataset. This step can also be performed from the AWS management console as illustrated in the blog post “Amazon Athena – Interactive SQL Queries for Data in Amazon S3.”

This is an interesting use case for Athena.

Comments closed

Poor Man’s More

Chris Koester has a quick-and-easy file reader in a few lines of C#:

This post describes one way that you can read the top N rows from large text files with C#. This is very useful when working with giant files that are too big to open, but you need to view a portion of them to determine the schema, data types, etc.

I’ve used PowerShell many times to do this with large csv files, but in this example we’re going to use C# and look at the Wikipedia XML dump of pages and articles. The 3017-03-01 dump is very large and comes in at 59.5 GB.

I’ve had to write something similar before on Windows machines where I didn’t have access to more/less.  It’s really helpful for perusing the first few lines of gigantic log files.

Comments closed

What Will The DBAs Do?

Kevin Hill predicts that database administration isn’t going anywhere anytime soon:

There have been a lot of questions, posts, answers, guesses and such floating around the SQL blogs lately…most of which seem to suggest that the DBA is going away.

Hogwash.

The DBA position is not going away.  Ever.  Or at least not before I retire to Utah to spend my days mountain biking 😉

That said, Kevin does point out that you shouldn’t rest on your laurels.

One fun anecdote I have about database administration:  I recall some marketing for some NoSQL product about how, by adopting their software, you can get rid of those stodgy database administrators.  Within a couple of years, said product’s parent company was offering developer training on “advanced” techniques, which included taking backups, tuning queries, implementing disaster recovery, and creating good indexes to help with performance.  But hey, at least they don’t have DBAs!

Comments closed

Cores Visible Offline

John Morehouse enlists some assistance to figure out why his SQL Server instance is ignoring 24 cores:

Hidden schedulers are used to process requests that are internal to the engine itself.  Visible schedulers are used to handle end-user requests.  When you run the casual SELECT * query, it will utilize a visible scheduler to process the query.  With this information, if I have a 64 core server and all is well, I should have 64 visible schedulers online to process requests.

However, I discovered that some of the schedulers were set to “VISIBLE OFFLINE”.  This essentially means that those particular schedulers are unavailable to SQL Server for some reason.   How many offline schedulers do I have? A quick query resulted in 24 schedulers currently offline.  24 logical cores means that 12 physical cores are offline.

But why would a scheduler be set to “VISIBLE OFFLINE”?

Read on for the answer, and check the comments for a helpful plug for sp_Blitz.

Comments closed

RevoScaleR With Power BI

Tomaz Kastrun looks at several methods for using RevoScaleR packages in a Power BI dashboard:

I was invited to deliver a session for Belgium User Group on SQL Server and R integration. After the session – which we did online using web based Citrix  – I got an interesting question: “Is it possible to use RevoScaleR performance computational functions within Power BI?“. My first answer was,  a sceptical yes. But I said, that I haven’t used it in this manner yet and that there might be some limitations.

The idea of having the scalable environment and the parallel computational package with all the predictive analytical functions in Power BI is absolutely great. But something tells me, that it will not be that straight forward.

Read on for the rest of the story.

Comments closed