
Curated SQL Posts

Entity Framework Slow, News At 11

Jovan Popovic shows that Entity Framework is slow and Dapper is fast:

To set up the test, you can go to the StackExchange/Dapper GitHub repository and download the source code. The tests are created as a C# solution (Dapper.sln). When you open this solution, you can find the Dapper.Tests project. You might need to change two things:

  1. Connection strings are hardcoded in the Tests.cs file with values like “Server=(local)\SQL2014;Database=tempdb;User ID=sa;Password=Password12!”. You might need to change these to match your own connection info.
  2. The project is compiled using dotnet sdk 1.0.0-preview2-003121, so you might get compilation errors if you don’t have a matching SDK installed. I removed the line "sdk": { "version": "1.0.0-preview2-003121" } from global.json to fix this.

Now you will be able to build the project and run the tests.

Nothing’s going to be faster than hand-crafted, well-tuned statements from people who know what they’re doing.  Micro-ORMs like Dapper and FSharp.Data.SqlClient accept a small speed hit in exchange for developer niceties.  Heavier frameworks like Entity Framework and NHibernate add a lot more functionality, but tend to be significantly slower.


Standard Edition Hardware

Glenn Berry tackles the question of maximizing bang for buck with hardware for SQL Server Standard Edition:

Since SQL Server 2016 Standard Edition has such a low per-instance memory limit, you should purposely choose an appropriate memory configuration that will let you use all of the license-limit memory while also getting the best memory performance possible. Only populating one DIMM per memory channel will give you the absolute best memory performance supported by your processor(s).

The major server vendors, such as Dell, offer detailed guidance on the possible memory configurations for their servers, depending on the number and specific type of processor selected. For SQL Server 2016 Standard Edition in a two-socket server with two Intel Xeon E5-2600 v4 family processors, choosing eight, 32GB DDR4 DIMMs would give you 256GB of RAM, running at the maximum supported speed of 2400MT/s.

This would allow you to set max server memory (for the buffer pool) to 131,072 MB (128GB), and still have plenty of memory left over for the operating system and for possible use by columnstore indexes and in-memory-OLTP. You would also have sixteen empty DIMM slots that could be used for future RAM expansion (which you could take advantage of if you did a subsequent Edition upgrade to Enterprise Edition). Another use for some of those empty DIMM slots would be for “tail of the log caching” on NVDIMMs (which is supported in SQL Server 2016 Standard Edition with SP1).

Click through for some very helpful advice.  If your budget is tight enough that Enterprise Edition is out of the question, it’d be terrible to pick something which wastes hardware or, even worse, wastes hardware while still forcing you to pay more for licensing.
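Putting that memory cap in place is a one-time configuration change; here is a minimal sketch, assuming a default instance and sysadmin rights:

```sql
-- Cap the buffer pool at 128GB (131,072 MB), leaving the rest of the 256GB
-- for the OS, columnstore indexes, and In-Memory OLTP.
EXEC sys.sp_configure N'show advanced options', 1;
RECONFIGURE;

EXEC sys.sp_configure N'max server memory (MB)', 131072;
RECONFIGURE;
```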


Displaying Slicer Selections

Rob Collie shows an easy method to concatenate selected values for display in Excel and Power BI:

Measure Step 2: Dealing with No (aka All) Selections on Slicer

The original measure above is really awkward when the user has made NO selection on a slicer – because it can then return a REALLY long list!

To deal with that case, we add an IF to the measure to detect precisely that case, and then return “All.”

Read on for the code, as well as some caveats and additional hints.
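Rob’s actual solution is a DAX measure, so click through for the real code. Purely to illustrate the shape of the logic (concatenate whatever is selected, but fall back to “All” when nothing is), here is a rough T-SQL analogue using a hypothetical table of selections:

```sql
-- Hypothetical table standing in for the slicer's current selections.
DECLARE @SelectedCategories TABLE (Category NVARCHAR(50) NOT NULL);

-- Insert rows to simulate picks; leave the table empty to simulate "no selection."
INSERT INTO @SelectedCategories (Category) VALUES (N'Bikes'), (N'Accessories');

SELECT CASE
           WHEN NOT EXISTS (SELECT 1 FROM @SelectedCategories) THEN N'All'
           ELSE STUFF((SELECT N', ' + Category
                       FROM @SelectedCategories
                       FOR XML PATH('')), 1, 2, N'')
       END AS SlicerCaption;
```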


It Depends: Key Lookup Edition

Raul Gonzalez shows a case in which having a key lookup beats having a covering index:

This is a very special case where the covering index has three keys and then a couple of included columns, one of which is an NVARCHAR(MAX) column, so the covering index is pretty big and we only seek on the leftmost column, which is also possible using another, much smaller index on that single column.

In both cases, the operator is able to push the query predicate(s) to the seek, and thanks to that, the number of rows coming out of the operator is not that big. But the number of rows which match the leftmost key, and therefore have to be read, is quite big.

That’s why it’s so important to test queries rather than assuming you know how they will behave:  sometimes the usual answer doesn’t hold.
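If you want to play with the trade-off yourself, here is a minimal sketch of the two index shapes involved; the table and column names are made up for illustration, not Raul’s actual schema:

```sql
-- Hypothetical table: one wide covering index versus one narrow single-column index.
CREATE TABLE dbo.Posts
(
    PostId    INT IDENTITY(1,1) PRIMARY KEY,
    AuthorId  INT            NOT NULL,
    PostType  TINYINT        NOT NULL,
    CreatedOn DATETIME2      NOT NULL,
    Title     NVARCHAR(200)  NOT NULL,
    Body      NVARCHAR(MAX)  NOT NULL
);

-- Covering index: three keys plus included columns, one of them NVARCHAR(MAX),
-- which makes the index itself quite large.
CREATE NONCLUSTERED INDEX IX_Posts_Covering
    ON dbo.Posts (AuthorId, PostType, CreatedOn)
    INCLUDE (Title, Body);

-- Much smaller index on the leftmost column only; queries seek here and then
-- do key lookups for the remaining columns.
CREATE NONCLUSTERED INDEX IX_Posts_AuthorId
    ON dbo.Posts (AuthorId);

-- Force each index in turn and compare the actual execution plans and I/O stats.
SELECT PostId, Title, Body
FROM dbo.Posts WITH (INDEX (IX_Posts_AuthorId))
WHERE AuthorId = 42;

SELECT PostId, Title, Body
FROM dbo.Posts WITH (INDEX (IX_Posts_Covering))
WHERE AuthorId = 42;
```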


Indexing Foreign Keys

Kim Tripp looks at the practice of adding non-clustered indexes on columns which make up foreign key constraints:

I’m going to tie today’s post to a question I received recently: if I have a table that has 6 foreign key columns/references, should I create one index with all 6 foreign key columns in it, or should I create 6 individual indexes – one for each foreign key reference?

This is an interesting question with a few tangents to cover for our SQL101 series. I consider indexing foreign keys as part of my “Phase 1 of index tuning.” But, let’s go back to some basics before we dive into the answer on this one.

Read on for Kim’s thoughts on the topic and some good practices.
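For context while you read: a single composite index on all of the foreign key columns generally only supports seeks on its leftmost column(s), while one narrow index per foreign key supports joins and lookups on each reference independently. Here is a minimal sketch with a hypothetical three-column example (the parent tables are assumed to exist):

```sql
-- Hypothetical child table with several foreign key references
-- (dbo.Customers, dbo.SalesReps, and dbo.ShipMethods are assumed to exist).
CREATE TABLE dbo.Orders
(
    OrderId      INT IDENTITY(1,1) PRIMARY KEY,
    CustomerId   INT NOT NULL REFERENCES dbo.Customers (CustomerId),
    SalesRepId   INT NOT NULL REFERENCES dbo.SalesReps (SalesRepId),
    ShipMethodId INT NOT NULL REFERENCES dbo.ShipMethods (ShipMethodId)
);

-- Option 1: one composite index, which really only helps seeks on CustomerId.
CREATE NONCLUSTERED INDEX IX_Orders_AllFKs
    ON dbo.Orders (CustomerId, SalesRepId, ShipMethodId);

-- Option 2: one narrow index per foreign key, each usable on its own.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerId   ON dbo.Orders (CustomerId);
CREATE NONCLUSTERED INDEX IX_Orders_SalesRepId   ON dbo.Orders (SalesRepId);
CREATE NONCLUSTERED INDEX IX_Orders_ShipMethodId ON dbo.Orders (ShipMethodId);
```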


Introduction To Probability

Mala Mahadevan covers some basics of probability:

Probability is an important statistical and mathematical concept to understand. In simple terms, probability refers to the chance of a particular outcome of an event occurring out of the set of all possible outcomes. Probability is expressed as a number between 0 and 1 – with 0 meaning that the outcome has no chance of occurring and 1 meaning that the outcome is certain to happen. So it is mathematically represented as P(event) = (# of outcomes in event / total # of outcomes). In addition to understanding this basic definition, we will also look at a simple example of conditional probability and independent events.

It’s a good intro to a critical topic in statistics.  If I were to add one thing, it would be to note that probability is always conditional on something.  It’s fair to write P(Event) with the understanding that it’s a shortcut, but in reality, it’s always P(Event | Conditions), where Conditions is the set of assumptions we made in collecting this sample.
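As a quick worked example of that conditional form, using a single roll of a fair die:

```latex
P(A \mid B) = \frac{P(A \cap B)}{P(B)}
\qquad\text{e.g.}\qquad
P(\text{roll a 4}) = \tfrac{1}{6},
\quad
P(\text{roll a 4} \mid \text{roll} > 3) = \frac{1/6}{3/6} = \tfrac{1}{3}.
```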


R On Athena

Gopal Wunnava shows how to run R scripts using Amazon Athena as a data source:

Next, you’ll practice interactively querying Athena from R for analytics and visualization. For this purpose, you’ll use GDELT, a publicly available dataset hosted on S3.

Create a table in Athena from R using the GDELT dataset. This step can also be performed from the AWS management console as illustrated in the blog post “Amazon Athena – Interactive SQL Queries for Data in Amazon S3.”

This is an interesting use case for Athena.
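If you haven’t seen Athena DDL before, the table-creation step is plain Hive-style DDL over files sitting in S3, whether you submit it from R or from the console. A rough sketch for flavor only; the columns and S3 location below are placeholders, so check Gopal’s post for the real GDELT schema and path:

```sql
-- Illustrative only: a tab-delimited external table over a GDELT extract in S3.
CREATE DATABASE IF NOT EXISTS gdelt;

CREATE EXTERNAL TABLE IF NOT EXISTS gdelt.events (
    globaleventid BIGINT,
    eventdate     INT,
    actor1name    STRING,
    actor2name    STRING,
    eventcode     STRING,
    actiongeo_countrycode STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LOCATION 's3://your-bucket/gdelt/events/';  -- placeholder path
```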


Poor Man’s More

Chris Koester has a quick-and-easy file reader in a few lines of C#:

This post describes one way that you can read the top N rows from large text files with C#. This is very useful when working with giant files that are too big to open, but you need to view a portion of them to determine the schema, data types, etc.

I’ve used PowerShell many times to do this with large csv files, but in this example we’re going to use C# and look at the Wikipedia XML dump of pages and articles. The 2017-03-01 dump is very large and comes in at 59.5 GB.

I’ve had to write something similar before on Windows machines where I didn’t have access to more/less.  It’s really helpful for perusing the first few lines of gigantic log files.
