Press "Enter" to skip to content

Author: Kevin Feasel

Hadoop In The Trough Of Disillusionment

Alex Woodie has an article about companies moving away from Hadoop:

Instead of trying to fit all the barnyard animals into the name (Cutting suggested Hadoop + Hive + HBase + Spark + all the other omnivores, as well as “Cutting Con,” which maybe actually would have worked), the conference organizers went back to the roots of the Strata conference in 2011.

(Note to self: it’s ALL about the data.)

That doesn’t mean Hadoop is irrelevant. We will need a place to land unstructured and semi-structured data. But when the biggest Hadoop distributor removes the name of Hadoop from its flagship conference, it’s clearly an indicator that things haven’t gone quite as expected.

I’ve seen several articles along these lines lately and couldn’t resist the Gartner callout.  I consider this a helpful antidote to the “Technology X will solve all your problems!” marketing nonsense, which followed the “Technology X will solve all my problems!” developer nonsense as developers found new and shiny toys.  People are realizing where Hadoop is a great solution and where it’s a bad one, and the same goes for other technologies; my hope is that after another 9-12 months of “Is Hadoop doomed?” articles, things will settle into a long-term growth pattern where people understand its appropriate uses.


Entity Framework Slow, News At 11

Jovan Popovic shows that Entity Framework is slow and Dapper is fast:

To set up the test, you can go to the StackExchange/Dapper GitHub repository and download the source code. Tests are created as a C# solution (Dapper.sln). When you open this solution, you will find the Dapper.Tests project. You might need to change two things:

  1. Connection strings are hardcoded in the Tests.cs file with values like “Server=(local)\SQL2014;Database=tempdb;User ID=sa;Password=Password12!”. You might need to change these to match your own connection info.
  2. The project is compiled using dotnet sdk 1.0.0-preview2-003121, so you might get compilation errors if you don’t have a matching framework. I removed the line "sdk": { "version": "1.0.0-preview2-003121" } from global.json to fix this.

Now you will be able to build the project and run the tests.

Nothing’s going to be faster than hand-crafted, well-tuned statements from people who know what they’re doing.  Micro-ORMs like Dapper and FSharp.Data.SqlClient accept a small speed hit in exchange for developer niceties.  Heavier frameworks like Entity Framework and NHibernate add a lot more functionality, but tend to be significantly slower.
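As a point of reference for what “hand-crafted” means here, this is a minimal sketch of the kind of hand-tuned, parameterized lookup these ORM comparisons tend to benchmark (table and column names are illustrative, not taken from the Dapper test suite):

    -- A hand-tuned, parameterized single-row lookup; an ORM can only add
    -- overhead on top of this, in the SQL it generates or in the object
    -- materialization it performs afterward
    SELECT p.Id, p.Title, p.CreationDate
    FROM dbo.Posts AS p
    WHERE p.Id = @Id;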


Standard Edition Hardware

Glenn Berry tackles the question of maximizing bang for buck with hardware for SQL Server Standard Edition:

Since SQL Server 2016 Standard Edition has such a low per-instance memory limit, you should purposely choose an appropriate memory configuration that will let you use all of the license-limit memory while also getting the best memory performance possible. Only populating one DIMM per memory channel will give you the absolute best memory performance supported by your processor(s).

The major server vendors, such as Dell, offer detailed guidance on the possible memory configurations for their servers, depending on the number and specific type of processor selected. For SQL Server 2016 Standard Edition in a two-socket server with two Intel Xeon E5-2600 v4 family processors, choosing eight 32GB DDR4 DIMMs would give you 256GB of RAM, running at the maximum supported speed of 2400MT/s.

This would allow you to set max server memory (for the buffer pool) to 131,072 MB (128GB), and still have plenty of memory left over for the operating system and for possible use by columnstore indexes and In-Memory OLTP. You would also have sixteen empty DIMM slots that could be used for future RAM expansion (which you could take advantage of if you did a subsequent upgrade to Enterprise Edition). Another use for some of those empty DIMM slots would be for “tail of the log caching” on NVDIMMs (which is supported in SQL Server 2016 Standard Edition with SP1).

Click through for some very helpful advice.  If your budget is tight enough that Enterprise Edition is out of the question, it’d be terrible to pick something which wastes hardware or, even worse, wastes hardware while still forcing you to pay more for licensing.
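If you size a box this way, capping SQL Server itself is a single setting; here is a minimal T-SQL sketch, using the 131,072 MB figure from Glenn’s example:

    -- max server memory is an advanced option, so expose it first
    EXEC sys.sp_configure N'show advanced options', 1;
    RECONFIGURE;

    -- Cap the buffer pool at 128 GB, the Standard Edition license limit
    EXEC sys.sp_configure N'max server memory (MB)', 131072;
    RECONFIGURE;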


Displaying Slicer Selections

Rob Collie shows an easy method to concatenate selected values for display in Excel and Power BI:

Measure Step 2: Dealing with No (aka All) Selections on Slicer

The original measure above is really awkward when the user has made NO selection on a slicer – because it can then return a REALLY long list!

To deal with that case, we add an IF to the measure to detect precisely that case, and then return “All.”

Read on for the code, as well as some caveats and additional hints.
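Rob’s measure is written in DAX, but the logic ports anywhere; here is a rough T-SQL analogue of the same pattern, assuming hypothetical tables of selected and available years (STRING_AGG requires SQL Server 2017 or later):

    -- If every available year is selected, show 'All';
    -- otherwise concatenate the selected values
    SELECT CASE
             WHEN (SELECT COUNT(*) FROM #SelectedYears)
                = (SELECT COUNT(*) FROM dbo.AllYears)
               THEN N'All'
             ELSE (SELECT STRING_AGG(CAST([Year] AS nvarchar(4)), N', ')
                   FROM #SelectedYears)
           END AS SlicerCaption;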


It Depends: Key Lookup Edition

Raul Gonzalez shows a case in which having a key lookup beats having a covering index:

This is a very special case where the covering index has three keys and then a couple of included columns, one of which is an NVARCHAR(MAX) column, so the covering index is pretty big and we only seek on the leftmost column, which is also possible using another much smaller index on that single column.

In both cases, the operator is able to push the query predicate(s) to the seek and, thanks to that, the number of rows coming out of the operator is not that big. But the number of rows which match the leftmost key, and therefore have to be read, is quite big.

That’s why it’s so important to test queries rather than assuming you know how they will behave: sometimes the usual answer doesn’t hold.
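To make the shape of the trade-off concrete, here is a minimal sketch of the two competing index designs (schema and names are hypothetical, not taken from Raul’s post):

    -- The covering option: three keys plus included columns, one of them
    -- NVARCHAR(MAX), so the index pages are very wide
    CREATE INDEX IX_Orders_Covering
        ON dbo.Orders (CustomerID, OrderDate, StatusID)
        INCLUDE (Amount, Notes);  -- Notes is NVARCHAR(MAX)

    -- The narrow option: index only the leftmost, seekable column and
    -- pay for a key lookup on each qualifying row
    CREATE INDEX IX_Orders_Narrow
        ON dbo.Orders (CustomerID);

Even though the narrow plan has to do a key lookup for every row matching the leftmost key, those lookups against the clustered index can still cost less than reading the same number of rows from a covering index bloated by the NVARCHAR(MAX) column.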


Indexing Foreign Keys

Kim Tripp looks at the practice of adding non-clustered indexes on columns which make up foreign key constraints:

I’m going to tie today’s post to a question I received recently: if I have a table that has 6 foreign key columns/references, should I create one index with all 6 foreign key columns in it, or should I create 6 individual indexes – one for each foreign key reference?

This is an interesting question with a few tangents to cover for our SQL101 series. I consider indexing foreign keys as part of my “Phase 1 of index tuning.” But, let’s go back to some basics before we dive into the answer on this one.

Read on for Kim’s thoughts on the topic and some good practices.
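The core issue is that a single composite index supports seeks only on its leading column; here is a minimal sketch with three foreign keys standing in for the six in the question (all names hypothetical):

    -- Child table with three foreign key columns
    CREATE TABLE dbo.OrderLines (
        OrderLineID int IDENTITY(1,1) PRIMARY KEY,
        OrderID     int NOT NULL REFERENCES dbo.Orders (OrderID),
        ProductID   int NOT NULL REFERENCES dbo.Products (ProductID),
        WarehouseID int NOT NULL REFERENCES dbo.Warehouses (WarehouseID)
    );

    -- One wide index only helps seeks on OrderID; validating a delete
    -- against dbo.Products would still have to scan
    CREATE INDEX IX_OrderLines_AllFKs
        ON dbo.OrderLines (OrderID, ProductID, WarehouseID);

    -- One narrow index per foreign key lets each parent table's deletes
    -- and updates validate with a seek instead of a scan
    CREATE INDEX IX_OrderLines_OrderID     ON dbo.OrderLines (OrderID);
    CREATE INDEX IX_OrderLines_ProductID   ON dbo.OrderLines (ProductID);
    CREATE INDEX IX_OrderLines_WarehouseID ON dbo.OrderLines (WarehouseID);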


Introduction To Probability

Mala Mahadevan covers some basics of probability:

Probability is an important statistical and mathematical concept to understand. In simple terms, probability refers to the chance of a possible outcome of an event occurring within a domain of multiple outcomes. Probability is expressed as a number between 0 and 1, with 0 meaning that the outcome has no chance of occurring and 1 meaning that the outcome is certain to happen. Mathematically, it is represented as P(event) = (# of outcomes in event / total # of outcomes). In addition to this simple definition, we will also look at a basic example of conditional probability and independent events.

It’s a good intro to a critical topic in statistics.  If I were to add one thing to this, it would be to state that probability is always conditional on something.  It’s fair to write P(Event), understanding that it’s a shortcut, but in reality it’s always P(Event | Conditions), where Conditions is the set of assumptions we made in collecting the sample.
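To pin that down, here is the standard definition of the conditional form, plus a quick worked example of my own (a single fair die):

    P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \qquad
    P(\text{roll} = 4 \mid \text{roll} > 3) = \frac{1/6}{3/6} = \frac{1}{3}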


R On Athena

Gopal Wunnava shows how to run R scripts using Amazon Athena as a data source:

Next, you’ll practice interactively querying Athena from R for analytics and visualization. For this purpose, you’ll use GDELT, a publicly available dataset hosted on S3.

Create a table in Athena from R using the GDELT dataset. This step can also be performed from the AWS management console as illustrated in the blog post “Amazon Athena – Interactive SQL Queries for Data in Amazon S3.”

This is an interesting use case for Athena.
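Whichever client submits it, the table-creation step is ordinary Athena DDL over files in S3; here is a minimal sketch with a trimmed column list (the schema and S3 location are illustrative, not copied from the post):

    -- External table over tab-delimited GDELT event files in S3
    CREATE EXTERNAL TABLE IF NOT EXISTS gdelt.events (
        globaleventid INT,
        eventday      INT,
        actor1name    STRING,
        actor2name    STRING,
        eventcode     STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION 's3://gdelt-open-data/events/';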
