Press "Enter" to skip to content

Author: Kevin Feasel

Finding And Fixing The N+1 Problem With ORMs

Richie Rump explains the N+1 problem with object-relational mappers and shows you how to avoid it with Entity Framework:

The problem is that in our original query we’re not getting data from the LinkedPosts entity, just data from Posts and PostTags. Entity Framework knows that it doesn’t have the data for the LinkPosts entity, so it very kindly gets the data from the database for each row in the query results.

Whoops!

Obviously, making multiple calls to the database instead of one call for the same data is slower. This is a perfect example of RBAR (row by agonizing row) processing.

Read the comments for more answers on top of Richie’s.  My answer (only 70% tongue in cheek)?  Functional programming languages don’t require ORMs.

Comments closed

Watching Power BI Grow Up

Paul Turley argues that Power BI is getting to be a mature product:

In the opening keynote and again in his sessions, Christian demonstrated Power BI reports on the taxi driver activity database with over a trillion rows of raw data.  The larger dataset was in a Spark cluster, accessed using DirectQuery.  Aggregated tables were stored in the in-memory model using the new composite model feature.  As the data was explored in report visuals, the Power BI engine would seamlessly switch from tabular in-memory aggregate tables to DirectQuery source data in order to return low-level details.  Composite models will allow mashing-up imported database and file-based data with an DirectQuery.

There are limits and complexities with these new features.  You cannot mashup imported tables in a Power BI model based in a direct connection to SSAS, but enterprise-scale features in Power BI arguably may not steer a solution architect to select SSAS over Power BI for serious data modeling.  With incremental data refresh, large model support, row-level security and many other “big kid” features, Power BI might be a preferable choice.  I’m not ready to rule-out Analysis Services as the better option for most enterprise solutions – at least not in the near future, but Power BI is definitely heading in that direction.

Click through for several other features which help convince Paul that Power BI is threatening Analysis Services for enterprise data analysis solutions.

Comments closed

Changing Highlight Color In SQL Operations Studio

Kendra Little shows how you can change the code highlight color in SQL Operations Studio:

One of my major uses of SQL Ops studio will be demonstrating code in webcasts and videos, so it’s important to me to be able to set a high-contrast highlight for lines of code.

SQL Ops Studio is based on VSCode, which is very flexible, so I suspected there was a way to do this already. Also, there is a lot of documentation out there on VSCode already, so I searched on “VSCode change highlight color” to help find my way to the solution.

But I figured that lots of folks starting out with SQL Ops Studio may not know that, and that this post might be a good introduction to how to change things like this – as well as how to find things by searching for “vscode”!

Good info.

Comments closed

Auditing xp_cmdshell Usage

Patrick Keisler shows how to create a SQL Server audit file for xp_cmdshell usage:

This article assumes you already have a basic understanding of SQL Server Audit, but if not, use this link to catch up on all the details.

Are you required to have xp_cmdshell enabled on one of your servers? If so, then setup a SQL Audit now to track its use. Never mind the implications of enabling xp_cmdshell, as a DBA you are responsible for what happens on your servers and tracking the use of xp_cmdshell should be a priority.

Some smart people will tell you to disable xp_cmdshell altogether, but I don’t like that advice at all.  Auditing usage can give you more peace of mind while not limiting your ability to use a valuable tool.

Comments closed

Implementing K Nearest Neighbors In Python

Atul Harsha gives us a demo on k nearest neighbors in Python:

In order to make any predictions, you have to calculate the distance between the new point and the existing points, as you will be needing k closest points.

In this case for calculating the distance, we will use the Euclidean distance. This is defined as the square root of the sum of the squared differences between the two arrays of numbers

Specifically, we need only first 4 attributes(features) for distance calculation as the last attribute is a class label. So for one of the approach is to limit the Euclidean distance to a fixed length, thereby ignoring the final dimension.

Check it out.

Comments closed

The Basics Of RDDs In Apache Spark

Anmol Sarna walks us through some of the basics of Resilient Distributed Datasets in Apache Spark:

  • Resilient, i.e. fault-tolerant with the help of RDD lineage graph and so able to recompute missing or damaged partitions due to node failures.

  • Distributed with data residing on multiple nodes in a cluster.

  • Dataset is a collection of partitioned data.

Now we know what RDD stands for. Now let’s try to understand it.

It’s a nice intro to the topic.  And even though there are other data models which sit on top of RDDs to make life easier for developers, it’s still important to understand the core model in Spark.

Comments closed

What Those Power BI Processes In Task Manager Mean

Kellyn Pot’vin-Gorman explains what you see in the Task Manager when loooking at an instance of Power BI Desktop:

We can see that there are numerous threads, with a few taking considerable memory over others-  The CefSharp.BrowserSubprocess can be a bit misleading-  It’s Power BI using Chromium to render the visuals that are part of the Power BI Desktop that’s part of the current run.  Chromium (CefSharp.BrowserSubprocess) subprocesses will always come in pairs, one for rendering and one for messaging.

In the Task Manager Details, we can see each of the PIDs that correspond with the processes IDs listed in the logs.  By updating our viewable columns, (right click, choose “threads” and click OK) you can now view how many threads are associated with a given PID.

Read on for more.

Comments closed

Breaking A Database File Into Multiple Files

Lori Brown shows us how to take a database with one database file and add new database files to it:

I occasionally come across some pretty good sized databases that are set up with a single data file.  We recently have been working with a client to break up their single data file into multiple data files so that we can spread them over several different LUNs and so that they can take advantage of the improved performance of using the files in parallel.  The concept is much like setting up tempdb with 1 file (up to 8) per core.

Since most people don’t think about using multiple files for databases until they have grown large enough to be a problem, I think that most don’t realize that breaking up a database can be done at any time, you just need to have enough space for new files.  Here is a bit of a demo on how to do this.

Do read Lori’s warning at the end, however, should you decide to do this in production.

Comments closed

Understanding sp_reset_connection

Greg Low explains what sp_reset_connection does and why it’s often a good thing:

Anyone who’s ever traced activity against a SQL Server will have no doubt seen a large number of commands where the procedure sp_reset_connection has been executed. Yet, this command won’t appear anywhere in the source code of the applications that are running.

As an example of why this occurs, one of the most common data access technologies that is used to connect applications to SQL Server is ADO.NET. It has a SqlConnection object that represents a connection that can be opened to a SQL Server instance. In the design of the SqlConnection class, the architects of it were grappling with two big issues:

  • They knew that opening and closing connections to SQL Server was a relatively expensive process.
  • They also knew that on a busy website, they didn’t want to use enough resources (or might not even have had them), to open up a connection for each concurrent session on the website.

So they decided to make the connections to SQL Server able to be shared.

Read the whole thing.

Comments closed

When Using DBCC DROPCLEANBUFFERS

Dan Guzman shares words of wisdom with using DBCC DROPCLEANBUFFERS for testing query performance in SQL Server:

One can make the argument that DBCC DROPCLEANBUFFERS might not be particularly valuable for testing. First, the storage engine in SQL Server Enterprise Edition (or Developer Edition, which is often used when testing) behaves differently with a cold cache versus a warm one. With a warm cache, a page not already in cache (e.g. index seek by primary key) will be fetched from disk using a single 8K page IO request as one expects. However, when the cache isn’t fully warmed up (Buffer Manager’s Target Pages not yet met), the entire 64K extent (8 contiguous 8K pages) is read for the single page request regardless of whether the adjacent pages are actually needed by the query. This has the benefit of warming the cache much more quickly than would otherwise occur, but given that the normal steady state of a production SQL Server is a warm cache, testing with a cold cache isn’t a fair comparison of different plans. More data than normal will be transferred from storage so timings may not be indicative of actual performance.

I don’t think I agree 100% with that argument, but I am sympathetic to it.  Still, Dan has great advice in this post.

Comments closed