Press "Enter" to skip to content

Category: Performance Tuning

R: Avoid apply() with Large Datasets

George Pipis walks us through the performance cost of using the apply() function against a large dataset in R:

When we are dealing with large datasets and there is a need to calculate some values like the row/column min/max/rank/mean etc we should avoid the apply function because it takes a lot of time. Instead, we can use the matrixStats package and its corresponding functions. Let’s provide some comparisons.

Click through for a demonstration of how much slower it can be in certain cases. H/T R-Bloggers.

Comments closed

Optimizing Cross Join Performance in Power BI

Chris Webb tries an experiment:

In a late addition to the series of posts that I started here a few months ago, I thought it would be useful to look at how to optimise the performance of a rare (but nonetheless useful) type of merge: the cross join. Cross joins are not available as a join kind in the Table.Join or Table.NestedJoin M functions but there are two ways of creating them in Power Query:

1. Creating a custom column in your query that contains the query that you want to cross join with and then expanding, as Matt Allington describes here
2. Creating a custom column that contains a single value, such as the number 1, on both your queries and then using that column to merge the two queries on (something I mentioned a long time ago here)

Which one gives you the best performance? 

Chris has the answer to that question and the tests to prove it.

Comments closed

Troubleshooting SQL Server with PSSDIAG and SQL Nexus

Mat Hayward-Hill shows how to use PSSDIAG and SQL Nexus. First, PSSDIAG:

Once you have worked on enough SQL Server performance issues, you start to develop an approach and lean on the same tools, to collect and present your data. I will walk you through how I troubleshoot a SQL Server performance issue, having no prior knowledge of the system. Typically, the description of the problem goes along the lines of “it was fine last week, but today queries are taking forever or timing out.”

If the issue is with one query, I will start with the Query Plan. Otherwise, when the problem is system-wide, and the SQL instance sits on top of a Windows on-prem or IaaS server, I use PSSDIAG to gather the data.

Then, SQL Nexus:

This post is the second part of how I troubleshoot a SQL Server performance issue like a pro. In Part 1, we used PSSDIAG to collect performance data. Now, we will look at how to load and analyse that data using SQL Nexus.

Before we dive straight into SQL Nexus, there are a few prerequisites that you need to install. I will confess I didn’t get this right on my first attempt so to make it easier for you, I have included the requirement along with the link of where to find it.

Check out both posts.

Comments closed

Enabling SQL Server Optimizer Hotfixes

John Morehouse takes us through the step-by-step for enabling optimizer hotfixes in SQL Server:

There are a number of knobs and switches that are available to database administrators that can be used to enable better performance.  There are three options in particular that this blog will be discussing, trace flag 4199, the database scoped configuration QUERY_OPIMIZER_HOTFIXES and the qeury hint ENABLE_QUERY_OPTIMIZER_HOTFIXES. Understanding how these options function will give you a hand up on ensuring the query optimizer is running as optimally as possible.

Let’s take a look at the three options.

Read on to learn more. There is some potential risk of regression with new optimizer updates, so standard rules around testing apply.

Comments closed

All About The Compute Scalar Operator

Hugo Kornelis continues a quest to document query plan operators:

The Compute Scalar operator is used to compute new values based on other columns in the same row. These new values are then added as extra column(s) in the output rows.

The expressions used to compute the new values can only refer to constant values and to columns in the input rows of the Compute Scalar operator. Other than that, there are, to my knowledge, no restrictions. The expressions can vary from very simple to extremely complex. The expressions can even include references to scalar user-defined functions, to CLR user-defined functions, and to built-in CLR functions.

Read on for a good deal of information about the operator.

Comments closed

Performance Tuning for Cloudera’s Operational Database

Liliana Kadar, et al, show us the tools we can use to tune Cloudera’s Operatioanl Database:

A query optimizer determined the most efficient way to run a query. Query optimization helps you to reduce the hardware resources required to run a query and also speeds up your query-response time. Cloudera’s Operational Database provides you with various tools such as plan analyzers to make optimal use of your computing resources. 

Cloudera’s OpDB provides various cost-based and rules-based optimizers. You can use different optimizers based on your use cases. OpDB is primarily used for Online Transactional Processing (OLTP) use cases with Apache Phoenix in the OpDB used as a SQL engine. But you can also use Hive and Impala for Online Analytical Processing (OLAP) use cases. 

Read on for recommendations on platform choice as well as indexing and tuning options.

Comments closed

Parallel Processing in Analysis Services

Kasper de Jonge takes us through parallel processing of partitions and tables in Analysis Services:

One thing that has come up several times in the last few weeks is a misconception that you cannot process multiple partitions on the same table or tables in parallel as it would cause a lock. This could be true if you try to do the parallelism yourself. Like in SQL Server you need to think about transactions, the AS engine is a transactional system too.

So, the AS engine is definitely capable of loading data in parallel but only if you let him do the puzzling on concurrency. This means you must send processing commands to the AS engine in one transaction so the AS engine can manage the locks itself. There are other benefits of letting AS doing the work like recalculating the calculated items (tables, columns etc) once instead of multiple times which improves processing performance.

Read on for an example.

Comments closed

Keep Parameter Sniffing On

Brent Ozar explains why you should keep parameter sniffing on:

What they THINK is going to happen is that SQL Server will do an OPTION(RECOMPILE) on every incoming query, building fresh plans each time. That ain’t how this works at all, and instead, I wish this “feature”‘s name was “Parameter Blindfolding.” Here’s what it really does.

Read on for the explanation. In reality, parameter sniffing is almost always a good thing. It’s when you have major skews in data that you even have to think about parameter sniffing being a problem.

Comments closed

Tips for Optimizing Columnstore Indexes

Ed Pollack continues a series on columnstore indexes:

This is worth a second mention: Avoid updates at all costs! Columnstore indexes do not treat updates efficiently. Sometimes they will perform well, especially against smaller tables, but against a large columnstore index, updates can be extremely expensive.

If data must be updated, structure it as a single delete operation followed by a single insert operation. This will take far less time to execute, cause less contention, and consume far fewer system resources.

Read on for several more tips along these lines.

Comments closed