Category: Performance Tuning

More Fun With Row Goals

Joe Obbish has a detailed investigation into performance of a simple-enough query:

Something looks very wrong here. The loop join plan has a significantly lower cost than the hash join plan! In fact, the loop join plan has a total cost of 0.0167621 optimizer units. Why would disabling row goals for such a plan cause a decrease in total query cost?

I uploaded the estimated plans here for those who wish to examine them without going through the trouble of creating tables.

It’s a long but worthwhile read.
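
If you want to experiment along the same lines, here is a minimal sketch of comparing the same query with and without a row goal, using the documented DISABLE_OPTIMIZER_ROWGOAL hint; the table and column names are hypothetical, not Joe's demo schema.

```sql
-- Hypothetical tables, just to compare estimated plans side by side.
-- TOP introduces a row goal, which can steer the optimizer toward a
-- nested loops plan with a deceptively low estimated cost.
SELECT TOP (1) o.OrderID
FROM dbo.Orders AS o
INNER JOIN dbo.OrderLines AS ol
    ON ol.OrderID = o.OrderID
WHERE ol.Quantity > 100;

-- The same query with row goals disabled: costing reverts to
-- full-input estimates, which may favor a hash join instead.
SELECT TOP (1) o.OrderID
FROM dbo.Orders AS o
INNER JOIN dbo.OrderLines AS ol
    ON ol.OrderID = o.OrderID
WHERE ol.Quantity > 100
OPTION (USE HINT ('DISABLE_OPTIMIZER_ROWGOAL'));
```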

Function Calls Missing From dm_exec_query_stats

Kendra Little blogs about a data collection oddity with functions in SQL Server:

Some of my functions in the demo code were showing up just fine. I was really puzzled by that. I thought …

Maybe this is a bug with ‘CREATE OR ALTER’? A sign of some weird memory pressure? Something introduced in SQL Server 2017? A buggy side effect of implicit conversions in some of the functions? A problem with the queries I was using? A weird setting on the database? (Also: about 100 other things that didn’t turn out to be the case.)

I finally wrote up some simple demo code, tested it against a SQL Server 2008 R2 instance (omitting the Query Store components), compared it with SQL Server 2017, and found it to be consistent.

Click through to see which types of functions show up and which ones stay hidden.
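
If you want to poke at your own instance, here is a rough sketch of the kind of DMV query involved; it is not Kendra's demo code, and the point of her post is precisely that some scalar UDF executions never show up in these results.

```sql
-- Look for cached execution stats attributed to scalar UDFs.
-- Run in the database that owns the functions, since OBJECTPROPERTY
-- resolves object IDs in the current database only.
SELECT
    OBJECT_NAME(st.objectid, st.dbid) AS function_name,
    qs.execution_count,
    qs.total_worker_time,
    qs.total_elapsed_time
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
WHERE st.objectid IS NOT NULL
  AND OBJECTPROPERTY(st.objectid, 'IsScalarFunction') = 1;
```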

Speeding Up Power BI Aggregations With Primary Keys

Chris Webb has an interesting use case for adding primary keys on lookup tables:

As you can see, the Property Type column from the #"Price Paid" query contains single letter codes describing the type of property sold in each transaction; the Property Type column from #"Property Types" contains a distinct list of the same codes and acts as a dimension table. Again there’s nothing interesting going on in this query.

The problems start when you try to join data from these two queries using a Merge and then, for each row in #"Property Types", show the sum of the Price Paid column from #"Price Paid".

Although baseline performance is bad, Chris shows a way of improving that performance significantly.

All Execution Plans Are Estimates

Grant Fritchey drops a bomb on us:

All these resources, yet, for any given query, all the plans will be identical (assuming no recompile at work). Why? Because they’re all the same plan. Each and every one of them is an estimated plan. Only an estimated plan. This is why the estimated costs stay the same between an estimated and actual plan, this despite any disparity between estimated and actual row counts.

I’ve blogged about this before, but it’s worth mentioning again. There are only a few minor differences between an estimated plan and an actual plan. It’s all about the data set. What’s going on is that an actual plan can capture query metrics, which are then appended to the estimated plan. At no point is any different plan generated during this process. It’s just a plan, an estimated plan, or a plan plus query metrics.

Read the whole thing.
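
As a quick refresher, the two kinds of plan come from two different capture mechanisms, but the plan itself is the same; here is a minimal sketch against a hypothetical table.

```sql
-- Estimated plan: the query is compiled but never executed.
SET SHOWPLAN_XML ON;
GO
SELECT COUNT(*) FROM dbo.SomeTable;   -- hypothetical table
GO
SET SHOWPLAN_XML OFF;
GO

-- Actual plan: the same compiled plan, with runtime metrics
-- (actual row counts and execution statistics) appended to it.
SET STATISTICS XML ON;
GO
SELECT COUNT(*) FROM dbo.SomeTable;
GO
SET STATISTICS XML OFF;
GO
```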

The Optimal Kafka Message Size

Guy Shilo wants to figure out the right chunk size for a Kafka message:

I wrote a python program that runs a producer and a consumer for 30 minutes with different message sizes and measures how many messages per second it can deliver, or the Kafka cluster throughput.

I did not care about the message content, so the consumer only reads the messages from the topic and then discards them. I used a three-partition topic. I guess that on larger clusters with more partitions the performance will be better, but the message size – throughput ratio will remain roughly the same.

So I wrote a small python program that generates a dummy message of the desired size, then spawns two threads: one is a producer and the other is a consumer. The producer sends the same message over and over, and the consumer reads the messages from the topic and counts how many messages it has read. The main program stops after 30 minutes, but before it stops it prints how many messages were consumed and how many messages were consumed per second.

Read on for the results.  More importantly, test in your own environment with your own equipment, as that value’s likely to differ a bit.

Checking Plan Compilation Time

Eric Blinn looks at plan compilation time in SQL Server:

The query returns 4 rows. By including STATISTICS TIME we get extra information on the Messages output tab.  We can see from the execution on my laptop that the optimizer took 6ms to compile a query plan and the actual query executed in only 1ms.

Run the query batch a few more times and notice that the parse and compile time drops to zero.  This is because SQL Server keeps a list of compiled plans and tries to reuse them without having to recompile.  In this case the optimizer has recognized that this query is exactly identical to one it has previously executed and it reuses the previously compiled plan.  That list of plans is called the Plan Cache and will be covered in much more detail in a subsequent post.

This cost is something we tend to forget about, but it can make a big difference for a user’s experience.
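
A minimal sketch of the kind of test Eric describes, against a hypothetical table; the timing numbers in the comment mirror the ones he reports and will of course vary on your hardware.

```sql
SET STATISTICS TIME ON;

SELECT TOP (4) *
FROM dbo.SomeTable;   -- hypothetical table

SET STATISTICS TIME OFF;

/* The Messages tab shows output along these lines (numbers will vary):

   SQL Server parse and compile time:
      CPU time = 6 ms, elapsed time = 6 ms.
   SQL Server Execution Times:
      CPU time = 0 ms, elapsed time = 1 ms.

   Run the batch again and the parse and compile time drops to zero,
   because the compiled plan is reused from the plan cache. */
```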

Adaptive Join Behavior

Grant Fritchey explains adaptive joins in SQL Server 2017:

Currently the adaptive join only works with columnstore indexes, but according to Microsoft, at some point, it will also work with rowstore. The concept is simple. For larger data sets, frequently (but not always, let’s not try to cover every possible caveat, it depends, right), a hash join is much faster than a loops join. For smaller data sets, frequently, a loops join is faster. Wouldn’t it be nice if we could change the join type, on the fly, so that the most effective join was used depending on the data in the query? Ta-da, enter the adaptive join.

First, the statistics are used at compile time for the tables we’re joining. Based on those statistics, a row target is set. Below that threshold, a loops join will be used. Above that threshold, a hash join. The way the row count is determined is that the operator will always build the hash table. With the hash table built and loaded, it will know how many rows it has. If it’s going to do a loops join, the hash table is tossed and a loops join commences. If the threshold has been passed on the row counts and it’s going to do a hash join, it already has the hash table built and proceeds to do a hash join. It’s easy to understand. Let’s see it in action.

Click through to see it in action.  It’s not a world-changer yet, but as it becomes available to rowstore queries (without the filtered, empty columnstore index trick), I think people will come to appreciate it.
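
For reference, the "filtered, empty columnstore index trick" mentioned above looks roughly like this; table and column names are hypothetical and this is a sketch rather than a tuned example.

```sql
-- The trick: a nonclustered columnstore index with a contradictory
-- filter stays empty, but its presence lets the optimizer consider
-- batch mode and adaptive joins for queries on this rowstore table.
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_Dummy
ON dbo.SomeFactTable (SomeKeyColumn)
WHERE SomeKeyColumn = -1 AND SomeKeyColumn = -2;
GO

-- A join that may now get an Adaptive Join operator in the plan;
-- which branch actually runs depends on the row count at execution time.
SELECT f.SomeKeyColumn, d.SomeDescription
FROM dbo.SomeFactTable AS f
INNER JOIN dbo.SomeDimTable AS d
    ON d.SomeKeyColumn = f.SomeKeyColumn
WHERE f.SomeDate >= '20170101';
```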

Row Goals And Semi Joins

Paul White continues his row goals series:

The remaining physical join type is nested loops, which comes in two flavours: regular (uncorrelated) nested loops and apply nested loops (sometimes also referred to as a correlated or lateral join).

Regular nested loops join is similar to hash and merge join in that the join predicate is evaluated at the join. As before, this means there is no value in setting a row goal on either input. The left (upper) input will always be fully consumed eventually, and the inner input has no way to determine which row(s) should be prioritized, since we cannot know if a row will join or not until the predicate is tested at the join.

By contrast, an apply nested loops join has one or more outer references (correlated parameters) at the join, with the join predicate pushed down the inner (lower) side of the join. This creates an opportunity for the useful application of a row goal. Recall that a semi join only requires us to check for the existence of a row on join input B that matches the current row on join input A (thinking just about nested loops join strategies now).

In other words, on each iteration of an apply, we can stop looking at input B as soon as the first match is found, using the pushed-down join predicate. This is exactly the sort of thing a row goal is good for: generating part of a plan optimized to return the first n matching rows quickly (where n = 1 here).

This has the depth and quality that you naturally expect from Paul, making it an immediate read.
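
A minimal sketch of the kind of semi join Paul is describing, written as an EXISTS against hypothetical tables; with an apply nested loops plan, the pushed-down predicate lets the inner side stop at the first match.

```sql
-- Semi join via EXISTS: we only need to know whether at least one
-- matching OrderLines row exists for each order, not how many.
-- With an apply nested loops plan, the inner side can stop at the
-- first match, which is exactly the row goal (n = 1) at work.
SELECT o.OrderID, o.OrderDate
FROM dbo.Orders AS o
WHERE EXISTS
(
    SELECT 1
    FROM dbo.OrderLines AS ol
    WHERE ol.OrderID = o.OrderID
      AND ol.Quantity > 100
);
```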

See The Pernicious Effects Of Your UDFs

Pedro Lopes announces an improvement to SQL Server execution plan results in 2017 CU3:

As I mentioned in yesterday’s post, with the recent release of SQL Server 2017 CU3, we released yet more showplan enhancements: you can see other posts related to showplan enhancements here.

In this article I’ll talk about the second showplan improvement we worked on, to assist in the discoverability of UDF usage impact on query execution.

The scenario is that if a query uses one or more user-defined scalar functions (such as T-SQL functions or CLR functions), a significant portion of query execution time may be spent inside those functions, depending on the function definition. And this may not be immediately obvious by looking at the query execution plan.

Recently, we added information on overall query CPU and elapsed time tracking for statistics showplan xml (both in ms), found in the root node of an actual plan (which I blogged about here). We have now added two new attributes: UdfCpuTime and UdfElapsedTime. These provide the total CPU and elapsed time (again, both in ms) that is spent inside all scalar user-defined functions, during the execution of a query.

I love it.  UDFs have historically been silent query killers, as the execution plan would gleefully think that the function call is practically free because it’d only show a single iteration.
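
A hedged sketch of the sort of test that shows this off, using a deliberately slow, hypothetical scalar UDF; on SQL Server 2017 CU3 or later, the root node of the actual plan XML carries the new UdfCpuTime and UdfElapsedTime attributes.

```sql
-- A deliberately slow scalar UDF, purely for demonstration.
CREATE OR ALTER FUNCTION dbo.fn_ExpensiveLookup (@Value int)
RETURNS int
AS
BEGIN
    DECLARE @Result int = 0, @i int = 0;
    WHILE @i < 1000
    BEGIN
        SET @Result = @Result + @Value % 7;
        SET @i += 1;
    END;
    RETURN @Result;
END;
GO

-- Capture an actual execution plan for this query, then inspect the
-- root node of the plan XML for UdfCpuTime and UdfElapsedTime to see
-- how much of the query's time went into the function.
SELECT TOP (1000) dbo.fn_ExpensiveLookup(o.OrderID) AS Result
FROM dbo.Orders AS o;   -- hypothetical table
GO
```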

The Downside Of Nested Views

Randolph West doesn’t mince words:

Nested views are bad. Let’s get that out of the way.

What is a nested view anyway? Imagine that you have a SELECT statement you tend to use all over the place (a very common practice when checking user permissions). There are five base tables in the join, but it’s fast enough.

Instead of copying and pasting the code wherever you need it, or using a stored procedure (because for some reason you’re allergic to stored procedures), you decide to simplify your code and re-use that SELECT in the form of a database view. Now whenever you need to run that complicated query, you can instead query the view directly.

What harm could this do?

Spoilers:  a lot.
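
A minimal sketch of the pattern Randolph is warning about, with hypothetical permission tables: a view over five base tables, and then a second view nested on top of it.

```sql
-- A view over five base tables (all names hypothetical).
CREATE VIEW dbo.vUserPermissions
AS
SELECT u.UserID, u.UserName, r.RoleName, p.PermissionName
FROM dbo.Users AS u
JOIN dbo.UserRoles AS ur ON ur.UserID = u.UserID
JOIN dbo.Roles AS r ON r.RoleID = ur.RoleID
JOIN dbo.RolePermissions AS rp ON rp.RoleID = r.RoleID
JOIN dbo.Permissions AS p ON p.PermissionID = rp.PermissionID;
GO

-- A second view nested on top of the first. Every query against it
-- drags the whole five-table join along, whether it is needed or not,
-- and the optimizer has to unpick the nesting every single time.
CREATE VIEW dbo.vActiveUserPermissions
AS
SELECT up.UserID, up.UserName, up.PermissionName
FROM dbo.vUserPermissions AS up
JOIN dbo.Users AS u ON u.UserID = up.UserID
WHERE u.IsActive = 1;
GO
```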
