Press "Enter" to skip to content

Category: Performance Tuning

Checking SSIS Package Performance

Andy Brownsword digs into SSISDB:

I’ve recently been reviewing SSIS packages to make some performance fixes and needed a way to validate the results of those changes. I thought I’d share the scripts as they may be useful for others.

Rather than relying on run times from the SQL Agent running the packages I wanted to dive deeper into the packages so took another approach.

Read on for a query that gets the information. You can also eyeball it in the SSIS reports if you have a small or consistent set of packages to run.

Leave a Comment

Query Store Size-Based Cleanup Performance Issues

Kendra Little has a public service announcement:

I’m a huge fan of SQL Server’s Query Store feature. Query Store collects query execution plans and aggregate query performance metrics, including wait stats. Having Query Store enabled makes troubleshooting performance issues such as bad parameter sniffing, much, much easier. Because Query Store is integrated into SQL Server itself, it also can catch query plans in a lightweight way that an external monitoring system will often miss.

When performance matters, it’s important to ensure that you’re managing Query Store so that Query Store cleanup does not run during high volume times. Query Store cleanup could slow your workload down significantly.

Read on for more information. I’d also like to plug qdstoolbox, an open-source solution some of my former colleagues worked on. This includes QDSCacheCleanup, which works considerably better than the built-in cleanup process.

Leave a Comment

Speeding up Databricks Lakehouse Queries with Redis

Drew Furgiuele has the need for speed:

Since compute and storage are now separated, this means that any time you want to work with your data, you need some form of compute engine that is capable of connecting to and reading your data from your storage locations. Compute engines vary, but one of the best is Apache Spark, which gives you a great distributed compute layer suitable for all sorts of workloads, whether they be analytical and ad-hoc queries, dashboard or BI workloads, data engineering related, or even data science or AI/ML use cases. It really can do it all, and it does it very well.

But what about use operational use cases? For instance: let’s say your Lakehouse is hosting some data that is critical to customer-facing systems that demand low-latency response times, such as real-time users lookups, API interfaces, or event-driven systems, sometimes the overhead required to take a query, schedule it, and run it can be in the hundreds of milliseconds. For some workloads, that’s a lifetime.

Read on to see how you can build a caching layer on top of certain lakehouse operations when some operation needs to be as fast as possible.

Leave a Comment

SQL Server Index Included Columns and Log Bloat

Forrest McDaniel explains that TANSTAAFL:

Let me share the conclusion now: included columns in an index always get included in the log record, even if they weren’t modified. The same is not true for the clustered index. This can really matter if log generation is important to your environment, e.g. using an AG.

Click through for the proof. We often think of included columns on indexes as being nearly free performance gains, but “nearly” is carrying a bit of water here. This shouldn’t dissuade you from using included columns on indexes, but is a valid reason not to include all 50 non-index columns in a table “just in case.”

Leave a Comment

Comparing pgvector and Postgres ARRAY

Ernst-Georg Schmid makes a comp based on a mass spectrometry database:

As said in the introduction, mass spectrometry is one, if not the tool to identify unknown compounds, to quantify known compounds, and to determine the structure of molecules. But it is a lot of work, and you need reference spectra to compare against.

So, there are curated databases of validated spectra available, like MassBank JapanMassBank Europe and the NIST mass spectral libraries. Laboratories might also want to store their own libraries for future use.

However, such databases often come in their own formats and with their own retrieval software. If you need to efficiently connect spectra to other data, e.g. chemical structures or genomic data, this calls for central management and a common API.

Read on to see the comparison of the pgvector extension versus built-in functionality with ARRAY.

Comments closed

Incremental Refresh on Large Power BI Semantic Models

Soheil Bakhshi needs to refresh a lot of data:

Implementing incremental refresh on Power BI is usually straightforward if we carefully follow the implementation steps. However in some real-world scenarios, following the implementation steps is not enough. In different parts of my latest book, Expert Data Modeling with Power BI, 2’nd Edition, I emphasis the fact that understanding business requirements is the key to every single development project and data modelling is no different. Let me explain it more in the context of incremental data refresh implementation.

Read on for that explanation, as well as a few tips to make things work a bit more smoothly.

Comments closed

Relationship Columns and Power BI DirectQuery Mode

Chris Webb builds a relationship:

Many Power BI connectors for relational databases, such as the SQL Server connector, have an advanced option to control whether relationship columns are returned or not. By default this option is on. Returning these relationship columns adds a small overhead to the time taken to open a connection to a data source and so, for Power BI DirectQuery semantic models, turning this option off can improve report performance slightly.

Read on to learn what these relationship columns are and how you can remove them. Chris also provides a first-order approach to how you can estimate the performance pain involved with including these.

Comments closed

An Overview of Data Partitioning Strategies

thanhdoancong (there are spaces in there somewhere but I’d probably guess wrong) talks partitions:

Data partitioning is the magic wand that divides your massive dataset into smaller, organized subsets called partitions. These partitions are based on specific criteria, like date ranges, customer segments, or product categories.

It’s like organizing your overflowing closet by color, season, or type of clothing. Each section becomes easier to browse and manage, making life (and data analysis) much easier.

Read on for a few varieties of partitioning and how they could improve your data estate. There’s no guarantee that partitioning will definitely improve performance—and in SQL Server’s case, the partitioning feature often does not improve performance at all because that isn’t its intent—but this is a good read to get an idea of what strategies are available.

Comments closed

Measuring Query Times in Power BI DirectQuery Mode

Chris Webb breaks out the stopwatch:

If you’re tuning a DirectQuery semantic model in Power BI one of the most important things you need to measure is the total amount of time spent querying your data source(s). Now that the queries Power BI generates to get data from your source can be run in parallel it means you can’t just sum up the durations of the individual queries sent to get the end-to-end duration. The good news is that there are new traces event available in Log Analytics (though not in Profiler at the time of writing) which solves this problem.

Read on to learn more about this event.

Comments closed