The summary top row suggests that all inserts to an empty clustered index will be minimally logged as long as
ORDERhints are specified. The
TABLOCKhint is required to enable the
RowSetBulkfacility as used for heap table bulk loads. An
ORDERhint is required to ensure rows arrive at the Clustered Index Insert plan operator in target index key order. Without this guarantee, SQL Server might add index rows that are not sorted correctly, which would not be good.
Unlike other bulk loading methods, it is not possible to specify the required
ORDERhint on an
INSERT...SELECTstatement. This hint is not the same as using an
ORDER BYclause on the
ORDER BYclause on an
INSERTonly guarantees the way any identity values are assigned, not row insert order.
Read on to see what you can do.
Most scripts (even ones I’ve worked on!) that look at the plan cache, have had the ability to sort it by different metrics: CPU, reads, writes, duration, etc.
A lot of people are very interested in long running queries — and I am too!
Heck, they’re how I make money. Blogging pays like crap.
But there’s a slight problem with only looking at query duration.
Read on for an example. Erik loses money on every blog post but he makes up for it in volume.
SQL Server 2016 introduced serial batch mode processing and aggregate pushdown. When pushdown is successful, aggregation is performed within the Columnstore Scan operator itself, possibly operating directly on compressed data, and taking advantage of SIMD CPU instructions.
The performance improvements possible with aggregate pushdown can be very substantial. The documentation lists some of the conditions required to achieve pushdown, but there are cases where the lack of ‘locally aggregated rows’ cannot be fully explained from those details alone.
This article covers additional factors that affect aggregate pushdown for
GROUP BYqueries only. Scalar aggregate pushdown (aggregation without a
GROUP BYclause), filter pushdown, and expression pushdown may be covered in a future post.
Read the whole thing.
Currently there is no way to change this 256MB in Power BI Desktop or Excel although someone has already posted a suggestion on the Ideas site to allow us to change it. How much of an impact does this actually have on refresh performance though? Without the ability to change this setting it’s hard to say, but I suspect it could be significant and that a lot of Power Query performance problems could be explained by this behaviour.
The situation is different in the Power BI service, where I understand there is a limit on the overall amount of memory that a single Power Query query evaluation can use.
Read on to understand the differences here between running on Power BI Desktop and running in the Power BI service, as well as a bit of testing on Chris’s part.
One thing I need you to understand first: you have to provide the database and the queries. Almost all of the tools in this post, except the last one, are designed to help you run queries, but they don’t include the queries. The whole idea with load testing is that you’re trying to mimic your own workloads. If you’re just trying to test a server with generic workloads, start with my post, “How to Check Performance on a New SQL Server.”
Click through for a list of tools. I’d also throw in Pigdog from Mark Wilkinson (one of my co-workers). This helped replicate a few issues in SQL Server 2017 around tempdb performance.
Our objective was to build a system that would provide an intuitive insight into Spark jobs that not just provides visibility but also codifies the best practices and deep experience we have gained after years of debugging and optimizing Spark jobs. The main design objectives were to be
– Intuitive and easy – Big data practitioners should be able to navigate and ramp quickly
– Concise and focused – Hide the complexity and scale but present all necessary information in a way that does not overwhelm the end user
– Batteries included – Provide actionable recommendations for a self service experience, especially for users who are less familiar with Spark
– Extensible – To enable additions of deep dives for the most common and difficult scenarios as we come across them
The tool looks pretty interesting and I’m hoping it will be part of the open source suite at Cloudera.
There’s a lot of excitement (alright, maybe I’m sort of in a bubble with these things) about SQL Server 2019 being able to inline most scalar UDFs.
But there’s a sort of weird catch with them. It’s documented, but still.
If you use GETDATE in the function, it can’t be inlined.
GETDATE() and its bretheren are non-deterministic so I figured that would be an issue. Check out the documentation for the other limitations.
One of the easiest things to fix when performance tuning queries are Key Lookups or RID Lookups. The key lookup operator occurs when the query optimizer performs an index seek against a specific table and that index does not have all of the columns needed to fulfill the result set. SQL Server is forced to go back to the clustered index using the Primary Key and retrieve the remaining columns it needs to satisfy the request. A RID lookup is the same operation but is performed on a table with no clustered index, otherwise known as a heap. It uses a row id instead of a primary key to do the lookup.
As you can see these can very expensive and can result in substantial performance hits in both I/O and CPU. Imagine a query that runs thousands of times per minute that includes one or more key lookups. This can result in tremendous overhead which is generated by these extra reads it effects the overall engine performance.
Monica’s absolutely right: key lookups can take a decent query and make it into a performance hog.
In my last post I mentioned the Power Query engine’s persistent cache, which in some scenarios caches the data read from a data source when a query is refreshed. Another important nugget of information that Ehren von Lehe of the Power Query dev team mentioned in a post on the Power Query MSDN forum recently is the fact that if you use File.Contents to get data from a file then the persistent cache is not used, but if you use Web.Contents to get data from the same file then the persistent cache is used. I guess the thinking here is that there is no point creating an on-disk cache containing the contents of a file that is already on disk.
Chris takes us through a couple of unexpected twists, so check it out.
In CTP 2.4 not all actual execution plans will be available, you can see more details on that here.
For an upcoming CTP version, all queries will be available with the equivalent of the actual execution plan. At least those where the plan was cached in the first place, or those where the plan has not been evicted from cache.
That caveat aside, I’m happy with this.