I’ve highlighted the interesting bit. “Actual Number of Locally Aggregated Rows” is part of aggregation push down, explained by the amazing Niko Negebauer here and here. Basically, the aggregation is occurring with the data access. So while there is a Hash Match operator for the aggregation, actually, the active part of the aggregation was performed within the columnstore. That’s why the Actual Number of Rows coming out of the columnstore index itself is 0, but the number of rows coming out of the Hash Match Aggregate is 441.
So… why not another aggregate push down when we used the hint? Because the hint says, we MUST use a hash join. At that point the optimizer has no choices on where, when, how it does data processing. It must, first, ensure that a hash join is used, so it does. First thing out of the gate, hash join. Then a hash aggregate. This difference in behavior results in a 24% decrease in performance. The only interesting thing is that the reads remained consistent. This means that it was just the processing of the join that added overhead.
Read the whole thing.
The scripts I ran to edit the rest of the databases looked similar to the below:
12 ALTER DATABASE msdb MODIFY FILE ( NAME = ‘MSDBDat’ , FILENAME = ‘M:\MSSQL\Data\MSDBDat.mdf’ );ALTER DATABASE msdb MODIFY FILE ( NAME = ‘MSDBLog’ , FILENAME = ‘L:\MSSQL\Logs\MSDBLog.ldf’ );
Once I finished altering all of my database files to their new locations, I stopped the SQL Server Service in Services. I copied and pasted all MDF and LDF files to their correlated new destinations and then started the SQL Server Service once more.
That’s when I ran into the interesting issue of “Recovery in a Pending state”. Some digging and sleuthing brought me back to my scripts.
Read on for those causes.
I had a situation last week where I needed to find the MIN() and MAX() values of some data. Normally this would be pretty straightforward, but the problem was I needed to identify the MIN() and MAX() values amongst multiple columns. So today I bring you a SQL tip using APPLY.
Looking at sys.procedures we can see I have a create_date and a modify_date column for each record. What if we needed to return the MIN() or MAX() value from those 2 (or more) columns?SELECT [Name] ,create_date ,modify_date FROM sys.procedures AS p WHERE p.[name] = 'ChangePassword';
In this two-column example, it’s not too difficult. As you add more and more columns, the solution remains the same, though the urge to ask why all of these dates are unpivoted might increase…
One of the missing language features in T-SQL language is array support. In some cases you can use custom types to work with arrays that are passed as parameters to your stored procedures.
Custom types in T-SQL enable you to create alias for some table, .Net or built-in type. Once you create a custom type, you can use it for local variables and parameters of functions.
I go back and forth on whether I’d like full array support in T-SQL, as on the plus side, it simplifies interactions with external tools. On the other hand, it can promote bad habits like violating first normal form.
For anyone working with SQL Server since version 2005 (with Service Pack 2 to be precise) there is a very common task when thinking or research the system optimisation – the usage of the compression, and before advancing with this step the question that one usually receives is – “can you estimate how much improvement we shall get?”
For this purpose since SQL Server 2008, we have a very useful stored procedure that is called sp_estimate_data_compression_savings, that is capable of providing us with the estimation of how much storage we can save by enabling or moving to a more effective compression method (as in NONE | ROW | PAGE types).
Columnstore Indexes have appeared in SQL Server 2012 (that is well over 6 years ago) and even though from time to time I would ask and suggest to enable the stored procedure sp_estimate_data_compression_savings to start supporting Columnstore Indexes, until now there is no such support.
Until now – I am introducing a conjunction of my scripts in a comprehensive and reasonably capable stored procedure that is called “cstore_sp_estimate_columnstore_compression_savings” and that is a part of my free & open-sourced Columnstore Indexes Script Library, freely available on GitHub.
That’s a useful addition, especially when you’re trying to sell management on using clustered columnstore indexes.
In Qubole’s 2018 Data Activation Report, we did a deep-dive analysis of how companies are adopting and using different big data engines. As part of this research, we found some fascinating details about Hadoop that we will detail in the rest of this blog.
A common misconception in the market is that Hadoop is dying. However, when you hear people refer to this, they often mean “MapReduce” as a standalone resource manager and “HDFS” as being the primary storage component that is dying. Beyond this, Hadoop as a framework is a core base for the entire big data ecosystem (Apache Airflow, Apache Oozie, Apache Hbase, Apache Spark, Apache Storm, Apache Flink, Apache Pig, Apache Hive, Apache NiFi, Apache Kafka, Apache Sqoop…the list goes on).
I clipped this portion rather than the direct analysis because I think it’s an important point: the Hadoop ecosystem is thriving as the matter of primary importance switches from what was important a decade ago (batch processing of large amounts of data on servers with direct attached storage) to what is important today (a combination of batch and streaming processing of large amounts of data on virtualized and often cloud-based servers with network-attached flash storage).
Now, it’s been pointed out, including by Adam Machanic (@adammachanic) in a tweet referencing Aaron’s post about GROUP BY v DISTINCT that the two queries are essentially different, that one is actually asking for the set of distinct combinations on the results of the sub-query, rather than running the sub-query across the distinct values that are passed in. It’s what we see in the plan, and is the reason why the performance is so different.
The thing is that we would all assume that the results are going to be identical.
But that’s an assumption, and isn’t a good one.
Rob starts out with
READ UNCOMMITTED but then gets into the “normal”
READ COMMITTED transaction isolation level that most places use.
When you create a “logical” Azure SQL Server (I say logical because we are not really physically creating anything) there is a setting that is ticked ON by default which is called “Allow Azure services to access server”.
The question is, what does it mean? (See the highlighted section below)
Read on to see what this does and why Arun doesn’t like the default.
I’ve been playing around with SQL Server running in Kubernetes in Azure Container Services (AKS)for a while now and I think that the technology is really cool.
You can get a highly available instance of SQL Server up and running with a few lines of code! Ok, there’s a bit of setup to do but once you get that out of the way, you’re good to go.
One thing that has been missing though, is persistent storage. Any changes made to the SQL instance would have been lost if the pod that it was running in failed and was brought back up.
Click through to learn how. It’s certainly not trivial, but Andrew does a good job showing us the step-by-step.
In this module you will learn how to use the TreeViz Custom Visual. The TreeViz is a breakdown tree that allows you to expand or collapse levels of hierarchical data.
Click through for the video, showing more. For limited, hierarchical, categorical data, this could work pretty well.